Hi Aaron,

I greatly appreciate your detailed response. I will go through the notes, code, and examples you provided over the weekend and will keep you posted on any issues I come across. Once again, thank you.
- Rahul

On Thu, May 2, 2013 at 7:45 PM, Aaron McCurry <[email protected]> wrote:

> Rahul,
>
> I'm glad you were able to get things built and Blur up and running! Good
> questions! Let me see if I can answer them.
>
> 1. I am not able to find the 'blur.*.hostname' properties in the
> blur.properties file, but these are listed in the readme file
>
> The blur-site.properties file overrides the blur-default.properties file,
> which can be found in the src/blur-util/src/main/resources/ directory.
>
> 2. There seems to be a lot of code. I would greatly appreciate it if someone
> could give me pointers before I dig through the codebase. Something like an
> architectural overview or a flow explaining how the search query is
> resolved.
>
> Good question. I will explain how a query is executed, assuming you are
> running Blur in a clustered environment (controllers + shards).
>
> -1. Client creates a query (BlurQuery) with the generated Thrift objects.
> -2. Client submits the query to one of the controllers by calling the query
> method on the Blur service.
>
> Note: the easiest way to interact with Thrift in the client is by using
> BlurClient (org.apache.blur.thrift.BlurClient) in the blur-thrift project.
> You can see it in use here (I just added it, so you might have to pull):
>
> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=blob;f=src/blur-testsuite/src/main/java/org/apache/blur/testsuite/SimpleQueryExample.java;h=df16bd82522bb9083309ac31f849924ba43318be;hb=ff29884ab60262305b99af49fae26575c808b755
>
> -3. Once the query arrives in the controller, the controller re-submits
> the query to all the shard servers that are online.
>
> See
> src/blur-core/src/main/java/org/apache/blur/thrift/BlurControllerServer.java
> query method.
>
> -4. Once in the shard server, the query is parsed into a Lucene query.
> -5. The query is executed in parallel, one thread per index shard in the
> shard server.
>
> See src/blur-core/src/main/java/org/apache/blur/thrift/BlurShardServer.java
> query method.
>
> -6. Once the results have been found from the query, they are merged and the
> top N are returned to the controller.
> -7. Once the results from all the shard servers have returned, the top N
> are returned to the client.
>
> I know this is a technical explanation of running a single query, but it
> should give you some starting points to dig through the code.
>
> The projects breakdown:
>
> blur-core
> - This project binds most of the other projects together, houses all the
> Thrift service impls, failover logic, server startup, shard and controller
> management, etc.
> blur-gui
> - An HTTP status server that runs in each controller and shard server,
> needs some work.
> blur-mapred
> - The bulk indexing code lives in this project.
> blur-query
> - The Lucene query classes that Blur implements reside here.
> blur-shell
> - A basic shell program to interact with Blur, needs some more features.
> blur-store
> - The Lucene directory and block cache code resides here.
> blur-testsuite
> - Currently contains a lot of example programs to exercise a Blur cluster.
> blur-thrift
> - Contains generated Thrift code and client code, the client code has
> automatic retry logic for when you are running multiple controllers, etc.
> blur-util
> - Contains some basic utility classes, metrics, and ZooKeeper code.
>
> 3. How do you guys manage your development workspace with eclipse, git, and
> maven? This will definitely help me get a kickstart.
>
> I run git on the command line, with mvn, and Eclipse as my IDE. There are
> some shortcuts for running and testing a single shard server, or a shard
> server + controller server, from within Eclipse. Take a look at
> org.apache.blur.thrift.ThriftBlurShardServer and
> org.apache.blur.thrift.ThriftBlurControllerServer for the main methods that
> can be executed to run various processes.
> If you have ZooKeeper running, you should be able to run those mains and
> then step through a query being executed.
>
> 4. I started Hadoop (HDFS+MapReduce), Zookeeper, and Blur. What are the
> steps in actually using it? Where do we start?
>
> Take a look at
> http://www.nearinfinity.com/blogs/aaron_mccurry/an_introduction_to_blur.html
> as well as the blur-testsuite. That project has some basic programs to
> create a table, load data, search, etc. And please follow up with more
> questions if you need more guidance or help.
>
> Thanks for the notes about the initial setup and build! I will take a look
> at the errors.
>
> Aaron
>
> On Thu, May 2, 2013 at 1:42 AM, rahul challapalli <[email protected]> wrote:
>
> > Hi,
> >
> > I was able to get blur started (shards and controllers). It worked
> > straight away. Awesome. I have a few more questions. My apologies if some
> > of the questions are naive.
> >
> > 1. I am not able to find the 'blur.*.hostname' properties in the
> > blur.properties file, but these are listed in the readme file
> > 2. There seems to be a lot of code. I would greatly appreciate it if
> > someone could give me pointers before I dig through the codebase.
> > Something like an architectural overview or a flow explaining how the
> > search query is resolved.
> > 3. How do you guys manage your development workspace with eclipse, git,
> > and maven? This will definitely help me get a kickstart.
> > 4. I started Hadoop (HDFS+MapReduce), Zookeeper, and Blur. What are the
> > steps in actually using it? Where do we start?
> >
> > Also, I am outlining the steps that I followed in getting blur to run. I
> > got a couple of errors during the build process, which are also listed
> > below. The overall build was successful though.
> >
> > Apache Blur Single Node Setup on Mac OS X Lion
> >
> > 1. Environment: Single Node Hadoop-1.0.4 and Zookeeper-3.4.5
> > 2. Get the Blur code from Git using git clone
> > https://git-wip-us.apache.org/repos/asf/incubator-blur.git
> > 3. Check out the branch 0.1.5
> > 4. Run 'mvn clean install' from the 'src' directory as superuser
> > 5. Extract the Blur tar.gz file from the 'target/' directory into a
> > convenient location and set BLUR_HOME to this location and add it to
> > .bash_profile
> > 6. Go to the extracted folder and configure the
> > $BLUR_HOME/config/blur-env.sh file. The two exports that are required:
> >     export JAVA_HOME=$(/usr/libexec/java_home)
> >     export HADOOP_HOME=/usr/local/hadoop
> > 7. Set up the $BLUR_HOME/config/blur.properties file. The default site
> > configuration:
> >     blur.zookeeper.connection=localhost
> >     blur.cluster.name=default
> > 8. Start blur using $BLUR_HOME/bin/start-all.sh
> >
> > Errors during the build process:
> >
> > ERROR 20130430_22:47:42:042_PDT [IndexReader-Refresher] writer.BlurIndexRefresher: Unknown error
> > org.apache.lucene.store.AlreadyClosedException: this Directory is closed
> >     at org.apache.lucene.store.Directory.ensureOpen(Directory.java:256)
> >     at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:240)
> >     at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:679)
> >     at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
> >     at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
> >     at org.apache.lucene.index.StandardDirectoryReader.isCurrent(StandardDirectoryReader.java:326)
> >     at org.apache.lucene.index.StandardDirectoryReader.doOpenNoWriter(StandardDirectoryReader.java:284)
> >     at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:247)
> >     at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:235)
> >     at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:169)
> >     at org.apache.blur.manager.writer.BlurIndexReader.refresh(BlurIndexReader.java:82)
> >     at org.apache.blur.manager.writer.BlurIndexRefresher.refreshInternal(BlurIndexRefresher.java:70)
> >     at org.apache.blur.manager.writer.BlurIndexRefresher.run(BlurIndexRefresher.java:61)
> >     at java.util.TimerThread.mainLoop(Timer.java:512)
> >     at java.util.TimerThread.run(Timer.java:462)
> >
> > WARN 20130501_20:54:18:018_PDT [main] jmx.MBeanRegistry: Error during unregister
> > javax.management.InstanceNotFoundException: org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
> >     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1094)
> >     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:415)
> >     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:403)
> >     at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:507)
> >     at org.apache.zookeeper.jmx.MBeanRegistry.unregister(MBeanRegistry.java:115)
> >     at org.apache.zookeeper.jmx.MBeanRegistry.unregister(MBeanRegistry.java:132)
> >     at org.apache.zookeeper.server.ZooKeeperServer.unregisterJMX(ZooKeeperServer.java:443)
> >     at org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:436)
> >     at org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:271)
> >     at org.apache.zookeeper.server.ZooKeeperServerMain.shutdown(ZooKeeperServerMain.java:127)
> >     at org.apache.blur.MiniCluster$ZooKeeperServerMainEmbedded.shutdown(MiniCluster.java:339)
> >     at org.apache.blur.MiniCluster.shutdownZooKeeper(MiniCluster.java:427)
> >     at org.apache.blur.MiniCluster.shutdownBlurCluster(MiniCluster.java:146)
> >     at org.apache.blur.thrift.BlurClusterTest.shutdownCluster(BlurClusterTest.java:81)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
> >     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
> >     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
> >     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37)
> >     at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
> >     at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:236)
> >     at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:134)
> >     at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:113)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
> >     at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
> >     at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
> >     at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:103)
> >     at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:74)
> >
> > - Rahul
> >
> > On Tue, Apr 30, 2013 at 9:29 PM, rahul challapalli <[email protected]> wrote:
> >
> > > Aaron,
> > >
> > > Thanks for your reply. I will be sure to let you know how it goes.
> > >
> > > - Rahul
> > >
> > > On Tue, Apr 30, 2013 at 7:33 PM, Aaron McCurry <[email protected]> wrote:
> > >
> > >> Hi Rahul,
> > >>
> > >> Welcome! Blur is a young incubator project and with that there is not
> > >> a lot of documentation. Yet. But we do have a lot of code. :-)
> > >>
> > >> Blur uses HDFS for storing indexes, MapReduce for bulk indexing, Thrift
> > >> for RPC and ZooKeeper for state, and of course Lucene for search. Yes,
> > >> Blur can and should run alongside a standard Hadoop install (MapReduce
> > >> + HDFS). It currently works with the 1.0.x version or CDH3 from
> > >> Cloudera. I'm sure we can get it to work with 2.0.x and CDH4, it just
> > >> hasn't happened yet. However, the only dependency to run Blur on a
> > >> single machine is ZooKeeper. HDFS is required for a cluster.
> > >>
> > >> To get you started:
> > >>
> > >> git clone https://git-wip-us.apache.org/repos/asf/incubator-blur.git
> > >>
> > >> # we are currently focusing on getting 0.1.5 to a releasable state.
> > >> git checkout 0.1.5
> > >>
> > >> In the checkout you will find a README.md that is a bit out of date
> > >> with the code examples, but the general theme is correct. For more
> > >> examples, take a look at the blur-testsuite project; there are a lot
> > >> of code examples in there to get you started.
> > >>
> > >> To build the project into a tarball that can be extracted and
> > >> executed, run "mvn install" from the src/ directory. Once it has
> > >> successfully executed all the tests and built everything, you will
> > >> find a tar.gz file in the target/ directory in the distribution
> > >> project.
> > >>
> > >> Before you can run Blur, Apache ZooKeeper needs to be running. A
> > >> default install will work.
> > >>
> > >> After extracting the Blur tar.gz file you should be able to run
> > >> bin/start-all.sh, and it should start a Blur controller and a shard
> > >> server on your local machine.
> > >>
> > >> I would love to hear how your initial compile and install goes,
> > >> because we could use this thread and any information that is exchanged
> > >> to create a nice little wiki page for 0.1.5.
> > >>
> > >> Thanks!
> > >>
> > >> Aaron
> > >>
> > >> On Tue, Apr 30, 2013 at 2:17 PM, rahul challapalli <[email protected]> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > I am new to blur and even ASF in terms of contributing back to a
> > >> > project. I have decent knowledge about hadoop and mapreduce but am
> > >> > completely new to search. I come from a Java/PHP background. I am
> > >> > looking for some direction in setting up blur on my local machine.
> > >> > I have a single node hadoop installation on my Mac OS X Lion. Is it
> > >> > an issue if I have HDFS and MapReduce daemons running alongside blur
> > >> > on the same machine? I would greatly appreciate it if you could
> > >> > refer me to some setup document as well as an insight into the
> > >> > architecture of blur. Thank you.
> > >> >
> > >> > - Rahul
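
For anyone reading this thread later: the query flow Aaron outlines earlier (the controller fans the query out to the shard servers, each shard is searched on its own thread and returns its top N, and the controller merges those into the final top N) can be sketched in a few lines of Java. This is only a rough illustration of that scatter-gather pattern, not Blur's actual code. The names ScatterGatherSketch, Hit, topN and query are hypothetical and made up for this sketch, it uses no Blur, Thrift or Lucene classes, and each "shard" is assumed to be a plain in-memory list of scored hits.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only: none of these names are Blur classes.
final class ScatterGatherSketch {

    /** A scored hit; stands in for whatever a real shard would return. */
    static final class Hit {
        final String id;
        final double score;

        Hit(String id, double score) {
            this.id = id;
            this.score = score;
        }
    }

    /** Returns the n highest-scoring hits, best first. */
    static List<Hit> topN(List<Hit> hits, int n) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    /**
     * Searches each shard on its own thread, then merges the per-shard
     * top N into one overall top N, as the controller does in the thread.
     */
    static List<Hit> query(List<List<Hit>> shards, int n) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (List<Hit> shard : shards) {
                Callable<List<Hit>> task = () -> topN(shard, n);
                pending.add(pool.submit(task));
            }
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> future : pending) {
                try {
                    merged.addAll(future.get()); // waits for that shard to finish
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for a shard", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("shard search failed", e);
                }
            }
            return topN(merged, n);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<Hit>> shards = List.of(
            List.of(new Hit("a", 0.9), new Hit("b", 0.2)),
            List.of(new Hit("c", 0.7), new Hit("d", 0.95)));
        for (Hit hit : query(shards, 2)) {
            System.out.println(hit.id + " " + hit.score);
        }
    }
}
```

With the two sample shards in main and N = 2, the merge keeps d and then a. Having each shard return only its own top N before the merge means the merging side never handles more than N hits per shard, which is presumably why the flow described in the thread trims results at both the shard and the controller.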
