Rahul, I'm glad you were able to get things built and Blur up and running! Good questions! Let me see if I can answer them.
> 1. I am not able to find the 'blur.*.hostname' properties in the blur.properties file, but these are listed in the readme file

The blur-site.properties file overrides the blur-default.properties file, which can be found in the src/blur-util/src/main/resources/ directory.

> 2. There seems to be a lot of code. I greatly appreciate if someone can give me pointers before I dig through the codebase. Something like an architectural overview or a flow explaining how the search query is resolved.

Good question. I will explain how a query is executed, assuming you are running Blur in a clustered environment (controllers + shards).

-1. Client creates a query (BlurQuery) with the generated Thrift objects.
-2. Client submits the query to one of the controllers by calling the query method on the Blur service. Note that the easiest way to interact with Thrift in the client is by using BlurClient (org.apache.blur.thrift.BlurClient) in the blur-thrift project. You can see it in use here (I just added it, so you might have to pull):
    https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=blob;f=src/blur-testsuite/src/main/java/org/apache/blur/testsuite/SimpleQueryExample.java;h=df16bd82522bb9083309ac31f849924ba43318be;hb=ff29884ab60262305b99af49fae26575c808b755
-3. Once the query arrives in the controller, the controller re-submits the query to all the shard servers that are online. See the query method in src/blur-core/src/main/java/org/apache/blur/thrift/BlurControllerServer.java.
-4. Once in the shard server, the query is parsed into a Lucene query.
-5. The query is executed in parallel, one thread per index shard in the shard server. See the query method in src/blur-core/src/main/java/org/apache/blur/thrift/BlurShardServer.java.
-6. Once the results have been found from the query, they are merged and the top N are returned to the controller.
-7. Once the results from all the shard servers have returned, the top N are returned to the client.
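To make steps 3 through 7 a little more concrete, here is a minimal sketch of the scatter/gather pattern they describe: send the work to every shard in parallel, collect the partial results, and keep only the top N. This is not Blur's code. The class ScatterGatherSketch, the Hit type, and the topN and scatterGather methods are all made up for illustration, and a plain thread pool stands in for the Thrift calls a real controller would make.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration only; none of these types are Blur classes. */
public class ScatterGatherSketch {

  /** A scored hit; stands in for whatever a real shard would return. */
  static final class Hit {
    final String id;
    final double score;

    Hit(String id, double score) {
      this.id = id;
      this.score = score;
    }

    @Override
    public String toString() {
      return id + ":" + score;
    }
  }

  /** Returns the n highest-scoring hits, best first. */
  static List<Hit> topN(List<Hit> hits, int n) {
    List<Hit> sorted = new ArrayList<>(hits);
    sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
    return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
  }

  /** Runs one task per shard in parallel, then merges the partial results into one top-n list. */
  static List<Hit> scatterGather(List<Callable<List<Hit>>> shards, int n) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
    try {
      List<Hit> merged = new ArrayList<>();
      // invokeAll blocks until every shard task has completed.
      for (Future<List<Hit>> partial : pool.invokeAll(shards)) {
        merged.addAll(partial.get());
      }
      return topN(merged, n);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    List<Callable<List<Hit>>> shards = new ArrayList<>();
    // Each "shard" already returns its own local top N, as in step 6.
    shards.add(() -> topN(Arrays.asList(new Hit("a", 0.9), new Hit("b", 0.2)), 2));
    shards.add(() -> topN(Arrays.asList(new Hit("c", 0.7), new Hit("d", 0.95)), 2));
    System.out.println(scatterGather(shards, 3)); // prints [d:0.95, a:0.9, c:0.7]
  }
}
```

The same merge happens at two levels in the flow above: a shard server merges the results of its index shards, and the controller merges the results of the shard servers. Because every level returns only its top N, no level has to move more than N hits per source.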
I know this is a technical explanation of running a single query, but it should give you some starting points to dig through the code.

The projects breakdown:

blur-core - This project binds most of the other projects together; it houses all the Thrift service impls, failover logic, server startup, shard and controller management, etc.
blur-gui - An HTTP status server that runs in each controller and shard server; needs some work.
blur-mapred - The bulk indexing code lives in this project.
blur-query - The Lucene query classes that Blur implements reside here.
blur-shell - A basic shell program to interact with Blur; needs some more features.
blur-store - The Lucene directory and block cache code resides here.
blur-testsuite - Currently contains a lot of example programs to exercise a Blur cluster.
blur-thrift - Contains generated Thrift code and client code; the client code has automatic retry logic for when you are running multiple controllers, etc.
blur-util - Contains some basic utility classes, metrics, and ZooKeeper code.

> 3. How do you guys manage your development workspace with eclipse, git, and maven. This will definitely help me get a kickstart.

I run git on the command line, with mvn, and Eclipse as my IDE. There are some shortcuts for testing a single shard server, or a shard server + controller server, from within Eclipse. Take a look at org.apache.blur.thrift.ThriftBlurShardServer and org.apache.blur.thrift.ThriftBlurControllerServer for the main methods that can be executed to run the various processes. If you have ZooKeeper running you should be able to run those mains and then step through a query being executed.

> 4. I started Hadoop (HDFS+MapReduce), Zookeeper, and Blur. What are the steps in actually using it. Where do we start?

Take a look at http://www.nearinfinity.com/blogs/aaron_mccurry/an_introduction_to_blur.html as well as the blur-testsuite. That project has some basic programs to create a table, load data, search, etc.
And please follow up with more questions if you need more guidance or help. Thanks for the notes about the initial setup and build! I will take a look at the errors.

Aaron

On Thu, May 2, 2013 at 1:42 AM, rahul challapalli <[email protected]> wrote:

> Hi,
>
> I was able to get blur started (shards and controllers). It worked straight away. Awesome. I have a few more questions. My apologies if some of the questions are naive.
>
> 1. I am not able to find the 'blur.*.hostname' properties in the blur.properties file, but these are listed in the readme file
> 2. There seems to be a lot of code. I greatly appreciate if someone can give me pointers before I dig through the codebase. Something like an architectural overview or a flow explaining how the search query is resolved.
> 3. How do you guys manage your development workspace with eclipse, git, and maven. This will definitely help me get a kickstart.
> 4. I started Hadoop (HDFS+MapReduce), Zookeeper, and Blur. What are the steps in actually using it. Where do we start?
>
> Also I am outlining the steps that I followed in getting blur to run and also I got a couple of errors during the build process which are also listed below. The overall build was successful though.
>
> Apache Blur Single Node Setup on Mac OS X Lion
>
> 1. Environment : Single Node Hadoop-1.0.4 and Zookeeper-3.4.5
> 2. Get the Blur code from Git using git clone https://git-wip-us.apache.org/repos/asf/incubator-blur.git
> 3. Checkout the branch 0.1.5
> 4. Run 'mvn clean install' from the 'src' directory as superuser
> 5. Extract the Blur tar.gz file from the 'target/' directory into a convenient location and set BLUR_HOME to this location and add it to .bash_profile
> 6. Go to the extracted folder and configure the $BLUR_HOME/config/blur-env.sh file. The two exports that are required:
>    export JAVA_HOME=$(/usr/libexec/java_home)
>    export HADOOP_HOME=/usr/local/hadoop
> 7. Setup the $BLUR_HOME/config/blur.properties file.
>    The default site configuration:
>    blur.zookeeper.connection=localhost
>    blur.cluster.name=default
> 8. Start blur using $BLUR_HOME/bin/start-all.sh
>
> Errors during the build process :
>
> ERROR 20130430_22:47:42:042_PDT [IndexReader-Refresher] writer.BlurIndexRefresher: Unknown error
> org.apache.lucene.store.AlreadyClosedException: this Directory is closed
>     at org.apache.lucene.store.Directory.ensureOpen(Directory.java:256)
>     at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:240)
>     at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:679)
>     at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
>     at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
>     at org.apache.lucene.index.StandardDirectoryReader.isCurrent(StandardDirectoryReader.java:326)
>     at org.apache.lucene.index.StandardDirectoryReader.doOpenNoWriter(StandardDirectoryReader.java:284)
>     at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:247)
>     at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:235)
>     at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:169)
>     at org.apache.blur.manager.writer.BlurIndexReader.refresh(BlurIndexReader.java:82)
>     at org.apache.blur.manager.writer.BlurIndexRefresher.refreshInternal(BlurIndexRefresher.java:70)
>     at org.apache.blur.manager.writer.BlurIndexRefresher.run(BlurIndexRefresher.java:61)
>     at java.util.TimerThread.mainLoop(Timer.java:512)
>     at java.util.TimerThread.run(Timer.java:462)
> WARN 20130501_20:54:18:018_PDT [main] jmx.MBeanRegistry: Error during unregister
> javax.management.InstanceNotFoundException: org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
>     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1094)
>     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:415)
>     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:403)
>     at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:507)
>     at org.apache.zookeeper.jmx.MBeanRegistry.unregister(MBeanRegistry.java:115)
>     at org.apache.zookeeper.jmx.MBeanRegistry.unregister(MBeanRegistry.java:132)
>     at org.apache.zookeeper.server.ZooKeeperServer.unregisterJMX(ZooKeeperServer.java:443)
>     at org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:436)
>     at org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:271)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.shutdown(ZooKeeperServerMain.java:127)
>     at org.apache.blur.MiniCluster$ZooKeeperServerMainEmbedded.shutdown(MiniCluster.java:339)
>     at org.apache.blur.MiniCluster.shutdownZooKeeper(MiniCluster.java:427)
>     at org.apache.blur.MiniCluster.shutdownBlurCluster(MiniCluster.java:146)
>     at org.apache.blur.thrift.BlurClusterTest.shutdownCluster(BlurClusterTest.java:81)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
>     at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:236)
>     at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:134)
>     at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:113)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
>     at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
>     at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
>     at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:103)
>     at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:74)
>
> - Rahul
>
> On Tue, Apr 30, 2013 at 9:29 PM, rahul challapalli <[email protected]> wrote:
>
> > Aaron,
> >
> > Thanks for your reply. I will sure let you know how it goes.
> >
> > - Rahul
> >
> > On Tue, Apr 30, 2013 at 7:33 PM, Aaron McCurry <[email protected]> wrote:
> >
> >> Hi Rahul,
> >>
> >> Welcome! Blur is a young incubator project and with that there is not a lot of documentation. Yet. But we do have a lot of code. :-)
> >>
> >> Blur uses HDFS for storing indexes, MapReduce for bulk indexing, Thrift for RPC and ZooKeeper for state, and of course Lucene for search. Yes Blur can and should run along side a standard Hadoop install (MapReduce + HDFS). It currently works with the 1.0.x version or CDH3 from Cloudera. I'm sure we can get it to work with 2.0.x and CDH4, it just hasn't happen yet.
> >> However the only dependency to run Blur on a single machine is ZooKeeper. HDFS is required for a cluster.
> >>
> >> To get you started.
> >>
> >> git clone https://git-wip-us.apache.org/repos/asf/incubator-blur.git
> >>
> >> # we are currently focusing on getting 0.1.5 to a releasable state.
> >> git checkout 0.1.5
> >>
> >> In the checkout you will find a README.md that is a bit out of date with the code examples but the general theme is correct. For more examples take a look at the blur-testsuite project, there are a lot of code examples in there to get you started.
> >>
> >> To build the project into a tarball that can be extracted and executed.
> >>
> >> run "mvn install" from the src/ directory. Once it has successfully executed all the tests and built everything you will find a tar.gz file in the target/ directory in the distribution project.
> >>
> >> Before you can run Blur, Apache ZooKeeper needs to be running. A default install will work.
> >>
> >> After extracting the Blur tar.gz file you should be able to run the bin/start-all.sh and it should start a Blur controller and a shard server on your local machine.
> >>
> >> I would love to hear how your initial compile and install goes, because we could use this thread and any information that is exchanged to create a nice little wiki page for 0.1.5.
> >>
> >> Thank!
> >>
> >> Aaron
> >>
> >> On Tue, Apr 30, 2013 at 2:17 PM, rahul challapalli <[email protected]> wrote:
> >>
> >> > Hi,
> >> >
> >> > I am new to blur and even ASF in terms of contributing back to a project. I have decent knowledge about hadoop and mapreduce but completely new to search. I come from a Java/PHP background. I am looking for some direction in setting up blur on my local machine. I have a single node hadoop installation on my Mac OS X Lion. Is it an issue if I have HDFS, MapReduce daemons running alongside blur on the same machine. I would greatly appreciate if you can refer me to some setup document as well as an insight into the architecture of blur. Thank You.
> >> >
> >> > - Rahul
