That's good to hear! Thanks for letting us know.

Aaron

On Tue, May 7, 2013 at 2:32 AM, rahul challapalli <[email protected]> wrote:

> Hi,
>
> I was able to run the code examples for loading data and searching. There
> were really no hiccups. I started digging through the code and will let you
> know if I have any questions. Thanks
>
> - Rahul
>
>
> On Thu, May 2, 2013 at 11:33 PM, rahul challapalli <[email protected]> wrote:
>
> > Hi Aaron,
> >
> > I greatly appreciate your detailed response. I will go through the notes,
> > code and the examples you provided over the weekend and will keep you
> > posted regarding any issues that I come across. Once again, thank you.
> >
> > - Rahul
> >
> >
> > On Thu, May 2, 2013 at 7:45 PM, Aaron McCurry <[email protected]> wrote:
> >
> >> Rahul,
> >>
> >> I'm glad you were able to get things built and Blur up and running! Good
> >> questions! Let me see if I can answer them.
> >>
> >> 1. I am not able to find the 'blur.*.hostname' properties in the
> >> blur.properties file, but these are listed in the readme file
> >>
> >> The blur-site.properties file overrides the blur-default.properties file
> >> that can be found in the src/blur-util/src/main/resources/ directory.
> >>
> >> 2. There seems to be a lot of code. I greatly appreciate if someone can
> >> give me pointers before I dig through the codebase. Something like an
> >> architectural overview or a flow explaining how the search query is
> >> resolved.
> >>
> >> Good question. I will explain how a query is executed, assuming you are
> >> running Blur in a clustered environment (controllers + shards).
> >>
> >> -1. Client creates a query (BlurQuery) with the generated Thrift objects.
> >> -2. Client submits the query to one of the controllers by calling the
> >> query method on the Blur service.
> >>
> >> Note the easiest way to interact with Thrift in the client is by using
> >> BlurClient (org.apache.blur.thrift.BlurClient) in the blur-thrift project.
> >> And you can see it in use here (I just added it, so you might have to
> >> pull):
> >>
> >> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=blob;f=src/blur-testsuite/src/main/java/org/apache/blur/testsuite/SimpleQueryExample.java;h=df16bd82522bb9083309ac31f849924ba43318be;hb=ff29884ab60262305b99af49fae26575c808b755
> >>
> >> -3. Once the query arrives in the controller, the controller then
> >> re-submits the query to all the shard servers that are online.
> >>
> >> See the query method in:
> >> src/blur-core/src/main/java/org/apache/blur/thrift/BlurControllerServer.java
> >>
> >> -4. Once in the shard server, the query is parsed into a Lucene query.
> >> -5. The query is executed in parallel, one thread per index shard in the
> >> shard server.
> >>
> >> See the query method in:
> >> src/blur-core/src/main/java/org/apache/blur/thrift/BlurShardServer.java
> >>
> >> -6. Once the results have been found from the query, they are merged and
> >> the top N are returned to the controller.
> >> -7. Once the results from all the shard servers have returned, the top N
> >> are returned to the client.
> >>
> >> I know this is a technical explanation of running a single query, but it
> >> should give you some starting points to dig through the code.
> >>
> >> The projects breakdown:
> >>
> >> blur-core
> >> - This project binds most of the other projects together, houses all the
> >> thrift service impls, failover logic, server startup, shard and controller
> >> management, etc.
> >> blur-gui
> >> - An http status server that runs in each controller and shard server,
> >> needs some work.
> >> blur-mapred
> >> - The bulk indexing code lives in this project.
> >> blur-query
> >> - The lucene query classes that blur implements reside here.
> >> blur-shell
> >> - A basic shell program to interact with blur, needs some more features.
> >> blur-store
> >> - The lucene directory and block cache code resides here.
> >> blur-testsuite
> >> - Currently contains a lot of example programs to exercise a blur cluster.
> >> blur-thrift
> >> - Contains generated thrift code and client code; the client code has
> >> automatic retry logic for when you are running multiple controllers, etc.
> >> blur-util
> >> - Contains some basic utility classes, metrics, and zookeeper code.
> >>
> >>
> >> 3. How do you guys manage your development workspace with eclipse, git,
> >> and maven. This will definitely help me get a kickstart.
> >>
> >> I run git on the command line, with mvn and eclipse as my IDE. There are
> >> some shortcuts for running and testing a single shard server, or a shard
> >> server + controller server, from within eclipse. Take a look at
> >> org.apache.blur.thrift.ThriftBlurShardServer and
> >> org.apache.blur.thrift.ThriftBlurControllerServer for the main methods that
> >> can be executed to run various processes. If you have ZooKeeper running
> >> you should be able to run those mains and then step through a query being
> >> executed.
> >>
> >> 4. I started Hadoop (HDFS+MapReduce), Zookeeper, and Blur. What are the
> >> steps in actually using it. Where do we start?
> >>
> >> Take a look at
> >> http://www.nearinfinity.com/blogs/aaron_mccurry/an_introduction_to_blur.html
> >> as well as the blur-testsuite. That project has some basic programs to
> >> create a table, load data, search, etc. And please follow up with more
> >> questions if you need more guidance or help.
> >>
> >> Thanks for the notes about the initial setup and build! I will take a
> >> look at the errors.
> >>
> >> Aaron
> >>
> >>
> >> On Thu, May 2, 2013 at 1:42 AM, rahul challapalli <[email protected]> wrote:
> >>
> >> > Hi,
> >> >
> >> > I was able to get blur started (shards and controllers). It worked
> >> > straight away. Awesome. I have a few more questions. My apologies if some
> >> > of the questions are naive.
> >> >
> >> > 1. I am not able to find the 'blur.*.hostname' properties in the
> >> > blur.properties file, but these are listed in the readme file
> >> > 2. There seems to be a lot of code. I greatly appreciate if someone can
> >> > give me pointers before I dig through the codebase. Something like an
> >> > architectural overview or a flow explaining how the search query is
> >> > resolved.
> >> > 3. How do you guys manage your development workspace with eclipse, git,
> >> > and maven. This will definitely help me get a kickstart.
> >> > 4. I started Hadoop (HDFS+MapReduce), Zookeeper, and Blur. What are the
> >> > steps in actually using it. Where do we start?
> >> >
> >> > Below I am outlining the steps that I followed in getting blur to run. I
> >> > also got a couple of errors during the build process, which are listed
> >> > after the steps. The overall build was successful though.
> >> >
> >> > Apache Blur Single Node Setup on Mac OS X Lion
> >> >
> >> > 1. Environment : Single Node Hadoop-1.0.4 and Zookeeper-3.4.5
> >> > 2. Get the Blur code from Git using git clone
> >> > https://git-wip-us.apache.org/repos/asf/incubator-blur.git
> >> > 3. Checkout the branch 0.1.5
> >> > 4. Run 'mvn clean install' from the 'src' directory as superuser
> >> > 5. Extract the Blur tar.gz file from the 'target/' directory into a
> >> > convenient location and set BLUR_HOME to this location and add it to
> >> > .bash_profile
> >> > 6. Go to the extracted folder and configure the
> >> > $BLUR_HOME/config/blur-env.sh file. The two exports that are required:
> >> > export JAVA_HOME=$(/usr/libexec/java_home)
> >> > export HADOOP_HOME=/usr/local/hadoop
> >> > 7. Setup the $BLUR_HOME/config/blur.properties file. The default site
> >> > configuration:
> >> > blur.zookeeper.connection=localhost
> >> > blur.cluster.name=default
> >> > 8. Start blur using $BLUR_HOME/bin/start-all.sh
> >> >
> >> > Errors during the build process :
> >> >
> >> > ERROR 20130430_22:47:42:042_PDT [IndexReader-Refresher] writer.BlurIndexRefresher: Unknown error
> >> > org.apache.lucene.store.AlreadyClosedException: this Directory is closed
> >> > at org.apache.lucene.store.Directory.ensureOpen(Directory.java:256)
> >> > at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:240)
> >> > at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:679)
> >> > at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
> >> > at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
> >> > at org.apache.lucene.index.StandardDirectoryReader.isCurrent(StandardDirectoryReader.java:326)
> >> > at org.apache.lucene.index.StandardDirectoryReader.doOpenNoWriter(StandardDirectoryReader.java:284)
> >> > at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:247)
> >> > at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:235)
> >> > at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:169)
> >> > at org.apache.blur.manager.writer.BlurIndexReader.refresh(BlurIndexReader.java:82)
> >> > at org.apache.blur.manager.writer.BlurIndexRefresher.refreshInternal(BlurIndexRefresher.java:70)
> >> > at org.apache.blur.manager.writer.BlurIndexRefresher.run(BlurIndexRefresher.java:61)
> >> > at java.util.TimerThread.mainLoop(Timer.java:512)
> >> > at java.util.TimerThread.run(Timer.java:462)
> >> > WARN 20130501_20:54:18:018_PDT [main] jmx.MBeanRegistry: Error during unregister
> >> > javax.management.InstanceNotFoundException: org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
> >> > at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1094)
> >> > at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:415)
> >> > at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:403)
> >> > at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:507)
> >> > at org.apache.zookeeper.jmx.MBeanRegistry.unregister(MBeanRegistry.java:115)
> >> > at org.apache.zookeeper.jmx.MBeanRegistry.unregister(MBeanRegistry.java:132)
> >> > at org.apache.zookeeper.server.ZooKeeperServer.unregisterJMX(ZooKeeperServer.java:443)
> >> > at org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:436)
> >> > at org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:271)
> >> > at org.apache.zookeeper.server.ZooKeeperServerMain.shutdown(ZooKeeperServerMain.java:127)
> >> > at org.apache.blur.MiniCluster$ZooKeeperServerMainEmbedded.shutdown(MiniCluster.java:339)
> >> > at org.apache.blur.MiniCluster.shutdownZooKeeper(MiniCluster.java:427)
> >> > at org.apache.blur.MiniCluster.shutdownBlurCluster(MiniCluster.java:146)
> >> > at org.apache.blur.thrift.BlurClusterTest.shutdownCluster(BlurClusterTest.java:81)
> >> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >> > at java.lang.reflect.Method.invoke(Method.java:597)
> >> > at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
> >> > at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
> >> > at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
> >> > at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37)
> >> > at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
> >> > at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:236)
> >> > at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:134)
> >> > at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:113)
> >> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >> > at java.lang.reflect.Method.invoke(Method.java:597)
> >> > at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
> >> > at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
> >> > at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
> >> > at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:103)
> >> > at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:74)
> >> >
> >> >
> >> > - Rahul
> >> >
> >> >
> >> > On Tue, Apr 30, 2013 at 9:29 PM, rahul challapalli <[email protected]> wrote:
> >> >
> >> > > Aaron,
> >> > >
> >> > > Thanks for your reply. I will be sure to let you know how it goes.
> >> > >
> >> > > - Rahul
> >> > >
> >> > >
> >> > > On Tue, Apr 30, 2013 at 7:33 PM, Aaron McCurry <[email protected]> wrote:
> >> > >
> >> > >> Hi Rahul,
> >> > >>
> >> > >> Welcome! Blur is a young incubator project and with that there is not a
> >> > >> lot of documentation. Yet. But we do have a lot of code. :-)
> >> > >>
> >> > >> Blur uses HDFS for storing indexes, MapReduce for bulk indexing, Thrift
> >> > >> for RPC and ZooKeeper for state, and of course Lucene for search. Yes,
> >> > >> Blur can and should run alongside a standard Hadoop install (MapReduce +
> >> > >> HDFS). It currently works with the 1.0.x version or CDH3 from Cloudera.
> >> > >> I'm sure we can get it to work with 2.0.x and CDH4, it just hasn't
> >> > >> happened yet. However, the only dependency to run Blur on a single
> >> > >> machine is ZooKeeper. HDFS is required for a cluster.
> >> > >>
> >> > >> To get you started:
> >> > >>
> >> > >> git clone https://git-wip-us.apache.org/repos/asf/incubator-blur.git
> >> > >>
> >> > >> # we are currently focusing on getting 0.1.5 to a releasable state.
> >> > >> git checkout 0.1.5
> >> > >>
> >> > >> In the checkout you will find a README.md that is a bit out of date with
> >> > >> the code examples, but the general theme is correct. For more examples
> >> > >> take a look at the blur-testsuite project; there are a lot of code
> >> > >> examples in there to get you started.
> >> > >>
> >> > >> To build the project into a tarball that can be extracted and executed,
> >> > >> run "mvn install" from the src/ directory. Once it has successfully
> >> > >> executed all the tests and built everything you will find a tar.gz file
> >> > >> in the target/ directory in the distribution project.
> >> > >>
> >> > >> Before you can run Blur, Apache ZooKeeper needs to be running. A default
> >> > >> install will work.
> >> > >>
> >> > >> After extracting the Blur tar.gz file you should be able to run
> >> > >> bin/start-all.sh and it should start a Blur controller and a shard server
> >> > >> on your local machine.
> >> > >>
> >> > >> I would love to hear how your initial compile and install goes, because
> >> > >> we could use this thread and any information that is exchanged to create
> >> > >> a nice little wiki page for 0.1.5.
> >> > >>
> >> > >> Thanks!
> >> > >>
> >> > >> Aaron
> >> > >>
> >> > >>
> >> > >> On Tue, Apr 30, 2013 at 2:17 PM, rahul challapalli <[email protected]> wrote:
> >> > >>
> >> > >> > Hi,
> >> > >> >
> >> > >> > I am new to blur and even ASF in terms of contributing back to a
> >> > >> > project. I have decent knowledge about hadoop and mapreduce but am
> >> > >> > completely new to search. I come from a Java/PHP background. I am
> >> > >> > looking for some direction in setting up blur on my local machine. I
> >> > >> > have a single node hadoop installation on my Mac OS X Lion. Is it an
> >> > >> > issue if I have HDFS and MapReduce daemons running alongside blur on
> >> > >> > the same machine? I would greatly appreciate it if you can refer me to
> >> > >> > some setup document as well as an insight into the architecture of
> >> > >> > blur. Thank you.
> >> > >> >
> >> > >> > - Rahul
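
For anyone reading this thread later: the query flow described above (controller fans the query out to the shard servers, each shard server runs it in parallel across its index shards, and the top N results are merged at each level) can be sketched in plain Java roughly as below. This is only an illustration of the scatter/gather idea, not Blur's code. The names ScatterGatherSketch, Hit, Searcher, topN and searchAll are all made up for this example, and the real logic lives in the BlurControllerServer and BlurShardServer query methods mentioned in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ScatterGatherSketch {

    /** Hypothetical result type: a document id and its score. */
    public static final class Hit {
        public final String id;
        public final double score;
        public Hit(String id, double score) { this.id = id; this.score = score; }
    }

    /** Hypothetical stand-in for anything that can answer a query (index shard or shard server). */
    public interface Searcher {
        List<Hit> search(String query, int n) throws Exception;
    }

    /** Merge partial result lists and keep the n highest-scoring hits. */
    public static List<Hit> topN(List<List<Hit>> partials, int n) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> partial : partials) {
            all.addAll(partial);
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }

    /** Run the query on every target in parallel (one task per target), then merge the top n. */
    public static List<Hit> searchAll(List<Searcher> targets, String query, int n) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, targets.size()));
        try {
            List<Callable<List<Hit>>> tasks = new ArrayList<>();
            for (Searcher target : targets) {
                tasks.add(() -> target.search(query, n));
            }
            List<List<Hit>> partials = new ArrayList<>();
            for (Future<List<Hit>> future : pool.invokeAll(tasks)) {
                partials.add(future.get());
            }
            return topN(partials, n);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Three fake index shards with canned results (made-up data).
        Searcher shardA1 = (q, n) -> Arrays.asList(new Hit("a1", 0.9), new Hit("a2", 0.4));
        Searcher shardA2 = (q, n) -> Arrays.asList(new Hit("a3", 0.7));
        Searcher shardB1 = (q, n) -> Arrays.asList(new Hit("b1", 0.8), new Hit("b2", 0.1));

        // A "shard server" merges its own index shards; the "controller" merges the shard servers.
        Searcher serverA = (q, n) -> searchAll(Arrays.asList(shardA1, shardA2), q, n);
        Searcher serverB = (q, n) -> searchAll(Arrays.asList(shardB1), q, n);

        for (Hit hit : searchAll(Arrays.asList(serverA, serverB), "hypothetical query", 3)) {
            System.out.println(hit.id + " " + hit.score);
        }
    }
}
```

The same searchAll method is reused for both levels here only to keep the sketch short. In Blur the two levels are separate Thrift services, and the controller also has to cope with shard servers being offline, which this sketch does not model.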
