Re: New to Blur

Aaron McCurry Thu, 02 May 2013 19:46:07 -0700

Rahul,

I'm glad you were able to get things built and Blur up and running!  Good
questions!  Let me see if I can answer them.


1. I am not able to find the 'blur.*.hostname' properties in the
blur.properties file, but these are listed in the readme file

The blur-site.properties file overrides the blur-default.properties file,
which can be found in the src/blur-util/src/main/resources/ directory.
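As a rough illustration of that override relationship (this is not Blur's
actual loading code, just a sketch using the standard java.util.Properties
defaults mechanism), the class name LayeredConfigSketch and the key
example.default.only are made up for the example, and the file contents are
inlined as strings instead of being read from the two files:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class LayeredConfigSketch {

    // Loads the default values first, then layers the site values on top.
    static Properties load(String defaultsText, String siteText) {
        try {
            Properties defaults = new Properties();
            defaults.load(new StringReader(defaultsText));

            // Properties(defaults) falls back to 'defaults' for any key
            // that the site text does not set.
            Properties site = new Properties(defaults);
            site.load(new StringReader(siteText));
            return site;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical contents; the real files carry many more keys.
        String defaultsText = "blur.cluster.name=default\nexample.default.only=42\n";
        String siteText = "blur.cluster.name=mycluster\nblur.zookeeper.connection=localhost\n";

        Properties config = load(defaultsText, siteText);
        // Set in both: the site value is returned.
        System.out.println(config.getProperty("blur.cluster.name"));
        // Set only in the defaults: the default value is returned.
        System.out.println(config.getProperty("example.default.only"));
    }
}
```

So a property that is missing from the site file is not necessarily unset; it
may simply be coming from the defaults.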

2. There seems to be a lot of code. I greatly appreciate if someone can
give me pointers before I dig through the codebase. Something like an
architectural overview or a flow explaining how the search query is
resolved.

Good question.  I will explain how a query is executed, assuming you are
running Blur in a clustered environment (controllers + shards).

-1. Client creates a query (BlurQuery) with the generated Thrift objects.
-2. Client submits the query to one of the controllers by calling the query
method on the Blur service.

Note that the easiest way to interact with Thrift in the client is by using
BlurClient (org.apache.blur.thrift.BlurClient) in the blur-thrift project.
You can see it in use here (I just added it, so you might have to pull):

https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=blob;f=src/blur-testsuite/src/main/java/org/apache/blur/testsuite/SimpleQueryExample.java;h=df16bd82522bb9083309ac31f849924ba43318be;hb=ff29884ab60262305b99af49fae26575c808b755

-3. Once the query arrives in the controller, the controller then
re-submits the query to all the shard servers that are online.

See
src/blur-core/src/main/java/org/apache/blur/thrift/BlurControllerServer.java
query method.

-4. Once in the shard server, the query is parsed into a Lucene query.
-5. The query is executed in parallel, one thread per index shard in the
shard server.

See src/blur-core/src/main/java/org/apache/blur/thrift/BlurShardServer.java
query method.

-6. Once the results have been found from the query they are merged and the
top N are returned to the controller.
-7. Once the results from all the shard servers have returned, the top N
are returned to the client.

I know this is a technical explanation of running a single query, but it
should give you some starting points to dig through the code.
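The fan-out and top-N merge in steps 3 through 7 can be sketched in plain
Java.  This is only a minimal model under stated assumptions, not the code in
BlurControllerServer or BlurShardServer: the Hit type, the method names, and
the in-memory lists standing in for index shards are all hypothetical, there
is no Thrift or Lucene involved, and the controller here calls the shard
servers one after another instead of over the network.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ScatterGatherSketch {

    // Hypothetical hit type: a document id plus its score.
    static final class Hit {
        final String id;
        final double score;
        Hit(String id, double score) { this.id = id; this.score = score; }
    }

    // Merges several hit lists and keeps the n highest-scoring hits.
    static List<Hit> topN(List<List<Hit>> lists, int n) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> list : lists) {
            all.addAll(list);
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }

    // Shard server role (steps 5 and 6): one task per index shard, then merge.
    static List<Hit> shardServerQuery(List<List<Hit>> indexShards, int n) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, indexShards.size()));
        try {
            List<Future<List<Hit>>> futures = new ArrayList<>();
            for (List<Hit> shard : indexShards) {
                // Stand-in for running the parsed query against one index shard.
                Callable<List<Hit>> task = () -> topN(Collections.singletonList(shard), n);
                futures.add(pool.submit(task));
            }
            List<List<Hit>> partials = new ArrayList<>();
            for (Future<List<Hit>> future : futures) {
                partials.add(future.get());
            }
            return topN(partials, n);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    // Controller role (steps 3 and 7): ask every shard server, merge the replies.
    static List<Hit> controllerQuery(List<List<List<Hit>>> shardServers, int n) {
        List<List<Hit>> replies = new ArrayList<>();
        for (List<List<Hit>> server : shardServers) {
            replies.add(shardServerQuery(server, n));
        }
        return topN(replies, n);
    }

    public static void main(String[] args) {
        // Two shard servers; the first one holds two index shards.
        List<Hit> shardA = Arrays.asList(new Hit("a1", 0.9), new Hit("a2", 0.2));
        List<Hit> shardB = Arrays.asList(new Hit("b1", 0.7));
        List<Hit> shardC = Arrays.asList(new Hit("c1", 0.8), new Hit("c2", 0.1));
        List<List<List<Hit>>> cluster = Arrays.asList(
                Arrays.asList(shardA, shardB),
                Arrays.asList(shardC));
        for (Hit hit : controllerQuery(cluster, 3)) {
            System.out.println(hit.id + " " + hit.score);
        }
    }
}
```

Because each level only passes its own top N upward, the controller never has
to look at more than N hits per shard server, whatever the size of the
underlying indexes.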

The project breakdown:

blur-core
- This project binds most of the other projects together, houses all the
thrift service impls, failover logic, server startup, shard and controller
management, etc.
blur-gui
- An http status server that runs in each controller and shard server,
needs some work.
blur-mapred
- The bulk indexing code lives in this project.
blur-query
- The lucene query classes that blur implements reside here.
blur-shell
- A basic shell program to interact with blur, needs some more features.
blur-store
- The lucene directory and block cache code resides here.
blur-testsuite
- Currently contains a lot of example programs to exercise a blur cluster.
blur-thrift
- Contains generated thrift code and client code, the client code has
automatic retry logic for when you are running multiple controllers, etc.
blur-util
- Contains some basic utility classes, metrics, and zookeeper code.


3. How do you guys manage your development workspace with eclipse, git, and
maven. This will definitely help me get a kickstart.

I run git on the command line, with mvn, and use eclipse as my IDE.  There
are some shortcuts for testing a single shard server, or a shard server +
controller server, from within eclipse.  Take a look at
org.apache.blur.thrift.ThriftBlurShardServer and
org.apache.blur.thrift.ThriftBlurControllerServer for the main methods that
can be executed to run the various processes.  If you have ZooKeeper running
you should be able to run those mains and then step through a query being
executed.

4. I started Hadoop (HDFS+MapReduce), Zookeeper, and Blur. What are the
steps in actually using it. Where do we start?

Take a look at
http://www.nearinfinity.com/blogs/aaron_mccurry/an_introduction_to_blur.html
as well as the blur-testsuite.  That project has some basic programs to
create a table, load data, search, etc.  And please follow up with more
questions if you need more guidance or help.

Thanks for the notes about the initial setup and build!  I will take a look
at the errors.

Aaron


On Thu, May 2, 2013 at 1:42 AM, rahul challapalli <
[email protected]> wrote:

> Hi,
>
> I was able to get blur started (shards and controllers). It worked straight
> away. Awesome. I have a few more questions. My apologies if some of the
> questions are naive.
>
> 1. I am not able to find the 'blur.*.hostname' properties in the
> blur.properties file, but these are listed in the readme file
> 2. There seems to be a lot of code. I greatly appreciate if someone can
> give me pointers before I dig through the codebase. Something like an
> architectural overview or a flow explaining how the search query is
> resolved.
> 3. How do you guys manage your development workspace with eclipse, git, and
> maven. This will definitely help me get a kickstart.
> 4. I started Hadoop (HDFS+MapReduce), Zookeeper, and Blur. What are the
> steps in actually using it. Where do we start?
>
> Also I am outlining the steps that I followed in getting blur to run and
> also I got a couple of errors during the build process which are also
> listed below. The overall build was successful though.
>
> Apache Blur Single Node Setup on Mac OS X Lion
>
> 1. Environment : Single Node Hadoop-1.0.4 and Zookeeper-3.4.5
> 2. Get the Blur code from Git using git clone
> https://git-wip-us.apache.org/repos/asf/incubator-blur.git
> 3. Checkout the branch 0.1.5
> 4. Run 'mvn clean install' from the 'src' directory as superuser
> 5. Extract the Blur tar.gz file from the 'target/' directory into a
> convenient location and set BLUR_HOME to this location and add it to
> .bash_profile
> 6. Go to the extracted folder and configure the
> $BLUR_HOME/config/blur-env.sh file.  The two exports that are required:
>            export JAVA_HOME=$(/usr/libexec/java_home)
>            export HADOOP_HOME=/usr/local/hadoop
> 7. Setup the $BLUR_HOME/config/blur.properties file.  The default site
> configuration:
>            blur.zookeeper.connection=localhost
>            blur.cluster.name=default
> 8. Start blur using $BLUR_HOME/bin/start-all.sh
>
> Errors during the build process :
>
>  ERROR 20130430_22:47:42:042_PDT [IndexReader-Refresher]
> writer.BlurIndexRefresher: Unknown error
> org.apache.lucene.store.AlreadyClosedException: this Directory is closed
> at org.apache.lucene.store.Directory.ensureOpen(Directory.java:256)
> at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:240)
>  at
>
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:679)
> at
>
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
>  at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
> at
>
> org.apache.lucene.index.StandardDirectoryReader.isCurrent(StandardDirectoryReader.java:326)
>  at
>
> org.apache.lucene.index.StandardDirectoryReader.doOpenNoWriter(StandardDirectoryReader.java:284)
> at
>
> org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:247)
>  at
>
> org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:235)
> at
>
> org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:169)
>  at
>
> org.apache.blur.manager.writer.BlurIndexReader.refresh(BlurIndexReader.java:82)
> at
>
> org.apache.blur.manager.writer.BlurIndexRefresher.refreshInternal(BlurIndexRefresher.java:70)
>  at
>
> org.apache.blur.manager.writer.BlurIndexRefresher.run(BlurIndexRefresher.java:61)
> at java.util.TimerThread.mainLoop(Timer.java:512)
>  at java.util.TimerThread.run(Timer.java:462)
>   WARN  20130501_20:54:18:018_PDT [main] jmx.MBeanRegistry: Error during
> unregister
> javax.management.InstanceNotFoundException:
>
> org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
>  at
>
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1094)
> at
>
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:415)
>  at
>
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:403)
> at
>
> com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:507)
>  at
> org.apache.zookeeper.jmx.MBeanRegistry.unregister(MBeanRegistry.java:115)
> at
> org.apache.zookeeper.jmx.MBeanRegistry.unregister(MBeanRegistry.java:132)
>  at
>
> org.apache.zookeeper.server.ZooKeeperServer.unregisterJMX(ZooKeeperServer.java:443)
> at
>
> org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:436)
>  at
>
> org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:271)
> at
>
> org.apache.zookeeper.server.ZooKeeperServerMain.shutdown(ZooKeeperServerMain.java:127)
>  at
>
> org.apache.blur.MiniCluster$ZooKeeperServerMainEmbedded.shutdown(MiniCluster.java:339)
> at org.apache.blur.MiniCluster.shutdownZooKeeper(MiniCluster.java:427)
>  at org.apache.blur.MiniCluster.shutdownBlurCluster(MiniCluster.java:146)
> at
>
> org.apache.blur.thrift.BlurClusterTest.shutdownCluster(BlurClusterTest.java:81)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>  at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
>  at
>
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
> at
>
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>  at
>
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
> at
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37)
>  at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
> at
>
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:236)
>  at
>
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:134)
> at
>
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:113)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>  at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
>  at
>
> org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
> at
>
> org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
>  at
>
> org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
> at
>
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:103)
>  at
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:74)
>
>
> - Rahul
>
>
>
>
> On Tue, Apr 30, 2013 at 9:29 PM, rahul challapalli <
> [email protected]> wrote:
>
> > Aaron,
> >
> > Thanks for your reply. I will sure let you know how it goes.
> >
> > - Rahul
> >
> >
> > On Tue, Apr 30, 2013 at 7:33 PM, Aaron McCurry <[email protected]>
> wrote:
> >
> >> Hi Rahul,
> >>
> >> Welcome!  Blur is a young incubator project and with that there is not a
> >> lot of documentation.  Yet.  But we do have a lot of code.  :-)
> >>
> >> Blur uses HDFS for storing indexes, MapReduce for bulk indexing, Thrift
> >> for
> >> RPC and ZooKeeper for state, and of course Lucene for search.  Yes Blur
> >> can
> >> and should run along side a standard Hadoop install (MapReduce + HDFS).
> >>  It
> >> currently works with the 1.0.x version or CDH3 from Cloudera.  I'm sure
> we
> >> can get it to work with 2.0.x and CDH4, it just hasn't happen yet.
> >>  However
> >> the only dependency to run Blur on a single machine is ZooKeeper.  HDFS
> is
> >> required for a cluster.
> >>
> >> To get you started.
> >>
> >> git clone https://git-wip-us.apache.org/repos/asf/incubator-blur.git
> >>
> >> # we are currently focusing on getting 0.1.5 to a releasable state.
> >> git checkout 0.1.5
> >>
> >> In the checkout you will find a README.md that is a bit out of date with
> >> the code examples but the general theme is correct.  For more examples
> >> take
> >> a look at the blur-testsuite project, there are a lot of code examples
> in
> >> there to get you started.
> >>
> >> To build the project into a tarball that can be extracted and executed.
> >>
> >> run "mvn install" from the src/ directory.  Once it has successfully
> >> executed all the tests and built everything you will find a tar.gz file
> in
> >> the target/ directory in the distribution project.
> >>
> >> Before you can run Blur, Apache ZooKeeper needs to be running.  A
> default
> >> install will work.
> >>
> >> After extracting the Blur tar.gz file you should be able to run the
> >> bin/start-all.sh and it should start a Blur controller and a shard
> server
> >> on your local machine.
> >>
> >> I would love to hear how your initial compile and install goes, because
> we
> >> could use this thread and any information that is exchanged to create a
> >> nice little wiki page for 0.1.5.
> >>
> >> Thank!
> >>
> >> Aaron
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Tue, Apr 30, 2013 at 2:17 PM, rahul challapalli <
> >> [email protected]> wrote:
> >>
> >> > Hi,
> >> >
> >> > I am new to blur and even ASF in terms of contributing back to a
> >> project. I
> >> > have decent knowledge about hadoop and mapreduce but completely new to
> >> > search. I come from a Java/PHP background. I  am looking for some
> >> direction
> >> > in setting up blur on my local machine. I have a single node hadoop
> >> > installation on my Mac OS X Lion. Is it an issue if I have HDFS,
> >> MapReduce
> >> > daemons running alongside blur on the same machine. I would greatly
> >> > appreciate if you can refer me to some setup document as well as an
> >> insight
> >> > into the architecture of blur. Thank You.
> >> >
> >> > - Rahul
> >> >
> >>
> >
> >
>
