Getting the below error while creating a Table in blur. Any ideas?

INFO 20130519_20:08:05:005_PDT [Commit Thread [words/shard-00000000]] writer.TransactionRecorder: Log roller took [52.565] for [IndexWriter with directory [HdfsDirectory path=[hdfs://localhost:9000/blur/tables/words/shard-00000000]]]
INFO 20130519_20:08:08:008_PDT [thrift-processors5] server.ShardServerEventHandler: Method called
INFO 20130519_20:08:08:008_PDT [thrift-processors0] thrift.TableAdmin: Opening - Shards Open [1], Shards Opening [0] of table [words]
INFO 20130519_20:09:05:005_PDT [Commit Thread [words/shard-00000000]] writer.TransactionRecorder: Commit took [5.219] for [IndexWriter with directory [HdfsDirectory path=[hdfs://localhost:9000/blur/tables/words/shard-00000000]]]
INFO 20130519_20:09:05:005_PDT [Commit Thread [words/shard-00000000]] writer.TransactionRecorder: Rolling WAL path [hdfs://localhost:9000/blur/tables/words/logs/shard-00000000]
INFO 20130519_20:09:05:005_PDT [Commit Thread [words/shard-00000000]] writer.TransactionRecorder: Log roller took [7.706] for [IndexWriter with directory [HdfsDirectory path=[hdfs://localhost:9000/blur/tables/words/shard-00000000]]]
INFO 20130519_20:10:05:005_PDT [Commit Thread [words/shard-00000000]] writer.TransactionRecorder: Commit took [5.394] for [IndexWriter with directory [HdfsDirectory path=[hdfs://localhost:9000/blur/tables/words/shard-00000000]]]
INFO 20130519_20:10:05:005_PDT [Commit Thread [words/shard-00000000]] writer.TransactionRecorder: Rolling WAL path [hdfs://localhost:9000/blur/tables/words/logs/shard-00000000]
ERROR 20130519_20:10:05:005_PDT [wal-sync-[words/shard-00000000]] writer.TransactionRecorder: Known error while trying to sync.
java.io.IOException: DFSOutputStream is closed
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3669)
    at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
    at org.apache.blur.manager.writer.TransactionRecorder.tryToSync(TransactionRecorder.java:301)
    at org.apache.blur.manager.writer.TransactionRecorder.tryToSync(TransactionRecorder.java:291)
    at org.apache.blur.manager.writer.TransactionRecorder.run(TransactionRecorder.java:380)
    at java.util.TimerThread.mainLoop(Timer.java:512)
    at java.util.TimerThread.run(Timer.java:462)

On Tue, May 7, 2013 at 2:38 PM, Aaron McCurry <[email protected]> wrote:

> That's good to hear! Thanks for letting us know.
>
> Aaron
>
>
> On Tue, May 7, 2013 at 2:32 AM, rahul challapalli <
> [email protected]> wrote:
>
> > Hi,
> >
> > I was able to run the code examples for loading data and searching. There
> > were really no hiccups. I started digging through the code and will let you
> > know if I have any questions. Thanks
> >
> > - Rahul
> >
> >
> > On Thu, May 2, 2013 at 11:33 PM, rahul challapalli <
> > [email protected]> wrote:
> >
> > > Hi Aaron,
> > >
> > > I greatly appreciate your detailed response. I will go through the notes,
> > > code and the examples you provided over the weekend and will keep you
> > > posted regarding any issues that I come across. Once again, thank you.
> > >
> > > - Rahul
> > >
> > >
> > > On Thu, May 2, 2013 at 7:45 PM, Aaron McCurry <[email protected]> wrote:
> > >
> > >> Rahul,
> > >>
> > >> I'm glad you were able to get things built and Blur up and running! Good
> > >> questions! Let me see if I can answer them.
> > >>
> > >> 1. I am not able to find the 'blur.*.hostname' properties in the
> > >> blur.properties file, but these are listed in the readme file
> > >>
> > >> The blur-site.properties file overrides the blur-default.properties file
> > >> that can be found in the src/blur-util/src/main/resources/ directory.
> > >>
> > >> 2. There seems to be a lot of code. I greatly appreciate if someone can
> > >> give me pointers before I dig through the codebase. Something like an
> > >> architectural overview or a flow explaining how the search query is
> > >> resolved.
> > >>
> > >> Good question. I will explain how a query is executed, assuming you are
> > >> running Blur in a clustered environment (controllers + shards).
> > >>
> > >> -1. Client creates a query (BlurQuery) with the generated Thrift objects.
> > >> -2. Client submits the query to one of the controllers by calling the query
> > >> method on the Blur service.
> > >>
> > >> Note the easiest way to interact with Thrift in the client is by using
> > >> BlurClient (org.apache.blur.thrift.BlurClient) in the blur-thrift project.
> > >> And you can see it in use here (I just added it, so you might have to
> > >> pull)
> > >>
> > >> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=blob;f=src/blur-testsuite/src/main/java/org/apache/blur/testsuite/SimpleQueryExample.java;h=df16bd82522bb9083309ac31f849924ba43318be;hb=ff29884ab60262305b99af49fae26575c808b755
> > >>
> > >> -3. Once the query arrives in the controller, the controller then
> > >> re-submits the query to all the shard servers that are online.
> > >>
> > >> See
> > >> src/blur-core/src/main/java/org/apache/blur/thrift/BlurControllerServer.java
> > >> query method.
> > >>
> > >> -4. Once in the shard server, the query is parsed into a Lucene query.
> > >> -5. The query is executed in parallel, one thread per index shard in the
> > >> shard server.
> > >>
> > >> See
> > >> src/blur-core/src/main/java/org/apache/blur/thrift/BlurShardServer.java
> > >> query method.
> > >>
> > >> -6. Once the results have been found from the query they are merged and the
> > >> top N are returned to the controller.
> > >> -7. Once the results from all the shard servers have returned, the top N
> > >> are returned to the client.
> > >>
> > >> I know this is a technical explanation of running a single query, but it
> > >> should give you some starting points to dig through the code.
> > >>
> > >> The project breakdown:
> > >>
> > >> blur-core
> > >> - This project binds most of the other projects together, houses all the
> > >> thrift service impls, failover logic, server startup, shard and controller
> > >> management, etc.
> > >> blur-gui
> > >> - An http status server that runs in each controller and shard server,
> > >> needs some work.
> > >> blur-mapred
> > >> - The bulk indexing code lives in this project.
> > >> blur-query
> > >> - The lucene query classes that blur implements reside here.
> > >> blur-shell
> > >> - A basic shell program to interact with blur, needs some more features.
> > >> blur-store
> > >> - The lucene directory and block cache code resides here.
> > >> blur-testsuite
> > >> - Currently contains a lot of example programs to exercise a blur cluster.
> > >> blur-thrift
> > >> - Contains generated thrift code and client code, the client code has
> > >> automatic retry logic for when you are running multiple controllers, etc.
> > >> blur-util
> > >> - Contains some basic utility classes, metrics, and zookeeper code.
> > >>
> > >>
> > >> 3. How do you guys manage your development workspace with eclipse, git, and
> > >> maven. This will definitely help me get a kickstart.
> > >>
> > >> I run git on the command line, with mvn and eclipse as my IDE. There are
> > >> some shortcuts for testing a single shard server, or a shard server +
> > >> controller server, from within eclipse. Take a look at
> > >> org.apache.blur.thrift.ThriftBlurShardServer and
> > >> org.apache.blur.thrift.ThriftBlurControllerServer for the main methods that
> > >> can be executed to run various processes. If you have ZooKeeper running
> > >> you should be able to run those mains and then step through a query being
> > >> executed.
> > >>
> > >> 4. I started Hadoop (HDFS+MapReduce), Zookeeper, and Blur. What are the
> > >> steps in actually using it. Where do we start?
> > >>
> > >> Take a look at
> > >> http://www.nearinfinity.com/blogs/aaron_mccurry/an_introduction_to_blur.html
> > >> as well as the blur-testsuite. That project has some basic programs to
> > >> create a table, load data, search, etc.
> > >> And please follow up with more
> > >> questions if you need more guidance or help.
> > >>
> > >> Thanks for the notes about the initial setup and build! I will take a look
> > >> at the errors.
> > >>
> > >> Aaron
> > >>
> > >>
> > >> On Thu, May 2, 2013 at 1:42 AM, rahul challapalli <
> > >> [email protected]> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > I was able to get blur started (shards and controllers). It worked straight
> > >> > away. Awesome. I have a few more questions. My apologies if some of the
> > >> > questions are naive.
> > >> >
> > >> > 1. I am not able to find the 'blur.*.hostname' properties in the
> > >> > blur.properties file, but these are listed in the readme file
> > >> > 2. There seems to be a lot of code. I greatly appreciate if someone can
> > >> > give me pointers before I dig through the codebase. Something like an
> > >> > architectural overview or a flow explaining how the search query is
> > >> > resolved.
> > >> > 3. How do you guys manage your development workspace with eclipse, git, and
> > >> > maven. This will definitely help me get a kickstart.
> > >> > 4. I started Hadoop (HDFS+MapReduce), Zookeeper, and Blur. What are the
> > >> > steps in actually using it. Where do we start?
> > >> >
> > >> > Also I am outlining the steps that I followed in getting blur to run, and
> > >> > I got a couple of errors during the build process which are also
> > >> > listed below. The overall build was successful though.
> > >> >
> > >> > Apache Blur Single Node Setup on Mac OS X Lion
> > >> >
> > >> > 1. Environment : Single Node Hadoop-1.0.4 and Zookeeper-3.4.5
> > >> > 2. Get the Blur code from Git using git clone
> > >> > https://git-wip-us.apache.org/repos/asf/incubator-blur.git
> > >> > 3. Check out the branch 0.1.5
> > >> > 4. Run 'mvn clean install' from the 'src' directory as superuser
> > >> > 5. Extract the Blur tar.gz file from the 'target/' directory into a
> > >> > convenient location and set BLUR_HOME to this location and add it to
> > >> > .bash_profile
> > >> > 6. Go to the extracted folder and configure the
> > >> > $BLUR_HOME/config/blur-env.sh file. The two exports that are required:
> > >> > export JAVA_HOME=$(/usr/libexec/java_home)
> > >> > export HADOOP_HOME=/usr/local/hadoop
> > >> > 7. Set up the $BLUR_HOME/config/blur.properties file. The default site
> > >> > configuration:
> > >> > blur.zookeeper.connection=localhost
> > >> > blur.cluster.name=default
> > >> > 8. Start blur using $BLUR_HOME/bin/start-all.sh
> > >> >
> > >> > Errors during the build process :
> > >> >
> > >> > ERROR 20130430_22:47:42:042_PDT [IndexReader-Refresher] writer.BlurIndexRefresher: Unknown error
> > >> > org.apache.lucene.store.AlreadyClosedException: this Directory is closed
> > >> > at org.apache.lucene.store.Directory.ensureOpen(Directory.java:256)
> > >> > at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:240)
> > >> > at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:679)
> > >> > at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
> > >> > at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
> > >> > at org.apache.lucene.index.StandardDirectoryReader.isCurrent(StandardDirectoryReader.java:326)
> > >> > at org.apache.lucene.index.StandardDirectoryReader.doOpenNoWriter(StandardDirectoryReader.java:284)
> > >> > at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:247)
> > >> > at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:235)
> > >> > at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:169)
> > >> > at org.apache.blur.manager.writer.BlurIndexReader.refresh(BlurIndexReader.java:82)
> > >> > at org.apache.blur.manager.writer.BlurIndexRefresher.refreshInternal(BlurIndexRefresher.java:70)
> > >> > at org.apache.blur.manager.writer.BlurIndexRefresher.run(BlurIndexRefresher.java:61)
> > >> > at java.util.TimerThread.mainLoop(Timer.java:512)
> > >> > at java.util.TimerThread.run(Timer.java:462)
> > >> > WARN 20130501_20:54:18:018_PDT [main] jmx.MBeanRegistry: Error during unregister
> > >> > javax.management.InstanceNotFoundException: org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
> > >> > at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1094)
> > >> > at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:415)
> > >> > at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:403)
> > >> > at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:507)
> > >> > at org.apache.zookeeper.jmx.MBeanRegistry.unregister(MBeanRegistry.java:115)
> > >> > at org.apache.zookeeper.jmx.MBeanRegistry.unregister(MBeanRegistry.java:132)
> > >> > at org.apache.zookeeper.server.ZooKeeperServer.unregisterJMX(ZooKeeperServer.java:443)
> > >> > at org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:436)
> > >> > at org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:271)
> > >> > at org.apache.zookeeper.server.ZooKeeperServerMain.shutdown(ZooKeeperServerMain.java:127)
> > >> > at org.apache.blur.MiniCluster$ZooKeeperServerMainEmbedded.shutdown(MiniCluster.java:339)
> > >> > at org.apache.blur.MiniCluster.shutdownZooKeeper(MiniCluster.java:427)
> > >> > at org.apache.blur.MiniCluster.shutdownBlurCluster(MiniCluster.java:146)
> > >> > at org.apache.blur.thrift.BlurClusterTest.shutdownCluster(BlurClusterTest.java:81)
> > >> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > >> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >> > at java.lang.reflect.Method.invoke(Method.java:597)
> > >> > at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
> > >> > at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
> > >> > at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
> > >> > at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37)
> > >> > at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
> > >> > at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:236)
> > >> > at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:134)
> > >> > at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:113)
> > >> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > >> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >> > at java.lang.reflect.Method.invoke(Method.java:597)
> > >> > at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
> > >> > at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
> > >> > at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
> > >> > at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:103)
> > >> > at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:74)
> > >> >
> > >> > - Rahul
> > >> >
> > >> >
> > >> > On Tue, Apr 30, 2013 at 9:29 PM, rahul challapalli <
> > >> > [email protected]> wrote:
> > >> >
> > >> > > Aaron,
> > >> > >
> > >> > > Thanks for your reply. I will sure let you know how it goes.
> > >> > >
> > >> > > - Rahul
> > >> > >
> > >> > >
> > >> > > On Tue, Apr 30, 2013 at 7:33 PM, Aaron McCurry <[email protected]> wrote:
> > >> > >
> > >> > >> Hi Rahul,
> > >> > >>
> > >> > >> Welcome! Blur is a young incubator project and with that there is not a
> > >> > >> lot of documentation. Yet. But we do have a lot of code. :-)
> > >> > >>
> > >> > >> Blur uses HDFS for storing indexes, MapReduce for bulk indexing, Thrift for
> > >> > >> RPC and ZooKeeper for state, and of course Lucene for search. Yes, Blur can
> > >> > >> and should run alongside a standard Hadoop install (MapReduce + HDFS).
> > >> > >> It currently works with the 1.0.x version or CDH3 from Cloudera. I'm sure we
> > >> > >> can get it to work with 2.0.x and CDH4, it just hasn't happened yet.
> > >> > >> However the only dependency to run Blur on a single machine is ZooKeeper.
> > >> > >> HDFS is required for a cluster.
> > >> > >>
> > >> > >> To get you started.
> > >> > >>
> > >> > >> git clone https://git-wip-us.apache.org/repos/asf/incubator-blur.git
> > >> > >>
> > >> > >> # we are currently focusing on getting 0.1.5 to a releasable state.
> > >> > >> git checkout 0.1.5
> > >> > >>
> > >> > >> In the checkout you will find a README.md that is a bit out of date with
> > >> > >> the code examples but the general theme is correct. For more examples take
> > >> > >> a look at the blur-testsuite project, there are a lot of code examples in
> > >> > >> there to get you started.
> > >> > >>
> > >> > >> To build the project into a tarball that can be extracted and executed,
> > >> > >> run "mvn install" from the src/ directory. Once it has successfully
> > >> > >> executed all the tests and built everything you will find a tar.gz file in
> > >> > >> the target/ directory in the distribution project.
> > >> > >>
> > >> > >> Before you can run Blur, Apache ZooKeeper needs to be running. A default
> > >> > >> install will work.
> > >> > >>
> > >> > >> After extracting the Blur tar.gz file you should be able to run
> > >> > >> bin/start-all.sh and it should start a Blur controller and a shard server
> > >> > >> on your local machine.
> > >> > >>
> > >> > >> I would love to hear how your initial compile and install goes, because we
> > >> > >> could use this thread and any information that is exchanged to create a
> > >> > >> nice little wiki page for 0.1.5.
> > >> > >>
> > >> > >> Thanks!
> > >> > >>
> > >> > >> Aaron
> > >> > >>
> > >> > >>
> > >> > >> On Tue, Apr 30, 2013 at 2:17 PM, rahul challapalli <
> > >> > >> [email protected]> wrote:
> > >> > >>
> > >> > >> > Hi,
> > >> > >> >
> > >> > >> > I am new to blur and even ASF in terms of contributing back to a project. I
> > >> > >> > have decent knowledge about hadoop and mapreduce but am completely new to
> > >> > >> > search. I come from a Java/PHP background. I am looking for some direction
> > >> > >> > in setting up blur on my local machine. I have a single node hadoop
> > >> > >> > installation on my Mac OS X Lion. Is it an issue if I have HDFS, MapReduce
> > >> > >> > daemons running alongside blur on the same machine. I would greatly
> > >> > >> > appreciate if you can refer me to some setup document as well as an insight
> > >> > >> > into the architecture of blur. Thank You.
> > >> > >> >
> > >> > >> > - Rahul
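
For anyone reading this thread later: the query flow in Aaron's quoted reply (steps 3 through 7) is a scatter-gather with a top-N merge at two levels. Below is a minimal Java sketch of that idea only. It does not use any Blur, Thrift, or Lucene API; the Hit class, the method names, and the in-memory lists standing in for index shards are all hypothetical, and the controller loop is sequential here purely for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ScatterGatherSketch {

    /** Hypothetical hit: a row id plus a relevance score (not a Blur type). */
    static final class Hit {
        final String rowId;
        final double score;
        Hit(String rowId, double score) { this.rowId = rowId; this.score = score; }
    }

    /** Sort by descending score and keep at most n hits. */
    static List<Hit> topN(List<Hit> hits, int n) {
        List<Hit> sorted = new ArrayList<Hit>(hits);
        Collections.sort(sorted, new Comparator<Hit>() {
            public int compare(Hit a, Hit b) { return Double.compare(b.score, a.score); }
        });
        return new ArrayList<Hit>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    /** Steps 5-6: one task per index shard, then merge and keep the top n. */
    static List<Hit> shardServerQuery(List<List<Hit>> indexShards, final int n,
                                      ExecutorService pool) throws Exception {
        List<Future<List<Hit>>> futures = new ArrayList<Future<List<Hit>>>();
        for (final List<Hit> shard : indexShards) {
            futures.add(pool.submit(new Callable<List<Hit>>() {
                // Stand-in for running the parsed query against one shard's index.
                public List<Hit> call() { return topN(shard, n); }
            }));
        }
        List<Hit> merged = new ArrayList<Hit>();
        for (Future<List<Hit>> f : futures) {
            merged.addAll(f.get());
        }
        return topN(merged, n);
    }

    /** Steps 3 and 7: send the query to every shard server, merge, keep the top n. */
    static List<Hit> controllerQuery(List<List<List<Hit>>> shardServers, int n)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Hit> merged = new ArrayList<Hit>();
            for (List<List<Hit>> server : shardServers) { // sequential for brevity
                merged.addAll(shardServerQuery(server, n, pool));
            }
            return topN(merged, n);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Two simulated shard servers; each inner list is one index shard's hits.
        List<List<Hit>> serverA = Arrays.asList(
            Arrays.asList(new Hit("row-1", 0.4), new Hit("row-2", 0.9)),
            Arrays.asList(new Hit("row-3", 0.7)));
        List<List<Hit>> serverB = Arrays.asList(
            Arrays.asList(new Hit("row-4", 0.8), new Hit("row-5", 0.1)));
        List<Hit> top = controllerQuery(Arrays.asList(serverA, serverB), 2);
        for (Hit h : top) {
            System.out.println(h.rowId + " " + h.score);
        }
    }
}
```

Each level only needs to return its own top N because a hit that misses the top N of its shard cannot appear in the global top N. For the real code paths, the query methods in BlurControllerServer.java and BlurShardServer.java named in the quoted reply are the places to read.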
