Hey Paul, I will try to answer your question below:
On Thu, Feb 28, 2013 at 6:10 AM, Paul O'Donoghue <[email protected]> wrote:

> Hi Aaron, Garrett,
>
> I’ve set up a new CDH3 cluster on clean installs of CentOS. I set up HDFS,
> MapReduce, Hive and Zookeeper, then I cloned the blur 0.2 repo and built it
> using maven.
>
> After setting HADOOP_HOME and JAVA_HOME in conf/blur-env.sh, and replacing
> localhost in conf/servers with my actual host name, I then ran
> bin/start-all.sh, followed by bin/blur shell <host>:40020. This starts the
> shell with no errors.

I believe that if you have started Blur on the server that you are currently
running the shell on, then the connection string would be:

bin/blur shell localhost:40020

Or replace localhost with your server name. If you are actually putting
localhost or the server name in place of <host>, then please ignore this.

Also check to make sure the server is actually running by looking at the log
in the logs directory. A quick check is to re-run the bin/start-all.sh
script; it will tell you either that the process already exists (meaning that
it's running) or that it started the process (meaning that it was not).

> When I ran the basic create command (not using hdfs) I get the following.
> Note that I added in some extra debug code (System.out and
> printStackTrace).
>
> blur> create testTableName file:///data/testTableName 1
> <hostname>:40020
> java.net.ConnectException: Connection refused
>         at java.net.PlainSocketImpl.socketConnect(Native Method)
>         at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
>         at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
>         at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
>         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
>         at java.net.Socket.connect(Socket.java:529)
>         at java.net.Socket.connect(Socket.java:478)
>         at org.apache.blur.thrift.BlurClientManager.newClient(BlurClientManager.java:342)
>         at org.apache.blur.thrift.BlurClientManager.getClient(BlurClientManager.java:320)
>         at org.apache.blur.thrift.BlurClientManager.execute(BlurClientManager.java:165)
>         at org.apache.blur.thrift.BlurClient$BlurClientInvocationHandler.invoke(BlurClient.java:54)
>         at $Proxy0.createTable(Unknown Source)
>         at org.apache.blur.shell.CreateCommand.doit(CreateCommand.java:94)
>         at org.apache.blur.shell.Main.main(Main.java:201)
> <hostname>:40020
> <hostname>:40020
> <hostname>:40020
> <hostname>:40020
> java.lang.reflect.UndeclaredThrowableException
>         at $Proxy0.createTable(Unknown Source)
>         at org.apache.blur.shell.CreateCommand.doit(CreateCommand.java:94)
>         at org.apache.blur.shell.Main.main(Main.java:201)
> Caused by: java.io.IOException: All connections are bad.
>         at org.apache.blur.thrift.BlurClientManager.execute(BlurClientManager.java:203)
>         at org.apache.blur.thrift.BlurClient$BlurClientInvocationHandler.invoke(BlurClient.java:54)
>         ... 3 more
> Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
>         at $Proxy0.createTable(Unknown Source)
>         at org.apache.blur.shell.CreateCommand.doit(CreateCommand.java:94)
>         at org.apache.blur.shell.Main.main(Main.java:201)
> Caused by: java.io.IOException: All connections are bad.
>         at org.apache.blur.thrift.BlurClientManager.execute(BlurClientManager.java:203)
>         at org.apache.blur.thrift.BlurClient$BlurClientInvocationHandler.invoke(BlurClient.java:54)
>         ... 3 more
>
> I’m guessing I have forgotten to do something completely obvious, but I
> would really appreciate your help.
>
> Paul.
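In case it helps with debugging, here is roughly what the shell's create
command is doing under the covers -- it is just a thrift client call to the
server you gave it on the command line. This is only a sketch from memory of
the 0.2 API (the host name is made up, and the exact TableDescriptor field
names may differ slightly in your checkout), so use CreateCommand.java in the
shell code as the real reference:

    import org.apache.blur.thrift.BlurClient;
    import org.apache.blur.thrift.generated.Blur.Iface;
    import org.apache.blur.thrift.generated.TableDescriptor;

    public class CreateTableSketch {
      public static void main(String[] args) throws Exception {
        // Same "host:port" string you pass to "bin/blur shell" -- it has to
        // point at a Blur process that is actually up and listening.
        Iface client = BlurClient.getClient("yourserver.example.com:40020");

        // These three fields mirror the shell's
        // "create <table name> <table uri> <shard count>" arguments.
        TableDescriptor td = new TableDescriptor();
        td.setName("testTableName");
        td.setTableUri("file:///data/testTableName");
        td.setShardCount(1);

        client.createTable(td);
      }
    }

If a little program like that also gets "Connection refused", then it really
is the server side (nothing listening on 40020 on that host) rather than
anything the shell is doing.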
>
> *From:* Aaron McCurry <[email protected]>
> *Sent:* 22 February 2013 14:36
> *To:* Paul O'Donoghue <[email protected]>
> *Subject:* Re: New to Search and Blur
>
> Yes, you are correct. I have not tested with CDH4; however, if you want to
> try, you should be able to remove the hadoop-1.0.x jar from the blur lib
> folder and see if the cdh4 jars work. If the HDFS API is the same, it
> should work. However, only MR 1.0 will work with the blur-mapred project.
> We should add an issue to get blur to work with CDH4/Hadoop 2.0.x.
>
> Aaron
>
>
> On Thu, Feb 21, 2013 at 1:35 PM, Paul O'Donoghue <[email protected]>
> wrote:
>
> Hi Aaron,
>
> When running this command I get the following error.
>
> blur> create repl-table hdfs://namenode.domain.com:8020/blur/repl-table 1
> org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot
> communicate with client version 4
>
> Would this be related to the fact that I am running CDH4?
>
> Paul.
>
> *From:* Aaron McCurry <[email protected]>
> *Sent:* 20 February 2013 19:23
> *To:* [email protected]
> *Subject:* Re: New to Search and Blur
>
> Welcome Paul! I will try to answer your questions below:
>
>
> On Wed, Feb 20, 2013 at 1:41 PM, Paul O'Donoghue <[email protected]>
> wrote:
>
> > Hi,
> >
> > First up I would like to say I’m really excited by the Blur project; it
> > seems to fit the need of a potential project perfectly. I’m hoping that
> > I can someday contribute back to this project in some way, as it seems
> > that it will be of enormous help to me.
> >
> > Now, on to the meat of the issue. I’m a complete search newbie. I am
> > coming from a Spring/Application development background but have to get
> > involved in the Search/Big data field for a current client. Since the
> > new year I have been looking at Hadoop and have set up a small cluster
> > using Cloudera’s excellent tools. I’ve been downloading datasets,
> > running MR jobs, etc. and think I have gleaned a very basic level of
> > knowledge which is enough for me to learn more when I need it. This week
> > I have started looking at Blur, and at present I have cloned the src to
> > the hadoop namenode where I have built and started the blur servers. But
> > now I am stuck, and don’t know where to go. So I will ask the following:
> >
> > 1 - /apache-blur-0.2.0-SNAPSHOT/conf/servers. At present I just have my
> > namenode defined in here. Do I need to add my datanodes as well?
>
> So you don't have to, but the normal configuration would be to run blur
> alongside the datanodes, which means you will have to copy the SNAPSHOT
> directory to all the datanodes as well as add all the datanodes to the
> servers file. However, if you want to start simple, then you could just
> run blur on a single node; the namenode could work. Just to be clear, I
> would not recommend running Blur on the same machine as your namenode in a
> production environment, but for testing it should be fine. I would,
> however, put the name of your server in the servers file and remove
> localhost.
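(To make that a bit more concrete for the current thread: conf/servers is
just a plain list of host names, one per line, much like Hadoop's slaves
file. With some made-up datanode host names it would look like:

    datanode1.example.com
    datanode2.example.com
    datanode3.example.com

and the SNAPSHOT directory would be copied to each of those hosts as
described above.)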
> > 2 - blur> create repl-table hdfs://localhost:9000/blur/repl-table 1
> > java.net.ConnectException: Call to localhost/127.0.0.1:9000 failed on
> > connection exception: java.net.ConnectException: Connection refused.
> >
> > I’m confused here. Is 9000 the correct port? Is there some sort of user
> > auth issue?
>
> I would change the command to be "create repl-table
> hdfs://<namenode>/blur/repl-table 1". The <namenode> should be the same as
> the fs.default.name in your core-site.xml in the hadoop/conf directory.
>
> > 3 - Assuming I create a table on the hdfs, when I want to import my data
> > into it I use a MR job, yes? What is the best way to package this job?
> > Do I have to include all the Blur jars, or do I install Blur on the
> > datanodes and set a classpath? Is it possible to link to an example MR
> > job in a maven project? Or am I on completely the wrong track?
>
> You are on the right track; however, you won't need to package up the jar
> files across the cluster. We haven't built a nice automated way to run map
> reduce jobs, but this is what you need to do.
>
> Take a look at:
>
> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=blob;f=src/blur-mapred/src/main/java/org/apache/blur/example/BlurExampleIndexWriter.java;h=9d6eb546e565303f328556fea29d1345344e8065;hb=0.2-dev
>
> This is a writing example in the new blur code (0.2.x); there is also a
> reading example in the same package. This example actually pushes the
> updates through the thrift API; the bulk importer that writes indexes
> directly to HDFS has not been rewritten for 0.2 yet.
>
> As for the blur libraries, you can use the simple approach of putting all
> the jars in a lib folder and creating a single jar including your classes
> and the lib folder (jars inside the jar). Hadoop understands that the lib
> folder in the jar file is to be added to the classpath of the running
> tasks. Thus it will automatically distribute the libraries onto the Hadoop
> MR cluster.
>
> Let us know if you have more questions and how we can help. Thanks!
>
> Aaron
>
> > Thanks for your help,
> >
> > Paul.
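Also, since the indexing question came up in the earlier thread above: the
thrift-API approach to writing data (which is what BlurExampleIndexWriter in
the repo does) boils down to building a RowMutation and sending it through
the client. A very rough sketch from memory of the 0.2 API follows -- the
table, row and column values are made up, and the exact class and enum names
may differ slightly in your checkout, so treat the example in the repo as the
real reference:

    import java.util.Arrays;

    import org.apache.blur.thrift.BlurClient;
    import org.apache.blur.thrift.generated.Blur.Iface;
    import org.apache.blur.thrift.generated.Column;
    import org.apache.blur.thrift.generated.Record;
    import org.apache.blur.thrift.generated.RecordMutation;
    import org.apache.blur.thrift.generated.RecordMutationType;
    import org.apache.blur.thrift.generated.RowMutation;
    import org.apache.blur.thrift.generated.RowMutationType;

    public class SimpleThriftWriter {
      public static void main(String[] args) throws Exception {
        Iface client = BlurClient.getClient("yourserver.example.com:40020");

        // One column, inside one record, inside one row.
        Column column = new Column();
        column.setName("fname");
        column.setValue("paul");

        Record record = new Record();
        record.setRecordId("record-1");
        record.setFamily("person");
        record.setColumns(Arrays.asList(column));

        RecordMutation recordMutation = new RecordMutation();
        recordMutation.setRecordMutationType(RecordMutationType.REPLACE_ENTIRE_RECORD);
        recordMutation.setRecord(record);

        RowMutation rowMutation = new RowMutation();
        rowMutation.setTable("testTableName");
        rowMutation.setRowId("row-1");
        rowMutation.setRowMutationType(RowMutationType.REPLACE_ROW);
        rowMutation.setRecordMutations(Arrays.asList(recordMutation));

        // The server side handles routing the mutation to the right shard.
        client.mutate(rowMutation);
      }
    }

You would normally issue these calls from inside your MR job's map or reduce
tasks rather than one at a time from a main method, but the shape of the call
is the same.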
