Hey Paul, I will try to answer your question below:
On Thu, Feb 28, 2013 at 6:10 AM, Paul O'Donoghue <[email protected]> wrote:

> Hi Aaron, Garrett,
>
> I’ve set up a new CDH3 cluster on clean installs of CentOS. I set up HDFS,
> MapReduce, Hive and Zookeeper, then I cloned the blur 0.2 repo and built it
> using maven.
>
> After setting HADOOP_HOME and JAVA_HOME in conf/blur-env.sh, and replacing
> localhost in conf/servers with my actual host name, I then ran
> bin/start-all.sh, followed by bin/blur shell <host>:40020. This starts the
> shell with no errors.

I believe that if you have started Blur on the server that you are currently
running the shell on, then the connection string would be:

bin/blur shell localhost:40020

Or replace localhost with your server name. If you are actually putting
localhost or the server name in place of <host>, then please ignore this.

Also check to make sure the server is actually running by looking at the log
in the logs directory. A quick check is to re-run the bin/start-all.sh
script; it will tell you either that the process already exists (meaning that
it's running) or that it started the process (meaning that it was not).

> When I ran the basic create command (not using hdfs) I get the following.
> Note that I added in some extra debug code (System.out and
> printStackTrace).
>
> blur> create testTableName file:///data/testTableName 1
> <hostname>:40020
> java.net.ConnectException: Connection refused
>         at java.net.PlainSocketImpl.socketConnect(Native Method)
>         at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
>         at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
>         at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
>         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
>         at java.net.Socket.connect(Socket.java:529)
>         at java.net.Socket.connect(Socket.java:478)
>         at org.apache.blur.thrift.BlurClientManager.newClient(BlurClientManager.java:342)
>         at org.apache.blur.thrift.BlurClientManager.getClient(BlurClientManager.java:320)
>         at org.apache.blur.thrift.BlurClientManager.execute(BlurClientManager.java:165)
>         at org.apache.blur.thrift.BlurClient$BlurClientInvocationHandler.invoke(BlurClient.java:54)
>         at $Proxy0.createTable(Unknown Source)
>         at org.apache.blur.shell.CreateCommand.doit(CreateCommand.java:94)
>         at org.apache.blur.shell.Main.main(Main.java:201)
> <hostname>:40020
> <hostname>:40020
> <hostname>:40020
> <hostname>:40020
> java.lang.reflect.UndeclaredThrowableException
>         at $Proxy0.createTable(Unknown Source)
>         at org.apache.blur.shell.CreateCommand.doit(CreateCommand.java:94)
>         at org.apache.blur.shell.Main.main(Main.java:201)
> Caused by: java.io.IOException: All connections are bad.
>         at org.apache.blur.thrift.BlurClientManager.execute(BlurClientManager.java:203)
>         at org.apache.blur.thrift.BlurClient$BlurClientInvocationHandler.invoke(BlurClient.java:54)
>         ... 3 more
> Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
>         at $Proxy0.createTable(Unknown Source)
>         at org.apache.blur.shell.CreateCommand.doit(CreateCommand.java:94)
>         at org.apache.blur.shell.Main.main(Main.java:201)
> Caused by: java.io.IOException: All connections are bad.
>         at org.apache.blur.thrift.BlurClientManager.execute(BlurClientManager.java:203)
>         at org.apache.blur.thrift.BlurClient$BlurClientInvocationHandler.invoke(BlurClient.java:54)
>         ... 3 more
>
> I’m guessing I have forgotten to do something completely obvious, but I
> would really appreciate your help.
>
> Paul.
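In case it helps with debugging, here is roughly what the shell's create
command is doing under the covers -- it is just a thrift client call to the
server you gave it on the command line. This is only a sketch from memory of
the 0.2 API (the host name is made up, and the exact TableDescriptor field
names may differ slightly in your checkout), so use CreateCommand.java in the
shell code as the real reference:

    import org.apache.blur.thrift.BlurClient;
    import org.apache.blur.thrift.generated.Blur.Iface;
    import org.apache.blur.thrift.generated.TableDescriptor;

    public class CreateTableSketch {
      public static void main(String[] args) throws Exception {
        // Same "host:port" string you pass to "bin/blur shell" -- it has to
        // point at a Blur process that is actually up and listening.
        Iface client = BlurClient.getClient("yourserver.example.com:40020");

        // These three fields mirror the shell's
        // "create <table name> <table uri> <shard count>" arguments.
        TableDescriptor td = new TableDescriptor();
        td.setName("testTableName");
        td.setTableUri("file:///data/testTableName");
        td.setShardCount(1);

        client.createTable(td);
      }
    }

If a little program like that also gets "Connection refused", then it really
is the server side (nothing listening on 40020 on that host) rather than
anything the shell is doing.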
>
> *From:* Aaron McCurry <[email protected]>
> *Sent:* 22 February 2013 14:36
> *To:* Paul O'Donoghue <[email protected]>
> *Subject:* Re: New to Search and Blur
>
> Yes, you are correct. I have not tested with CDH4; however, if you want to
> try, you should be able to remove the hadoop-1.0.x jar from the blur lib
> folder and see if the cdh4 jars work. If the HDFS API is the same, it
> should work. However, only MR 1.0 will work with the blur-mapred project.
> We should add an issue to get blur to work with CDH4/Hadoop 2.0.x.
>
> Aaron
>
>
> On Thu, Feb 21, 2013 at 1:35 PM, Paul O'Donoghue <[email protected]>
> wrote:
>
> Hi Aaron,
>
> When running this command I get the following error.
>
> blur> create repl-table hdfs://namenode.domain.com:8020/blur/repl-table 1
> org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot
> communicate with client version 4
>
> Would this be related to the fact that I am running CDH4?
>
> Paul.
>
> *From:* Aaron McCurry <[email protected]>
> *Sent:* 20 February 2013 19:23
> *To:* [email protected]
> *Subject:* Re: New to Search and Blur
>
> Welcome Paul! I will try to answer your questions below:
>
>
> On Wed, Feb 20, 2013 at 1:41 PM, Paul O'Donoghue <[email protected]>
> wrote:
>
> > Hi,
> >
> > First up I would like to say I’m really excited by the Blur project; it
> > seems to fit the need of a potential project perfectly. I’m hoping that
> > I can someday contribute back to this project in some way, as it seems
> > that it will be of enormous help to me.
> >
> > Now, on to the meat of the issue. I’m a complete search newbie. I am
> > coming from a Spring/Application development background but have to get
> > involved in the Search/Big data field for a current client. Since the
> > new year I have been looking at Hadoop and have set up a small cluster
> > using Cloudera’s excellent tools. I’ve been downloading datasets,
> > running MR jobs, etc. and think I have gleaned a very basic level of
> > knowledge which is enough for me to learn more when I need it. This week
> > I have started looking at Blur, and at present I have cloned the src to
> > the hadoop namenode where I have built and started the blur servers. But
> > now I am stuck, and don’t know where to go. So I will ask the following:
> >
> > 1 - /apache-blur-0.2.0-SNAPSHOT/conf/servers. At present I just have my
> > namenode defined in here. Do I need to add my datanodes as well?
>
> So you don't have to, but the normal configuration would be to run blur
> alongside the datanodes, which means you will have to copy the SNAPSHOT
> directory to all the datanodes as well as add all the datanodes to the
> servers file. However, if you want to start simple, then you could just
> run blur on a single node; the namenode could work. Just to be clear, I
> would not recommend running Blur on the same machine as your namenode in a
> production environment, but for testing it should be fine. I would,
> however, put the name of your server in the servers file and remove
> localhost.
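(To make that a bit more concrete for the current thread: conf/servers is
just a plain list of host names, one per line, much like Hadoop's slaves
file. With some made-up datanode host names it would look like:

    datanode1.example.com
    datanode2.example.com
    datanode3.example.com

and the SNAPSHOT directory would be copied to each of those hosts as
described above.)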
> > 2 - blur> create repl-table hdfs://localhost:9000/blur/repl-table 1
> > java.net.ConnectException: Call to localhost/127.0.0.1:9000 failed on
> > connection exception: java.net.ConnectException: Connection refused.
> >
> > I’m confused here. Is 9000 the correct port? Is there some sort of user
> > auth issue?
>
> I would change the command to be "create repl-table
> hdfs://<namenode>/blur/repl-table 1". The <namenode> should be the same as
> the fs.default.name in your core-site.xml in the hadoop/conf directory.
>
> > 3 - Assuming I create a table on the hdfs, when I want to import my data
> > into it I use a MR job, yes? What is the best way to package this job?
> > Do I have to include all the Blur jars, or do I install Blur on the
> > datanodes and set a classpath? Is it possible to link to an example MR
> > job in a maven project? Or am I on completely the wrong track?
>
> You are on the right track; however, you won't need to package up the jar
> files across the cluster. We haven't built a nice automated way to run map
> reduce jobs, but this is what you need to do.
>
> Take a look at:
>
> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=blob;f=src/blur-mapred/src/main/java/org/apache/blur/example/BlurExampleIndexWriter.java;h=9d6eb546e565303f328556fea29d1345344e8065;hb=0.2-dev
>
> This is a writing example in the new blur code (0.2.x); there is also a
> reading example in the same package. This example actually pushes the
> updates through the thrift API; the bulk importer that writes indexes
> directly to HDFS has not been rewritten for 0.2 yet.
>
> As for the blur libraries, you can use the simple approach of putting all
> the jars in a lib folder and creating a single jar including your classes
> and the lib folder (jars inside the jar). Hadoop understands that the lib
> folder in the jar file is to be added to the classpath of the running
> tasks. Thus it will automatically distribute the libraries onto the Hadoop
> MR cluster.
>
> Let us know if you have more questions and how we can help. Thanks!
>
> Aaron
>
> > Thanks for your help,
> >
> > Paul.
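Also, since the indexing question came up in the earlier thread above: the
thrift-API approach to writing data (which is what BlurExampleIndexWriter in
the repo does) boils down to building a RowMutation and sending it through
the client. A very rough sketch from memory of the 0.2 API follows -- the
table, row and column values are made up, and the exact class and enum names
may differ slightly in your checkout, so treat the example in the repo as the
real reference:

    import java.util.Arrays;

    import org.apache.blur.thrift.BlurClient;
    import org.apache.blur.thrift.generated.Blur.Iface;
    import org.apache.blur.thrift.generated.Column;
    import org.apache.blur.thrift.generated.Record;
    import org.apache.blur.thrift.generated.RecordMutation;
    import org.apache.blur.thrift.generated.RecordMutationType;
    import org.apache.blur.thrift.generated.RowMutation;
    import org.apache.blur.thrift.generated.RowMutationType;

    public class SimpleThriftWriter {
      public static void main(String[] args) throws Exception {
        Iface client = BlurClient.getClient("yourserver.example.com:40020");

        // One column, inside one record, inside one row.
        Column column = new Column();
        column.setName("fname");
        column.setValue("paul");

        Record record = new Record();
        record.setRecordId("record-1");
        record.setFamily("person");
        record.setColumns(Arrays.asList(column));

        RecordMutation recordMutation = new RecordMutation();
        recordMutation.setRecordMutationType(RecordMutationType.REPLACE_ENTIRE_RECORD);
        recordMutation.setRecord(record);

        RowMutation rowMutation = new RowMutation();
        rowMutation.setTable("testTableName");
        rowMutation.setRowId("row-1");
        rowMutation.setRowMutationType(RowMutationType.REPLACE_ROW);
        rowMutation.setRecordMutations(Arrays.asList(recordMutation));

        // The server side handles routing the mutation to the right shard.
        client.mutate(rowMutation);
      }
    }

You would normally issue these calls from inside your MR job's map or reduce
tasks rather than one at a time from a main method, but the shape of the call
is the same.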
