Paul,
 That sounds correct. I'd imagine Cloudera Manager put the JobTracker, ZooKeeper and
HBase Master on the NameNode as well?  You can run the Blur controller from the
namenode since the cluster is small (as long as your VMs have decent RAM).
 Now, since you mentioned CDH4, you raise a question for Aaron...

Aaron, have you gotten Blur 0.2 working on CDH4?  I had to add multiple
HADOOP_HOME paths to the 0.1.x blur-env.sh and blur-config.sh to get
hadoop, hadoop-hdfs and hadoop-0.20-mapreduce onto the classpath before
mine would work.
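
For reference, my additions looked roughly like the following. The package
paths are from my CDH4 install, and BLUR_CLASSPATH is just a stand-in for
whatever classpath variable your blur-env.sh actually exports, so treat this
as a sketch rather than a drop-in config:

# CDH4 splits the client jars across three packages, so walk all of them
# and append every jar to the classpath. Adjust the paths to your install;
# BLUR_CLASSPATH is a placeholder name, not necessarily what Blur reads.
for dir in /usr/lib/hadoop /usr/lib/hadoop-hdfs /usr/lib/hadoop-0.20-mapreduce; do
  for jar in "$dir"/*.jar "$dir"/lib/*.jar; do
    BLUR_CLASSPATH="$BLUR_CLASSPATH:$jar"
  done
done
export BLUR_CLASSPATH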

~Garrett


On Thu, Feb 21, 2013 at 7:43 AM, Paul O'Donoghue <[email protected]> wrote:

> Thanks Aaron & Garrett,
>
> I’ve changed the exception I’m getting, which I always regard as progress!
>
> Before I go off on a tangent though I’ll just describe my current setup,
> and what I think I need to do next.
>
> - I have set up a cluster of 3 virtual machines, each running CentOS.
> - Using Cloudera Manager I installed CDH4 on the three machines.
> - I installed the following: HDFS, HBase, MapReduce, Hive, Hue, Oozie and
> ZooKeeper. All services appear to be running correctly.
> - I have 1 NameNode and 2 DataNodes.
>
> Now, on my namenode I have downloaded and compiled Blur 0.2. I have copied
> the snapshot dist to my namenode and 1 datanode, and I will be copying it to
> the last datanode soon. Even though I have it set up on the namenode, based
> on your advice I will not run it from there.
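>
> For the copy itself I'm just using scp, something like the line below (the
> hostname and target directory are placeholders for my VMs):
>
> scp -r apache-blur-0.2.0-SNAPSHOT datanode2:/opt/blur/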
>
> On each of the datanodes I must set the JDK and Hadoop classpaths in
> blur-env.sh.  I will also set up passwordless SSH authentication on all
> machines. Finally, I will add the datanode addresses to both conf/server
> files.
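>
> For the passwordless SSH piece I'm planning roughly the following, run from
> the node that will launch the Blur start scripts (the hostnames are
> placeholders for my VMs):
>
> ssh-keygen -t rsa                     # accept the defaults, empty passphrase
> ssh-copy-id datanode1.example.local   # placeholder hostname
> ssh-copy-id datanode2.example.local   # placeholder hostname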
>
> I will also set up a shard server on both datanodes.
>
>
> Is this the correct setup to start creating indexes on HDFS?
>
> Once again many thanks for your help.
>
> Paul.
>
>
> From: Aaron McCurry <[email protected]>
> Sent: 20 February 2013 19:23
> To: [email protected]
> Subject: Re: New to Search and Blur
>
> Welcome Paul!  I will try to answer your questions below:
>
>
> On Wed, Feb 20, 2013 at 1:41 PM, Paul O'Donoghue <[email protected]>
> wrote:
>
> > Hi,
> >
> > First up I would like to say I'm really excited by the Blur project; it
> > seems to fit the need of a potential project perfectly. I'm hoping that I
> > can someday contribute back to this project in some way, as it seems that
> > it will be of enormous help to me.
> >
> > Now, on to the meat of the issue. I'm a complete search newbie. I am
> > coming from a Spring/application development background but have to get
> > involved in the Search/Big Data field for a current client. Since the new
> > year I have been looking at Hadoop and have set up a small cluster using
> > Cloudera's excellent tools. I've been downloading datasets, running MR
> > jobs, etc. and think I have gleaned a very basic level of knowledge, which
> > is enough for me to learn more when I need it. This week I have started
> > looking at Blur, and at present I have cloned the src to the Hadoop
> > namenode where I have built and started the Blur servers. But now I am
> > stuck, and don't know where to go. So I will ask the following:
> >
> > 1 - /apache-blur-0.2.0-SNAPSHOT/conf/servers. At present I just have my
> > namenode defined in here. Do I need to add my datanodes as well?
> >
>
> You don't have to, but the normal configuration would be to run Blur
> alongside the datanodes.  That means you will have to copy the SNAPSHOT
> directory to all the datanodes as well as add all the datanodes to the
> servers file.  However, if you want to start simple, you could just run
> Blur on a single node; the namenode could work.  Just to be clear, I would
> not recommend running Blur on the same machine as your namenode in a
> production environment, but for testing it should be fine.  I would, however,
> put the name of your server in the servers file and remove localhost.
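>
> For example, for the simple single-node start the servers file would just
> contain that one hostname (the name below is only an illustration):
>
> namenode.example.local
>
> and once you move Blur onto the datanodes you would list each of those
> hostnames instead, one per line.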
>
>
> >
> > 2 - blur> create repl-table hdfs://localhost:9000/blur/repl-table 1
> > java.net.ConnectException: Call to localhost/127.0.0.1:9000 failed on
> > connection exception: java.net.ConnectException: Connection refused.
> >
>
> > I’m confused here. Is 9000 the correct port? Is there some sort of user
> > auth issue?
> >
>
>
> I would change the command to be:
>
> create repl-table hdfs://<namenode>/blur/repl-table 1
>
> The <namenode> should match the fs.default.name in your core-site.xml in
> the hadoop/conf directory.
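>
> For example, if core-site.xml has something like this (the hostname below is
> only an illustration; 8020 is a common default port, but use whatever your
> value actually says):
>
> <property>
>   <name>fs.default.name</name>
>   <value>hdfs://namenode.example.local:8020</value>
> </property>
>
> then the shell command becomes:
>
> blur> create repl-table hdfs://namenode.example.local:8020/blur/repl-table 1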
>
>
> >
> > 3 - Assuming I create a table on HDFS, when I want to import my data
> > into it I use an MR job, yes? What is the best way to package this job? Do
> > I have to include all the Blur jars, or do I install Blur on the datanodes
> > and set a classpath? Is it possible to link to an example MR job in a Maven
> > project? Or am I on completely the wrong track?
> >
>
> You are on the right track; however, you won't need to copy the jar
> files across the cluster yourself.  We haven't built a nice automated way to
> run MapReduce jobs yet, but this is what you need to do.
>
> Take a look at:
>
>
> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=blob;f=src/blur-mapred/src/main/java/org/apache/blur/example/BlurExampleIndexWriter.java;h=9d6eb546e565303f328556fea29d1345344e8065;hb=0.2-dev
>
> This is a writing example in the new Blur code (0.2.x); there is also a
> reading example in the same package.  This example actually pushes the
> updates through the Thrift API; the bulk importer that writes indexes
> directly to HDFS has not been rewritten for 0.2 yet.
>
> As for the Blur libraries, you can use the simple approach of putting all
> the jars in a lib folder and creating a single jar containing your classes
> and that lib folder (jars inside the jar).  Hadoop understands that the lib
> folder inside the jar file is to be added to the classpath of the running
> tasks, so it will automatically distribute the libraries onto the
> Hadoop MR cluster.
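>
> A minimal sketch of that packaging step (the directory names, jar locations
> and driver class below are only placeholders; point them at your own build):
>
> mkdir -p job/lib
> cp -r target/classes/* job/                        # your compiled MR job classes
> cp apache-blur-0.2.0-SNAPSHOT/lib/*.jar job/lib/   # the Blur jars and their dependencies
> (cd job && jar cf ../my-blur-job.jar .)            # one jar: classes at the root, deps in lib/
> hadoop jar my-blur-job.jar com.example.MyIndexerJob <args>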
>
> Let us know if you have more questions and how we can help.  Thanks!
>
> Aaron
>
>
> >
> > Thanks for your help,
> >
> > Paul.
> >
>
