Re: Programatically initializing and starting HDFS cluster

Chris Collins Thu, 12 Jun 2008 07:56:24 -0700

I am also interested in this option, since I will probably be hacking at such a thing in the next few weeks.

I am also curious whether you can run MR jobs within a process rather than launching one each time. The scenario is when initialization takes just way too long for a map reduce shard to be executed in this model. For example, say you are trying to compute the top n terms within a set of documents, where the top n are the rarest terms in some model corpus. Perhaps you have a df index, or perhaps you have a huge NLP engine that's used for entity extraction; any of these assumes a chunk of memory and a chunk of time to init on each pass.

Here, of course, you would need not only to specify the job but also to somehow constrain the candidate nodes it can run on, based upon their ability to run it.


C

On Jun 12, 2008, at 2:02 AM, Robert Krüger wrote:

Hi,
for our developers I would like to write a few lines of Java code that, given a base directory, sets up an HDFS filesystem, initializes it if it is not there yet, and then starts the service(s) in process. This is to run on each developer's machine, probably within a Tomcat instance. I don't want to do this (if I don't have to) in a bunch of shell scripts.
Could anyone point to code samples that do similar things or give any other hints that make this easier than looking at what the command line tools do and reverse engineering it from there?
Thanks in advance,

Robert
