Hi

To distribute application-specific jars or files you can do it with the 'hadoop jar' command itself, for example:

    hadoop jar sample.jar com.test.Samples.Application -files file1.txt,file2.csv -libjars custom_connector.jar,json_util.jar input_dir output_dir

(Note that -files and -libjars take comma-separated lists with no spaces.)

But this distribution happens every time the job is run. If the job runs frequently, there are many jars to distribute, or multiple jobs depend on the same jars, then rather than shipping the jars on every submission it is better to pre-distribute them across your nodes and include them in the classpath of every node.

AFAIK you don't use 'hadoop job' to submit your MR job. It is used for managing your job (setting priorities, killing it, monitoring status etc.) once your job is registered with the JobTracker, i.e. for running jobs.
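One caveat: -files and -libjars are generic options parsed by GenericOptionsParser, so they are only honoured if your driver implements Tool and is launched through ToolRunner. A minimal sketch of such a driver (the class name, job name, and mapper/reducer defaults here are placeholders, not your actual code):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class Application extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            // getConf() already reflects whatever -files/-libjars set,
            // because ToolRunner ran GenericOptionsParser before run().
            Job job = new Job(getConf(), "sample");
            job.setJarByClass(Application.class);
            // set your mapper/reducer here; identity is used by default
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            // ToolRunner strips the generic options (-files, -libjars, -D ...)
            // and passes only the remaining args (input_dir, output_dir) to run().
            System.exit(ToolRunner.run(new Configuration(), new Application(), args));
        }
    }

If your main() builds the Job directly from a fresh Configuration instead of going through ToolRunner, the -libjars option is silently ignored, which is a common source of ClassNotFoundException on the task nodes.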
Hope it helps!

Regards
Bejoy.K.S

On Wed, Nov 16, 2011 at 12:09 PM, Something Something <mailinglist...@gmail.com> wrote:

> Until now we were manually copying our jars to all machines in a Hadoop
> cluster. This worked while our cluster was small, but now our cluster is
> getting bigger. What's the best way to start a Hadoop job that
> automatically distributes the jar to all machines in the cluster?
>
> I read the doc at:
> http://hadoop.apache.org/common/docs/current/commands_manual.html#jar
>
> Would -libjars do the trick? But we need to use 'hadoop job' for that,
> right? Until now, we were using 'hadoop jar' to start all our jobs.
>
> Needless to say, we are getting our feet wet with Hadoop, so appreciate
> your help with our dumb questions.
>
> Thanks.
>
> PS: We use Pig a lot, which automatically does this, so there must be a
> clean way to do it.