Sure, you can separate the logic however you like; just ensure the configuration object has had a proper setJar or setJarByClass call done on it before you submit the job.
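A minimal sketch of that separation (class and method names here are illustrative, not from the original thread): the non-Hadoop activities live in their own method, and the submission method builds the JobConf, calls setJarByClass, and then hands off to JobClient.runJob.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class MyDriver {

        // Hypothetical placeholder for the non-Hadoop activities
        // (reading/writing/updating) done before job submission.
        private static void doNonHadoopWork() {
            // ... any pre-processing, bookkeeping, etc. ...
        }

        // Job submission kept separate from the rest of the flow.
        private static void submitJob(String in, String out) throws Exception {
            JobConf conf = new JobConf();
            // Point the framework at the jar containing this class; this is
            // what JobConf.getJar() will later return, and what gets copied
            // to the MR system directory as job.jar at submit time.
            conf.setJarByClass(MyDriver.class);
            conf.setJobName("example");
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);
            FileInputFormat.setInputPaths(conf, new Path(in));
            FileOutputFormat.setOutputPath(conf, new Path(out));
            JobClient.runJob(conf); // blocks until the job completes
        }

        public static void main(String[] args) throws Exception {
            doNonHadoopWork();            // non-MapReduce phase
            submitJob(args[0], args[1]);  // MapReduce phase
        }
    }

With no mapper or reducer set, the old API falls back to the identity classes, so the sketch runs as-is; the point is only that the two phases can be ordered freely in the driver as long as the jar is set on the conf before runJob.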
On Mon, Aug 13, 2012 at 4:43 PM, Manoj Babu <manoj...@gmail.com> wrote:
> Hi Harsh,
>
> Thanks for your reply.
>
> Consider that from my main program I am doing many activities
> (reading/writing/updating non-Hadoop activities) before invoking
> JobClient.runJob(conf);
> Is there any way to separate the process flow programmatically instead
> of going for a workflow engine?
>
> Cheers!
> Manoj.
>
>
> On Mon, Aug 13, 2012 at 4:10 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>> Hi Manoj,
>>
>> Reply inline.
>>
>> On Mon, Aug 13, 2012 at 3:42 PM, Manoj Babu <manoj...@gmail.com> wrote:
>> > Hi All,
>> >
>> > The normal Hadoop job submission process involves:
>> >
>> > 1. Checking the input and output specifications of the job.
>> > 2. Computing the InputSplits for the job.
>> > 3. Setting up the requisite accounting information for the
>> > DistributedCache of the job, if necessary.
>> > 4. Copying the job's jar and configuration to the map-reduce system
>> > directory on the distributed file-system.
>> > 5. Submitting the job to the JobTracker and optionally monitoring its
>> > status.
>> >
>> > I have a doubt about the 4th point of the job execution flow; could
>> > any of you explain it?
>> >
>> > What is the job's jar?
>>
>> The job.jar is the jar you supply via "hadoop jar <jar>". Technically,
>> though, it is the jar pointed to by JobConf.getJar() (set via setJar or
>> setJarByClass calls).
>>
>> > Is the job's jar the one we submitted to Hadoop, or will Hadoop build
>> > it based on the job configuration object?
>>
>> It is the former, as explained above.
>>
>> --
>> Harsh J
>
>

--
Harsh J