Sure, you can separate the logic however you like; just ensure the configuration object has had a proper setJar or setJarByClass call done on it before you submit the job.
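A minimal sketch of that separation (class and method names here are illustrative, not from the original thread): the non-Hadoop activities live in their own method, and the submission method builds the JobConf, calls setJarByClass, and then hands off to JobClient.runJob.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class MyDriver {

        // Hypothetical placeholder for the non-Hadoop activities
        // (reading/writing/updating) done before job submission.
        private static void doNonHadoopWork() {
            // ... any pre-processing, bookkeeping, etc. ...
        }

        // Job submission kept separate from the rest of the flow.
        private static void submitJob(String in, String out) throws Exception {
            JobConf conf = new JobConf();
            // Point the framework at the jar containing this class; this is
            // what JobConf.getJar() will later return, and what gets copied
            // to the MR system directory as job.jar at submit time.
            conf.setJarByClass(MyDriver.class);
            conf.setJobName("example");
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);
            FileInputFormat.setInputPaths(conf, new Path(in));
            FileOutputFormat.setOutputPath(conf, new Path(out));
            JobClient.runJob(conf); // blocks until the job completes
        }

        public static void main(String[] args) throws Exception {
            doNonHadoopWork();            // non-MapReduce phase
            submitJob(args[0], args[1]);  // MapReduce phase
        }
    }

With no mapper or reducer set, the old API falls back to the identity classes, so the sketch runs as-is; the point is only that the two phases can be ordered freely in the driver as long as the jar is set on the conf before runJob.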
On Mon, Aug 13, 2012 at 4:43 PM, Manoj Babu <manoj...@gmail.com> wrote:
> Hi Harsh,
>
> Thanks for your reply.
>
> Consider that from my main program I am doing many activities
> (reading/writing/updating non-Hadoop activities) before invoking
> JobClient.runJob(conf);
> Is there any way to separate the process flow programmatically instead
> of going for a workflow engine?
>
> Cheers!
> Manoj.
>
>
> On Mon, Aug 13, 2012 at 4:10 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>> Hi Manoj,
>>
>> Reply inline.
>>
>> On Mon, Aug 13, 2012 at 3:42 PM, Manoj Babu <manoj...@gmail.com> wrote:
>> > Hi All,
>> >
>> > The normal Hadoop job submission process involves:
>> >
>> > 1. Checking the input and output specifications of the job.
>> > 2. Computing the InputSplits for the job.
>> > 3. Setting up the requisite accounting information for the
>> > DistributedCache of the job, if necessary.
>> > 4. Copying the job's jar and configuration to the map-reduce system
>> > directory on the distributed file-system.
>> > 5. Submitting the job to the JobTracker and optionally monitoring its
>> > status.
>> >
>> > I have a doubt about the 4th point of the job execution flow; could
>> > any of you explain it?
>> >
>> > What is the job's jar?
>>
>> The job.jar is the jar you supply via "hadoop jar <jar>". Technically,
>> though, it is the jar pointed to by JobConf.getJar() (set via setJar or
>> setJarByClass calls).
>>
>> > Is the job's jar the one we submitted to Hadoop, or will Hadoop build
>> > it based on the job configuration object?
>>
>> It is the former, as explained above.
>>
>> --
>> Harsh J
>
>

--
Harsh J