[
https://issues.apache.org/jira/browse/HADOOP-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677464#action_12677464
]
Doug Cutting commented on HADOOP-5350:
--------------------------------------
If a job submission is to persist, then we must write its data to the system
directory, no?
We could perhaps streamline things somewhat by sending the job.xml and splits
directly to the jobtracker via RPC, and having it persist these. They'd still
need to be written before the job could be started, but they'd no longer need
to also be read. The job's jar file should probably continue to be written by
the client, since it is not needed by the jobtracker. I'm not sure this would
really help things much, however.
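For concreteness, here is a minimal sketch of what such a one-step RPC could
look like. The interface, method, and parameter names below are hypothetical,
not existing Hadoop API, and the signature is only one way it might be shaped:

{code:java}
import java.io.IOException;

/**
 * Hypothetical one-step submission RPC along the lines sketched above:
 * the client ships job.xml and the serialized splits over the wire, and
 * the jobtracker persists them to the system directory itself. All names
 * here are illustrative, not actual Hadoop API.
 */
public interface DirectJobSubmissionProtocol {

  /**
   * Submit a job in a single RPC.
   *
   * @param jobId      id previously allocated by the jobtracker
   * @param jobConfXml serialized contents of job.xml
   * @param rawSplits  serialized input splits, one entry per split
   * @return true once the jobtracker has durably written the job data
   */
  boolean submitJobDirect(String jobId, byte[] jobConfXml, byte[][] rawSplits)
      throws IOException;
}
{code}

The jar is deliberately absent from the signature: as noted above, the client
would keep writing it to DFS itself, since the jobtracker never reads it.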
> Submitting job information via DFS in Map/Reduce causing consistency and
> performance issues
> -------------------------------------------------------------------------------------------
>
> Key: HADOOP-5350
> URL: https://issues.apache.org/jira/browse/HADOOP-5350
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Hemanth Yamijala
>
> Job submission involves two steps: the client submits the job's files to the
> system directory on DFS, then submits the job via the JobSubmissionProtocol
> to the JobTracker. This two-step process has been seen to cause some issues:
> - Since the files need to be read back from DFS, slowness in DFS can make job
> initialization costly. We faced this as described in HADOOP-5286 and
> HADOOP-4664.
> - The two-step process can leave inconsistent information behind, as in
> HADOOP-5327 and HADOOP-5335.
> This JIRA is to explore options for removing the two-step process (sketched
> below) from job submission.
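For context, here is a simplified sketch of the two-step flow as it stands.
The class and method names are illustrative stand-ins for the logic in
JobClient and the JobTracker, not verbatim Hadoop code:

{code:java}
import java.io.IOException;

/**
 * Simplified sketch of the existing two-step submission. Names are
 * illustrative; the real logic lives in JobClient and the jobtracker's
 * JobSubmissionProtocol implementation.
 */
public class TwoStepSubmissionSketch {

  interface DfsClient {
    void copyToSystemDir(String localPath, String dfsPath) throws IOException;
  }

  interface JobSubmissionRpc {
    // The jobtracker later reads job.xml and the splits back from DFS.
    void submitJob(String jobId) throws IOException;
  }

  void submit(DfsClient dfs, JobSubmissionRpc jobTracker, String jobId)
      throws IOException {
    String sysDir = "/system/mapred/" + jobId; // illustrative path
    // Step 1: client writes the job's files to the system directory on DFS.
    dfs.copyToSystemDir("job.xml", sysDir + "/job.xml");
    dfs.copyToSystemDir("job.split", sysDir + "/job.split");
    dfs.copyToSystemDir("job.jar", sysDir + "/job.jar");
    // Step 2: client tells the jobtracker about the job over RPC.
    // If the client dies between steps 1 and 2, the files are orphaned
    // (HADOOP-5327/HADOOP-5335); if DFS is slow, job initialization stalls
    // (HADOOP-5286/HADOOP-4664).
    jobTracker.submitJob(jobId);
  }
}
{code}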