[
https://issues.apache.org/jira/browse/HADOOP-544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591950#action_12591950
]
Enis Soztutar commented on HADOOP-544:
--------------------------------------
Owen, thanks for the jdiff, I hadn't thought of that.
I checked the differences in the public APIs:
* Methods in CompletedJobStatusStore are changed without deprecation, but I think
this class should be package-private rather than public. Its methods will be
accessed through JobClient, which acts as a facade, no?
* API changes in JobClient, RunningJob, TaskCompletionEvent, and TaskReport
deprecate the old methods, so no problem here.
* API changes in JobHistory: I am not sure if JobHistory is really public.
There are HistoryViewer, JSP pages, etc. to view/parse job history, so I do not
know if it is actually used outside Hadoop. If you think it is, then we can fix
the patch to keep the old methods as well.
* Although public, JobProfile, JobStatus, JobSubmissionProtocol (see
HADOOP-1643), MapTaskStatus, and ReduceTaskStatus are never directly exposed to
the user, so the changes in them are acceptable, I think.
* Methods in JobTracker (JT) and TaskTracker (TT): not sure if we need to change
them; if so, we can fix this.
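
The deprecate-and-delegate approach mentioned for JobClient above can be
sketched roughly as follows. This is only an illustration, not the code in the
patch: AttemptId, ClientFacade, and the killed list are hypothetical names, and
only the method name killTask comes from the discussion.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical typed id; the class in the actual patch may look different. */
final class AttemptId {
    private final String value;

    private AttemptId(String value) {
        this.value = value;
    }

    /** Wraps the legacy string form; a real implementation would parse it. */
    static AttemptId forName(String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("empty id");
        }
        return new AttemptId(name);
    }

    @Override
    public String toString() {
        return value;
    }
}

/** Hypothetical facade standing in for a client class such as JobClient. */
class ClientFacade {
    /** Records which attempts were asked to be killed (illustration only). */
    final List<String> killed = new ArrayList<>();

    /** New typed entry point. */
    void killTask(AttemptId id) {
        killed.add(id.toString());
    }

    /** Old string-based entry point, kept so existing callers still compile. */
    @Deprecated
    void killTask(String id) {
        killTask(AttemptId.forName(id));
    }
}
```

Callers of the string overload get a deprecation warning but keep working, and
both overloads end up in the same typed code path.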
The name sets {jobid, taskinprogressid, taskid} and {jobid, taskid,
taskattemptid} are already confusing (sometimes even I get confused). I think
the ultimate naming should be the latter, which makes more sense. However, we
have been using and exposing both naming schemes. For example, task names
start with tip_ or task_, and JobClient#killTask(taskid) accepts a
task attempt id; on the other hand, JobHistory uses
taskID-taskAttemptID. So the current situation is already inconsistent.
I suspect that if we introduce JobID, TaskID, and TaskAttemptID as a replacement
for the deprecated JobClient methods, it will cause some trouble. I propose we
stick with the naming scheme in this patch and later, in a bigger issue, change
all the naming (internal and external) to respect {jobid, taskid,
taskattemptid} (as a side note, refactoring TaskID to TaskAttemptID and
TaskInProgressID to TaskID will be mind-blowing). This new issue should be a
blocker for 0.18, so that users not using trunk will not see this (current)
patch. Thoughts?
> Replace the job, tip and task ids with objects.
> -----------------------------------------------
>
> Key: HADOOP-544
> URL: https://issues.apache.org/jira/browse/HADOOP-544
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Affects Versions: 0.18.0
> Reporter: Owen O'Malley
> Assignee: Enis Soztutar
> Fix For: 0.18.0
>
> Attachments: api-changes.tgz, id_v1.patch, id_v2.patch, id_v3.patch,
> id_v4.patch, id_v5.patch, id_wip1.patch
>
>
> I think that it is silly to have tools parsing the strings that the framework
> builds for task ids. I propose:
> class JobId implements Writable {
>   public int getJobId() { ... }
> }
> class TaskId implements Writable {
>   public JobId getJobId() { ... }
>   public boolean isMap() { ... }
>   public int getTaskId() { ... }
> }
> class TaskAttemptId implements Writable {
>   public TaskId getTaskId() { ... }
>   public int getAttemptId() { ... }
> }
> each of the classes will have a toString() method that generates the current
> string.
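
A minimal, self-contained sketch of the three id classes proposed above might
look like the following. Several things here are assumptions made for
illustration: the Writable interface is left out so the code compiles without
Hadoop, the string layouts (job_NNNN, tip_NNNN_m_NNNNNN, task_NNNN_m_NNNNNN_N)
are invented and are not the strings the framework builds, and forName is a
hypothetical helper that is not part of the proposal.

```java
import java.util.Locale;
import java.util.Objects;

/** Sketch of the proposed JobId (Writable omitted). */
final class JobId {
    private final int id;

    JobId(int id) { this.id = id; }

    public int getJobId() { return id; }

    @Override
    public String toString() {
        // Assumed layout, for illustration only.
        return String.format(Locale.ROOT, "job_%04d", id);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof JobId && ((JobId) o).id == id;
    }

    @Override
    public int hashCode() { return Integer.hashCode(id); }
}

/** Sketch of the proposed TaskId: a job id, a map/reduce flag, and a number. */
final class TaskId {
    private final JobId jobId;
    private final boolean isMap;
    private final int id;

    TaskId(JobId jobId, boolean isMap, int id) {
        this.jobId = Objects.requireNonNull(jobId);
        this.isMap = isMap;
        this.id = id;
    }

    public JobId getJobId() { return jobId; }
    public boolean isMap() { return isMap; }
    public int getTaskId() { return id; }

    @Override
    public String toString() {
        return String.format(Locale.ROOT, "tip_%04d_%s_%06d",
                jobId.getJobId(), isMap ? "m" : "r", id);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TaskId)) return false;
        TaskId t = (TaskId) o;
        return jobId.equals(t.jobId) && isMap == t.isMap && id == t.id;
    }

    @Override
    public int hashCode() { return Objects.hash(jobId, isMap, id); }
}

/** Sketch of the proposed TaskAttemptId: a task id plus an attempt number. */
final class TaskAttemptId {
    private final TaskId taskId;
    private final int attemptId;

    TaskAttemptId(TaskId taskId, int attemptId) {
        this.taskId = Objects.requireNonNull(taskId);
        this.attemptId = attemptId;
    }

    public TaskId getTaskId() { return taskId; }
    public int getAttemptId() { return attemptId; }

    /** Hypothetical helper: parses the assumed layout written by toString(). */
    static TaskAttemptId forName(String name) {
        String[] p = name.split("_");
        if (p.length != 5 || !p[0].equals("task")
                || !(p[2].equals("m") || p[2].equals("r"))) {
            throw new IllegalArgumentException("Malformed attempt id: " + name);
        }
        JobId job = new JobId(Integer.parseInt(p[1]));
        TaskId task = new TaskId(job, p[2].equals("m"), Integer.parseInt(p[3]));
        return new TaskAttemptId(task, Integer.parseInt(p[4]));
    }

    @Override
    public String toString() {
        return String.format(Locale.ROOT, "task_%04d_%s_%06d_%d",
                taskId.getJobId().getJobId(), taskId.isMap() ? "m" : "r",
                taskId.getTaskId(), attemptId);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TaskAttemptId)) return false;
        TaskAttemptId a = (TaskAttemptId) o;
        return taskId.equals(a.taskId) && attemptId == a.attemptId;
    }

    @Override
    public int hashCode() { return Objects.hash(taskId, attemptId); }
}
```

With objects like these, a tool can call getTaskId().isMap() on an attempt id
instead of parsing the string itself, and a method that expects a
TaskAttemptId can no longer be handed a TaskId by mistake.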