[ https://issues.apache.org/jira/browse/HADOOP-544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591950#action_12591950 ]

Enis Soztutar commented on HADOOP-544:
--------------------------------------

Owen, thanks for the jdiff; I hadn't thought of that.
I checked the differences in the public APIs:
* The methods in CompletedJobStatusStore are changed without deprecation, but I
think this class should be package-private rather than public. Its methods will
be accessed through JobClient, following the facade design pattern, no?
* The API changes in JobClient, RunningJob, TaskCompletionEvent, and TaskReport
deprecate the old methods, so there is no problem here.
* API changes in JobHistory: I am not sure whether JobHistory is really public.
There are HistoryViewer, the JSP pages, etc. to view and parse job history, so I
do not know if it is actually used outside Hadoop. If you think it is, we can
fix the patch to keep the old methods as well.
* Although public, JobProfile, JobStatus, JobSubmissionProtocol (see
HADOOP-1643), MapTaskStatus, and ReduceTaskStatus are never directly exposed to
the user, so I think the changes in them are acceptable.
* Methods in JT and TT (JobTracker and TaskTracker): I am not sure whether we
need to change them; if so, we can fix this.
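The "deprecate the old ones" approach mentioned above for JobClient and friends
can be sketched as follows. This is a minimal illustration, not the patch
itself: the names `AttemptId`, `ClientSketch`, and `forName`, and the
`attempt_<job>_<m|r>_<task>_<attempt>` string layout, are all hypothetical. The
old String-based overload stays, marked `@Deprecated`, and only parses its
argument and delegates to the new typed overload, so existing callers keep
compiling.

```java
// Hypothetical typed id; the string layout "attempt_<job>_<m|r>_<task>_<attempt>"
// is an assumption made for this sketch only.
final class AttemptId {
  final int job;
  final boolean isMap;
  final int task;
  final int attempt;

  AttemptId(int job, boolean isMap, int task, int attempt) {
    this.job = job;
    this.isMap = isMap;
    this.task = task;
    this.attempt = attempt;
  }

  /** Parses the assumed string form and rejects anything else. */
  static AttemptId forName(String s) {
    String[] p = s.split("_");
    if (p.length != 5 || !p[0].equals("attempt")
        || !(p[2].equals("m") || p[2].equals("r"))) {
      throw new IllegalArgumentException("Bad attempt id: " + s);
    }
    return new AttemptId(Integer.parseInt(p[1]), p[2].equals("m"),
        Integer.parseInt(p[3]), Integer.parseInt(p[4]));
  }

  @Override
  public String toString() {
    return String.format("attempt_%04d_%s_%06d_%d",
        job, isMap ? "m" : "r", task, attempt);
  }
}

// Hypothetical client showing the deprecate-and-delegate pattern.
class ClientSketch {
  /** New entry point: takes a typed id, so callers never build strings. */
  public String killTask(AttemptId id) {
    return "kill " + id; // a real client would issue an RPC here
  }

  /** @deprecated use {@link #killTask(AttemptId)} instead. */
  @Deprecated
  public String killTask(String id) {
    return killTask(AttemptId.forName(id)); // parse once, then delegate
  }
}
```

Because the deprecated overload only parses and delegates, it can later be
deleted without touching the typed code path.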

The name sets {jobid, taskinprogressid, taskid} and {jobid, taskid,
taskattemptid} are already confusing (sometimes even I get confused). I think
the ultimate naming should be the latter, which makes more sense. However, we
have been using and exposing both naming schemas. For example, task names start
with tip_ or task_, and JobClient#killTask(taskid) actually accepts a task
attempt id; on the other hand, JobHistory uses taskID-taskAttemptID. So the
current situation is already inconsistent.

I suspect that introducing JobID, TaskID, and TaskAttemptID as replacements for
the deprecated JobClient methods will cause some trouble. I propose we stick
with the current naming schema in this patch, and later, in a bigger issue,
change all the naming (internal and external) to follow {jobid, taskid,
taskattemptid}. (As a side note, refactoring TaskID to TaskAttemptID and
TaskInProgressID to TaskID will be mind-blowing.) This new issue should be a
blocker for 0.18, so that users not running trunk never see this (current)
patch. Thoughts?

> Replace the job, tip and task ids with objects.
> -----------------------------------------------
>
>                 Key: HADOOP-544
>                 URL: https://issues.apache.org/jira/browse/HADOOP-544
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.18.0
>            Reporter: Owen O'Malley
>            Assignee: Enis Soztutar
>             Fix For: 0.18.0
>
>         Attachments: api-changes.tgz, id_v1.patch, id_v2.patch, id_v3.patch, 
> id_v4.patch, id_v5.patch, id_wip1.patch
>
>
> I think that it is silly to have tools parsing the strings that the framework 
> builds for task ids. I propose:
> class JobId implements Writable {
>    public int getJobId() {...}
> }
> class TaskId implements Writable {
>   public JobId getJobId(); 
>   public boolean isMap() { ... }
>   public int getTaskId() { ... }
> }
> class TaskAttemptId implements Writable {
>   public TaskId getTaskId();
>   public int getAttemptId();
> }
> each of the classes will have a toString() method that generates the current 
> string.
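The proposal above can be fleshed out into compilable Java roughly as follows.
This is a sketch under assumptions, not the committed patch: a small local
`Writable` interface stands in for Hadoop's `org.apache.hadoop.io.Writable` so
the block compiles on its own, and the `job_`/`task_`/`attempt_` strings that
`toString()` produces are placeholders rather than the framework's exact current
format. Each id holds its parent and serializes it first, so a `TaskAttemptId`
carries the whole job/task/attempt chain.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Local stand-in with the same two methods as org.apache.hadoop.io.Writable.
interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

class JobId implements Writable {
  private int id;
  JobId(int id) { this.id = id; }
  public int getJobId() { return id; }
  public void write(DataOutput out) throws IOException { out.writeInt(id); }
  public void readFields(DataInput in) throws IOException { id = in.readInt(); }
  // Placeholder layout; the real framework string may differ.
  @Override public String toString() { return String.format("job_%04d", id); }
}

class TaskId implements Writable {
  private JobId jobId;
  private boolean isMap;
  private int id;
  TaskId(JobId jobId, boolean isMap, int id) {
    this.jobId = jobId; this.isMap = isMap; this.id = id;
  }
  public JobId getJobId() { return jobId; }
  public boolean isMap() { return isMap; }
  public int getTaskId() { return id; }
  public void write(DataOutput out) throws IOException {
    jobId.write(out);          // parent id first
    out.writeBoolean(isMap);
    out.writeInt(id);
  }
  public void readFields(DataInput in) throws IOException {
    jobId = new JobId(0);      // fresh parent, then fill it from the stream
    jobId.readFields(in);
    isMap = in.readBoolean();
    id = in.readInt();
  }
  @Override public String toString() {
    return String.format("task_%04d_%s_%06d", jobId.getJobId(), isMap ? "m" : "r", id);
  }
}

class TaskAttemptId implements Writable {
  private TaskId taskId;
  private int attempt;
  TaskAttemptId(TaskId taskId, int attempt) { this.taskId = taskId; this.attempt = attempt; }
  public TaskId getTaskId() { return taskId; }
  public int getAttemptId() { return attempt; }
  public void write(DataOutput out) throws IOException {
    taskId.write(out);         // writes job and task parts
    out.writeInt(attempt);
  }
  public void readFields(DataInput in) throws IOException {
    taskId = new TaskId(new JobId(0), true, 0);
    taskId.readFields(in);
    attempt = in.readInt();
  }
  @Override public String toString() {
    // Reuse the parent's string, swapping the "task" prefix for "attempt".
    return "attempt" + taskId.toString().substring("task".length()) + "_" + attempt;
  }
}
```

With objects like these, a tool reads
`attempt.getTaskId().getJobId().getJobId()` instead of splitting a string.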

-- 
This message is automatically generated by JIRA.