[ 
https://issues.apache.org/jira/browse/HADOOP-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632519#action_12632519
 ] 

Owen O'Malley commented on HADOOP-4209:
---------------------------------------

After looking through the code some more and seeing the attempt ids like:

attempt_200707121733_0003_m_000005_0_1234567890123

I see four problems:
  1. The format of the task ids changes depending on the context.
  2. The final number is way longer than it needs to be.
  3. The numbers are out of order for sorting.
  4. The change in the format of the task ids needs to be called out much more 
explicitly.

I think it would be much better to expand the retry out to 4 digits and 
increment the leading digit by one each time the job goes through a restart:

attempt_200707121733_0003_m_000005_0000
attempt_200707121733_0003_m_000005_0001   // fails once
attempt_200707121733_0003_m_000005_1000   // after a restart
attempt_200707121733_0003_m_000005_1001   // fails after restart
attempt_200707121733_0003_m_000005_2000   // after a second restart

That way, we keep the format consistent and compatible. It is a single variable 
to track in the JobInProgress and is easy to explain. The only problem would 
come if a task had 1000 failed attempts and the JobTracker (JT) was then reset, 
since the retry count would have spilled into the restart digit.
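
To make the numbering concrete, here is a minimal sketch of how the 4-digit 
suffix could be computed, assuming each restart reserves a block of 1000 
attempt numbers. The class and method names (AttemptIdSketch, attemptNumber, 
attemptId) are hypothetical and are not Hadoop's TaskAttemptID API. Rejecting 
a retry of 1000 or more is also my assumption about how the overflow case 
might be handled.

```java
import java.util.Locale;

// Hypothetical illustration only; not part of Hadoop.
final class AttemptIdSketch {
    // Assumption: each job restart reserves a block of 1000 attempt numbers.
    static final int ATTEMPTS_PER_RESTART = 1000;

    // Combines the restart count and the per-restart retry into one number,
    // e.g. restart 1, retry 1 -> 1001.
    static int attemptNumber(int restartCount, int retry) {
        if (restartCount < 0) {
            throw new IllegalArgumentException("restartCount must be >= 0");
        }
        if (retry < 0 || retry >= ATTEMPTS_PER_RESTART) {
            // The overflow case: 1000 or more failures within one restart.
            throw new IllegalArgumentException("retry must be in [0, 999]");
        }
        return restartCount * ATTEMPTS_PER_RESTART + retry;
    }

    // Formats an id in the same shape as the examples above.
    static String attemptId(String jtIdentifier, int jobNumber, boolean isMap,
                            int taskNumber, int restartCount, int retry) {
        return String.format(Locale.ROOT, "attempt_%s_%04d_%s_%06d_%04d",
                jtIdentifier, jobNumber, isMap ? "m" : "r", taskNumber,
                attemptNumber(restartCount, retry));
    }

    public static void main(String[] args) {
        // prints attempt_200707121733_0003_m_000005_0000
        System.out.println(attemptId("200707121733", 3, true, 5, 0, 0));
        // prints attempt_200707121733_0003_m_000005_1001
        System.out.println(attemptId("200707121733", 3, true, 5, 1, 1));
    }
}
```

Because the suffix is zero-padded to a fixed width, ids for the same task sort 
in creation order as plain strings, at least while the restart count stays 
below 10.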




> The TaskAttemptID should not have the JobTracker start time
> -----------------------------------------------------------
>
>                 Key: HADOOP-4209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4209
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Priority: Blocker
>             Fix For: 0.19.0
>
>
> The TaskAttemptID now includes a redundant copy of the JobTracker's start 
> time in milliseconds. We should instead change the JobID to have the longer 
> unique string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
