[ 
https://issues.apache.org/jira/browse/MAPREDUCE-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767166#action_12767166
 ] 

Aaron Kimball commented on MAPREDUCE-972:
-----------------------------------------

My proposal is slightly different than that:

The progress thread is in one of three states:

1) {{inRename == true && isComplete == false}}
2) {{inRename == false && isComplete == false}}
3) {{isComplete == true}}

When inRename is true, the progress thread will call {{progress()}} 
every few seconds, for at most {{distcp.rename.timeout}} seconds. If it 
is still in this state {{distcp.rename.timeout}} seconds after the state 
began, it will set inRename to false.

When inRename is false, it just sits there, waiting for another rename 
operation to start. It sleeps and occasionally polls for a state change on 
inRename or isComplete. Setting inRename back to true returns the thread to 
the previously described state; {{distcp.rename.timeout}} starts anew from 
that point.

If isComplete is true, the thread exits immediately. The {{Mapper.close()}} 
method will set isComplete to true to ensure that the thread shuts down. (As 
the thread is {{setDaemon(true)}}, the JVM will exit even without this detail, 
but it is good hygiene to do so anyway.)
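
To make the three states concrete, here is a rough sketch of how such a thread could look. This is not the code from the attached patches. The class name, the method names ({{beginRename()}}, {{endRename()}}, {{complete()}}), and the poll interval parameter are all made up for illustration, and a plain {{Runnable}} stands in for the task's {{progress()}} call so the sketch compiles without Hadoop on the classpath. It uses {{wait()}} with a timeout rather than a bare sleep so that a state change wakes the thread early.

```java
/** Illustrative sketch of the three-state progress thread (hypothetical names). */
class RenameProgressThread extends Thread {
    private final Runnable progress;     // stand-in for the task's progress() call
    private final long renameTimeoutMs;  // distcp.rename.timeout, converted to ms
    private final long pollIntervalMs;   // hypothetical: the "every few seconds" interval

    private final Object lock = new Object();
    private boolean inRename = false;
    private boolean isComplete = false;
    private long renameStartNanos;

    RenameProgressThread(Runnable progress, long renameTimeoutMs, long pollIntervalMs) {
        this.progress = progress;
        this.renameTimeoutMs = renameTimeoutMs;
        this.pollIntervalMs = pollIntervalMs;
        setDaemon(true); // the JVM can exit even if complete() is never called
    }

    /** Call just before rename(): enter state 1 and restart the timeout window. */
    void beginRename() {
        synchronized (lock) {
            inRename = true;
            renameStartNanos = System.nanoTime();
            lock.notifyAll();
        }
    }

    /** Call after rename() returns: go back to state 2. */
    void endRename() {
        synchronized (lock) {
            inRename = false;
            lock.notifyAll();
        }
    }

    /** Call from the mapper's close(): enter state 3 so the thread exits. */
    void complete() {
        synchronized (lock) {
            isComplete = true;
            lock.notifyAll();
        }
    }

    boolean isInRename() {
        synchronized (lock) {
            return inRename;
        }
    }

    @Override
    public void run() {
        while (true) {
            boolean report;
            synchronized (lock) {
                if (isComplete) {
                    return; // state 3
                }
                if (inRename) {
                    long elapsedMs = (System.nanoTime() - renameStartNanos) / 1_000_000L;
                    if (elapsedMs >= renameTimeoutMs) {
                        inRename = false; // stop vouching for the task; normal timeout applies
                    }
                }
                report = inRename;
            }
            if (report) {
                progress.run(); // state 1: keep the task alive
            }
            synchronized (lock) {
                if (isComplete) {
                    return;
                }
                try {
                    lock.wait(pollIntervalMs); // states 1 and 2: sleep, wake early on change
                } catch (InterruptedException e) {
                    return;
                }
            }
        }
    }
}
```

Under these assumptions, the mapper would call {{beginRename()}} right before {{rename()}}, call {{endRename()}} in a finally block after it, and call {{complete()}} from {{close()}}.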

It is not sufficient to simply call progress() right before rename(). 
Experience has shown that when uploading large files to S3, the rename() 
operation itself can take in excess of 10 minutes. rename() in S3 is 
implemented as copy-and-delete. For multi-GB files, this can take a long time.

If we just tell people to set their global task timeout to 30 minutes, then 
this will delay task restarts under other conditions where the timeout value is 
expected to be considerably shorter (e.g., an individual file {{write()}} 
operation). This can adversely affect distcp performance in the general case.

> distcp can timeout during rename operation to s3
> ------------------------------------------------
>
>                 Key: MAPREDUCE-972
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-972
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: distcp
>    Affects Versions: 0.20.1
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-972.2.patch, MAPREDUCE-972.3.patch, 
> MAPREDUCE-972.4.patch, MAPREDUCE-972.5.patch, MAPREDUCE-972.patch
>
>
> rename() in S3 is implemented as copy + delete. The S3 copy operation can 
> perform very slowly, which may cause task timeout.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
