[ 
https://issues.apache.org/jira/browse/MAPREDUCE-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767161#action_12767161
 ] 

Chris Douglas commented on MAPREDUCE-972:
-----------------------------------------

I see. Extending the FileSystem API is a non-starter, so we can move on from 
that. Progress threads in general are discouraged (e.g. HADOOP-5052).

If I understand your proposal, the progress thread would report starting from 
the first rename, but stop after some configurable interval. In most cases, I'm 
not sure how this would differ from simply setting the task timeout higher, 
since progress is reported between renames. Also, this wouldn't help any rename 
that starts after the thread exits.

Would it be sufficient to add a call to progress() right before the rename 
(after the delete)? In that case, setting the task timeout higher would extend 
the time allowed for each rename, which is the right level of granularity 
anyway. The longer timeout won't be applied automatically for S3 destinations, 
but pushing that detail into distcp is not ideal, either. One could add a 
FilterFileSystem that resets a persistent progress thread for each rename and 
manages all the signaling/locking etc., but its behavior seems 
indistinguishable from this much simpler tweak. Would this be sufficient?
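
To make the suggestion concrete, here is a minimal sketch of the ordering 
(delete, then progress(), then rename). It is not the distcp code: the 
Progress and SimpleFs interfaces and the moveIntoPlace method are hypothetical 
stand-ins for the task's reporter and the destination FileSystem, used only so 
the example compiles on its own.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class RenameWithProgress {

    /** Hypothetical stand-in for the reporter handed to the map task. */
    interface Progress {
        void progress();
    }

    /** Hypothetical stand-in for the few file system calls used below. */
    interface SimpleFs {
        boolean exists(String path) throws IOException;
        boolean delete(String path) throws IOException;
        boolean rename(String src, String dst) throws IOException;
    }

    /**
     * Moves tmp to dst, reporting progress once right before the rename so
     * that the task timeout clock restarts for each rename.
     */
    static void moveIntoPlace(SimpleFs fs, Progress reporter,
                              String tmp, String dst) throws IOException {
        // Remove any existing destination first.
        if (fs.exists(dst) && !fs.delete(dst)) {
            throw new IOException("Failed to delete " + dst);
        }
        // Report after the delete and before the (possibly slow) rename.
        reporter.progress();
        if (!fs.rename(tmp, dst)) {
            throw new IOException("Failed to rename " + tmp + " to " + dst);
        }
    }

    public static void main(String[] args) throws IOException {
        // Record the order of calls using an in-memory fake file system.
        final List<String> events = new ArrayList<>();
        SimpleFs fs = new SimpleFs() {
            public boolean exists(String p) { events.add("exists " + p); return true; }
            public boolean delete(String p) { events.add("delete " + p); return true; }
            public boolean rename(String s, String d) {
                events.add("rename " + s + " -> " + d);
                return true;
            }
        };
        moveIntoPlace(fs, () -> events.add("progress"), "_tmp/part-0", "out/part-0");
        for (String e : events) {
            System.out.println(e);
        }
    }
}
```

With this ordering, a single rename gets the full task timeout to itself, so 
raising the timeout scales the allowance per rename rather than per task.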

> distcp can timeout during rename operation to s3
> ------------------------------------------------
>
>                 Key: MAPREDUCE-972
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-972
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: distcp
>    Affects Versions: 0.20.1
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-972.2.patch, MAPREDUCE-972.3.patch, 
> MAPREDUCE-972.4.patch, MAPREDUCE-972.5.patch, MAPREDUCE-972.patch
>
>
> rename() in S3 is implemented as copy + delete. The S3 copy operation can 
> perform very slowly, which may cause task timeout.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.