[ https://issues.apache.org/jira/browse/MAPREDUCE-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767579#action_12767579 ]
Chris Douglas commented on MAPREDUCE-972:
-----------------------------------------

bq. Isn't the FileSystem API mid-rewrite right now in HADOOP-6223? So now might actually be the rare time to consider something like this. It's unfortunate that Rename.Options is an enum, so it'd be hard to add a progress function there without changing that. Perhaps Rename.Options.OVERWRITE could still be a constant, but Rename.Options#createProgress(Progressable) could return a subclass of Rename.Options that wraps a Progressable or some such. I don't mean to push this approach, but rather to question whether it should be ruled out completely. If it seems reasonable for file rename implementations to take a long time, then adding a progress callback might be a reasonable approach.

I agree. Long-running renames should be considered/debated in the design of FileContext/AFS, particularly since we often use rename to promote output. Adding a constructor for FileContext that takes a Progressable, then adding a Progressable parameter to all the AFS APIs, would probably work. The Util inner class could also be created with it, so listStatus and copy could update progress as well. Most implementations can ignore it, but that would at least push the workaround for S3 into the right layer.

However, for this issue, patching either the old or the new API is a non-starter. DistCp uses the old APIs, and I'd much rather upgrade it and address the more general Progressable questions in other issues. Approaching all of that in a single issue, particularly one devoted to a timeout in S3, imports a lot of baggage. Interesting, important baggage, but this is only one use case in that broader context.

bq. Using a single timeout value for all operations makes program execution considerably less efficient overall than it should be. Writes and renames in distcp can expect different running times; we should treat them this way.

Every file is copied twice. I'm not sure a long task timeout leaves much performance on the table. Your point about tuning timeouts for particular operations is taken, but the payoff is too low for the complexity this adds. Both this and raising the task timeout for the job are hacks, but as Doug points out, we're going to have to solve this in general anyway. The task timeout is a hack we already have and know.

bq. Looking at FilterFileSystem, I think that's the most general and non-invasive solution.

[Yes|https://issues.apache.org/jira/browse/MAPREDUCE-972?focusedCommentId=12767161#action_12767161], that would be the cleanest place to add a thread, but it's still not much of a win over bumping the task timeout for the job. Updating the DistCp guide with notes for S3 users is an unambiguous win.

> distcp can timeout during rename operation to s3
> ------------------------------------------------
>
>                 Key: MAPREDUCE-972
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-972
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: distcp
>    Affects Versions: 0.20.1
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-972.2.patch, MAPREDUCE-972.3.patch, MAPREDUCE-972.4.patch, MAPREDUCE-972.5.patch, MAPREDUCE-972.patch
>
> rename() in S3 is implemented as copy + delete. The S3 copy operation can perform very slowly, which may cause task timeout.
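For concreteness, here is a rough sketch of the Rename.Options idea quoted above. It is purely hypothetical: Rename.Options is an enum in Hadoop today, createProgress does not exist, and the class shape below is just one way the proposal could look.

{code:java}
import org.apache.hadoop.util.Progressable;

/**
 * Hypothetical sketch only, not Hadoop API: Rename.Options reshaped as a
 * class so OVERWRITE stays usable as a constant while createProgress()
 * returns an instance carrying a progress callback.
 */
public abstract class RenameOptions {
  public static final RenameOptions OVERWRITE = new RenameOptions() {};

  /** Wraps a Progressable so a slow rename can report liveness. */
  public static RenameOptions createProgress(final Progressable p) {
    return new RenameOptions() {
      @Override
      public void progress() {
        p.progress();
      }
    };
  }

  /** No-op by default; options built with createProgress() override it. */
  public void progress() {}
}
{code}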
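The "raise the task timeout" hack above maps to the mapred.task.timeout property, which is in milliseconds and defaults to 600000 (10 minutes) in 0.20. A minimal sketch, assuming the job is configured through the old JobConf API; the class name is illustrative:

{code:java}
import org.apache.hadoop.mapred.JobConf;

public class LongTimeoutConf {
  public static JobConf create() {
    JobConf conf = new JobConf();
    // A slow S3 rename (copy + delete) may report no progress for longer
    // than the 10-minute default. Trade-off: genuinely hung tasks now
    // also take an hour to be detected and killed.
    conf.setLong("mapred.task.timeout", 60L * 60L * 1000L); // one hour
    return conf;
  }
}
{code}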
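And a minimal sketch of the FilterFileSystem approach discussed above: wrap the slow FileSystem and report progress from a background thread while rename() is in flight. The wrapper class name and the 30-second interval are made up for illustration; FilterFileSystem, Progressable, and the rename() signature are real Hadoop types.

{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FilterFileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.Progressable;

/** Illustrative only: keeps the task alive during a slow S3 rename. */
public class ProgressingFileSystem extends FilterFileSystem {
  private final Progressable progress;

  public ProgressingFileSystem(FileSystem wrapped, Progressable progress) {
    super(wrapped);
    this.progress = progress;
  }

  @Override
  public boolean rename(Path src, Path dst) throws IOException {
    // Ping the framework periodically so the task is not killed while
    // the underlying copy + delete runs.
    Thread keepAlive = new Thread(new Runnable() {
      public void run() {
        try {
          while (true) {
            progress.progress();
            Thread.sleep(30000); // well under mapred.task.timeout
          }
        } catch (InterruptedException done) {
          // rename finished; exit quietly
        }
      }
    });
    keepAlive.setDaemon(true);
    keepAlive.start();
    try {
      return fs.rename(src, dst); // 'fs' is FilterFileSystem's wrapped instance
    } finally {
      keepAlive.interrupt();
    }
  }
}
{code}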