[ https://issues.apache.org/jira/browse/HADOOP-15961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16748045#comment-16748045 ]
Steve Loughran commented on HADOOP-15961:
-----------------------------------------

BTW, looking at this patch, I think the progress call could go in the inner loop:

{code}
...
UploadPartResult partResult = writeOperations.uploadPart(part);
offset += uploadPartSize;
parts.add(partResult.getPartETag());
progress.progress();   // HERE
}
{code}

That way, it'll be invoked after every 32 or 64 MB part upload. If the task created 4GB of data, then without the per-part progress calls you could still hit a timeout just from the time taken to upload it all; a progress event per part eliminates this problem.

> S3A committers: make sure there's regular progress() calls
> ----------------------------------------------------------
>
>                 Key: HADOOP-15961
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15961
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Steve Loughran
>            Assignee: lqjacklee
>            Priority: Minor
>     Attachments: HADOOP-15961-001.patch, HADOOP-15961-002.patch
>
>
> MAPREDUCE-7164 highlights how, inside job/task commit, more context.progress()
> callbacks are needed, just for HDFS.
> The S3A committers should be reviewed similarly.
> At a glance:
> StagingCommitter.commitTaskInternal() is at risk if a task writes enough data
> to the local FS that the upload takes longer than the timeout.
> It should call progress after every single file commit, or better: modify
> {{uploadFileToPendingCommit}} to take a Progressable for progress callbacks
> after every part upload.
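Not part of either patch, but to illustrate the {{uploadFileToPendingCommit}} variant suggested in the description: a minimal sketch, assuming an AWS SDK v1 {{UploadPartRequest}} and some collaborator exposing the {{uploadPart()}} call from the snippet above. The {{PartUploadClient}} interface, class name and parameter list here are made up for the example and don't match the real S3A signatures.

{code}
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import com.amazonaws.services.s3.model.PartETag;
import com.amazonaws.services.s3.model.UploadPartRequest;
import com.amazonaws.services.s3.model.UploadPartResult;

import org.apache.hadoop.util.Progressable;

public class PartUploadSketch {

  /**
   * Assumed collaborator: anything exposing the uploadPart() call from the
   * snippet above (in S3A that role is played by the write operations helper).
   */
  public interface PartUploadClient {
    UploadPartResult uploadPart(UploadPartRequest request) throws IOException;
  }

  private final PartUploadClient writeOperations;

  public PartUploadSketch(PartUploadClient writeOperations) {
    this.writeOperations = writeOperations;
  }

  /**
   * Upload a local file in parts, invoking progress() after every part so
   * the task heartbeat is refreshed at least once per part upload.
   */
  public List<PartETag> uploadFileToPendingCommit(File localFile,
      String bucket, String key, String uploadId,
      long uploadPartSize, Progressable progress) throws IOException {
    List<PartETag> parts = new ArrayList<>();
    long offset = 0;
    long remaining = localFile.length();
    int partNumber = 1;
    while (remaining > 0) {
      long size = Math.min(uploadPartSize, remaining);
      UploadPartRequest part = new UploadPartRequest()
          .withBucketName(bucket)
          .withKey(key)
          .withUploadId(uploadId)
          .withPartNumber(partNumber++)
          .withFile(localFile)
          .withFileOffset(offset)
          .withPartSize(size);
      UploadPartResult partResult = writeOperations.uploadPart(part);
      offset += size;
      remaining -= size;
      parts.add(partResult.getPartETag());
      if (progress != null) {
        progress.progress();   // HERE: one liveness ping per uploaded part
      }
    }
    return parts;
  }
}
{code}

Passing the task's Progressable down from the committer puts the liveness ping inside the upload loop itself, so even a multi-GB file can't go quiet for longer than a single part upload.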