[ https://issues.apache.org/jira/browse/HADOOP-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Nauroth updated HADOOP-13169:
-----------------------------------
       Resolution: Fixed
     Hadoop Flags: Reviewed
    Fix Version/s: 2.8.0
           Status: Resolved  (was: Patch Available)

+1 for patch 010. I verified that {{TestAzureNativeContractDistCp}} and {{ITestS3AContractDistCp}} are passing.

Rajesh, thank you for the patch. Steve, thank you for the code review.

> Randomize file list in SimpleCopyListing
> ----------------------------------------
>
>                 Key: HADOOP-13169
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13169
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools/distcp
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>             Fix For: 2.8.0
>
>         Attachments: HADOOP-13169-branch-2-001.patch,
> HADOOP-13169-branch-2-002.patch, HADOOP-13169-branch-2-003.patch,
> HADOOP-13169-branch-2-004.patch, HADOOP-13169-branch-2-005.patch,
> HADOOP-13169-branch-2-006.patch, HADOOP-13169-branch-2-007.patch,
> HADOOP-13169-branch-2-008.patch, HADOOP-13169-branch-2-009.patch,
> HADOOP-13169-branch-2-010.patch
>
>
> When copying files to S3, some mappers can run into S3 partition hotspots,
> depending on the order of the file listing. This is more visible when data is
> copied from a Hive warehouse with many partitions (e.g. date partitions). In
> such cases, some of the tasks tend to be much slower than others. It would be
> good to randomize the file paths that are written out in SimpleCopyListing to
> avoid this issue.
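To illustrate the idea in the issue description, here is a minimal sketch of randomizing a file listing before it is handed out to copy tasks. It is not the code from the attached patches: the class name {{ShuffledListingSketch}}, the method {{shuffleInChunks}}, the chunk size, and the sample paths are all hypothetical, and shuffling one bounded buffer at a time is an assumption made here to keep memory use predictable for large listings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch; not the code from the HADOOP-13169 patches. */
class ShuffledListingSketch {

  /**
   * Returns the paths in randomized order. The input is processed one
   * fixed-size chunk at a time, and each chunk is shuffled independently,
   * so an entry never moves outside its own chunk.
   */
  static List<String> shuffleInChunks(List<String> paths, int chunkSize, Random rnd) {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive");
    }
    List<String> out = new ArrayList<>(paths.size());
    for (int start = 0; start < paths.size(); start += chunkSize) {
      int end = Math.min(start + chunkSize, paths.size());
      // Copy the slice so the caller's list is left untouched.
      List<String> chunk = new ArrayList<>(paths.subList(start, end));
      Collections.shuffle(chunk, rnd);
      out.addAll(chunk);
    }
    return out;
  }

  public static void main(String[] args) {
    // Made-up, date-partitioned listing similar to a Hive warehouse layout.
    List<String> listing = new ArrayList<>();
    for (int day = 1; day <= 3; day++) {
      for (int file = 0; file < 2; file++) {
        listing.add(String.format("/warehouse/t/dt=2016-05-%02d/part-%05d", day, file));
      }
    }
    // A fixed seed makes the order repeatable between runs.
    for (String p : shuffleInChunks(listing, 4, new Random(42L))) {
      System.out.println(p);
    }
  }
}
```

With a chunk size that spans several partitions, neighbouring entries in the output are less likely to share a key prefix, which is the effect the description asks for. A chunk size of 1 leaves the order unchanged.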