Andrew Wang updated HADOOP-13169:
    Fix Version/s: 3.0.0-alpha2

> Randomize file list in SimpleCopyListing
> ----------------------------------------
>                 Key: HADOOP-13169
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13169
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools/distcp
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>             Fix For: 2.8.0, 3.0.0-alpha2
>         Attachments: HADOOP-13169-branch-2-001.patch, 
> HADOOP-13169-branch-2-002.patch, HADOOP-13169-branch-2-003.patch, 
> HADOOP-13169-branch-2-004.patch, HADOOP-13169-branch-2-005.patch, 
> HADOOP-13169-branch-2-006.patch, HADOOP-13169-branch-2-007.patch, 
> HADOOP-13169-branch-2-008.patch, HADOOP-13169-branch-2-009.patch, 
> HADOOP-13169-branch-2-010.patch
> When copying files to S3, some mappers can run into S3 partition hotspots, 
> depending on the order of the file listing. This is more visible when data 
> is copied from a Hive warehouse with many partitions (e.g. date partitions). 
> In such cases, some tasks tend to be much slower than others. It would be 
> good to randomize the file paths that are written out in SimpleCopyListing 
> to avoid this issue.
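
The idea in the description can be illustrated with a minimal sketch. This is not the attached patch; the class name, method name, path layout, and seed below are hypothetical, and the sketch only shows the general technique of shuffling a sorted listing so that consecutive entries no longer share a partition prefix.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; not the code from HADOOP-13169.
class ShuffleListingSketch {

    // Returns a shuffled copy of the listing; the input list is left untouched.
    // A fixed seed makes the order reproducible, which helps when debugging.
    static List<String> shuffled(List<String> paths, long seed) {
        List<String> copy = new ArrayList<>(paths);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    public static void main(String[] args) {
        // Assumed layout: a table with date partitions, listed in sorted order,
        // so neighbouring entries share the same key prefix.
        List<String> listing = new ArrayList<>();
        for (int day = 1; day <= 3; day++) {
            for (int part = 0; part < 2; part++) {
                listing.add(String.format(
                        "warehouse/t/date=2016-05-%02d/part-%05d", day, part));
            }
        }

        // After shuffling, a contiguous slice handed to one mapper is likely
        // to span several partitions instead of concentrating on one.
        for (String path : shuffled(listing, 42L)) {
            System.out.println(path);
        }
    }
}
```

Shuffling a copy rather than the original list keeps the sorted listing available to any caller that still depends on its order.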

This message was sent by Atlassian JIRA

