[jira] [Commented] (HADOOP-13169) Randomize file list in SimpleCopyListing

Steve Loughran (JIRA) Thu, 08 Sep 2016 02:38:33 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15473384#comment-15473384
 ]


Steve Loughran commented on HADOOP-13169:
-----------------------------------------

This is a good find.

# Have you run the hadoop-aws and hadoop-azure distcp tests with this?
# Concurrency. You've marked {{statusList}} as a concurrent list, implying 
there's concurrent use. But in {{writeToFileListing}} the list is shuffled, 
used, and then cleared in an unsynchronized block. Is there anything to stop 
elements being added to the list between that shuffle/use and the clear() 
call? Any such elements would be dropped, losing entries in the distcp.
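
To make the concurrency concern concrete, here is a minimal sketch of one way to close that window. It is not the code from the patch: {{ShuffledBuffer}}, {{add}} and {{drainShuffled}} are hypothetical names used only for illustration. The idea is to swap the shared list for a fresh one under a lock, so anything added after the swap lands in the next batch instead of being cleared.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical buffer for illustration; not the SimpleCopyListing code. */
final class ShuffledBuffer<T> {
    private final Object lock = new Object();
    private final Random random;
    private List<T> pending = new ArrayList<>();

    ShuffledBuffer(long seed) {
        this.random = new Random(seed);
    }

    /** Called by producer threads; guarded by the same lock as the drain. */
    void add(T item) {
        synchronized (lock) {
            pending.add(item);
        }
    }

    /**
     * Atomically detaches the current batch and installs an empty list.
     * There is no separate clear() call, so there is no gap in which a
     * concurrently added element could be discarded.
     */
    List<T> drainShuffled() {
        List<T> snapshot;
        synchronized (lock) {
            snapshot = pending;
            pending = new ArrayList<>();
        }
        // The snapshot is now private to this thread, so it can be
        // shuffled and written out without holding the lock.
        Collections.shuffle(snapshot, random);
        return snapshot;
    }
}
```

With this shape, an element added while a batch is being written is returned by the next {{drainShuffled()}} call rather than lost. Whether the patch needs this depends on whether {{statusList}} really is touched by more than one thread.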




> Randomize file list in SimpleCopyListing
> ----------------------------------------
>
>                 Key: HADOOP-13169
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13169
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools/distcp
>            Reporter: Rajesh Balamohan
>            Priority: Minor
>         Attachments: HADOOP-13169-branch-2-001.patch, 
> HADOOP-13169-branch-2-002.patch, HADOOP-13169-branch-2-003.patch
>
>
> When copying files to S3, some mappers can hit S3 partition hotspots, 
> depending on the order of the file listing. This is more visible when data is 
> copied from a Hive warehouse with lots of partitions (e.g. date partitions). 
> In such cases, some tasks tend to be a lot slower than others. It would be 
> good to randomize the file paths written out by SimpleCopyListing to avoid 
> this issue.
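
As a rough illustration of the idea in the description, the sketch below shuffles a list of paths before it would be written out. It is an assumption-laden example, not the attached patch: {{ListingShuffler}} and {{shuffled}} are hypothetical names, and paths are plain strings rather than the file status objects distcp works with. A seeded {{Random}} is used so that a run can be reproduced in tests.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper for illustration only. */
final class ListingShuffler {
    private ListingShuffler() {
    }

    /**
     * Returns a shuffled copy of the given paths, leaving the input
     * untouched. Adjacent entries such as consecutive date partitions
     * are then unlikely to stay next to each other in the listing.
     */
    static List<String> shuffled(List<String> paths, long seed) {
        List<String> copy = new ArrayList<>(paths);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }
}
```

For example, {{ListingShuffler.shuffled(paths, seed)}} returns the same set of paths in a different order, and the same seed gives the same order on every run.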



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

