Abhay Yadav created HADOOP-18739:
------------------------------------

             Summary: Parallelize concatenation of distcp chunks of separate 
files in CopyCommitter
                 Key: HADOOP-18739
                 URL: https://issues.apache.org/jira/browse/HADOOP-18739
             Project: Hadoop Common
          Issue Type: Improvement
          Components: tools/distcp
            Reporter: Abhay Yadav


When distcp copies a folder containing large files that are each split into 
multiple distcp chunks, CopyCommitter handles the files one at a time: it picks 
up the chunks of one file, concatenates them, and only then moves on to the 
next file. This step can be improved by concatenating the chunks of separate 
files in parallel. With this improvement we save 2-3 minutes when copying a 
100 GB folder containing 20 files of 5 GB each.
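
The idea can be illustrated with a minimal, self-contained sketch. It is not 
the code from the patch: the class and method names (ParallelConcatSketch, 
concatChunks, concatAll) are hypothetical, and the per-file merge is simulated 
by joining in-memory strings rather than by a filesystem concat call. It only 
shows the shape of the change, assuming one task per target file with chunk 
order preserved inside each file.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; not the CopyCommitter code from the patch.
class ParallelConcatSketch {

  // Stand-in for the per-file merge: joins one file's chunks in order.
  static String concatChunks(List<String> chunks) {
    return String.join("", chunks);
  }

  // Submits one task per target file, so separate files are merged
  // concurrently while the chunks of a single file stay in sequence.
  static Map<String, String> concatAll(Map<String, List<String>> chunksByFile,
                                       int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      Map<String, Future<String>> pending = new LinkedHashMap<>();
      for (Map.Entry<String, List<String>> e : chunksByFile.entrySet()) {
        List<String> chunks = e.getValue();
        Callable<String> task = () -> concatChunks(chunks);
        pending.put(e.getKey(), pool.submit(task));
      }

      // Wait for every file; a failure in any merge fails the whole commit.
      Map<String, String> merged = new LinkedHashMap<>();
      for (Map.Entry<String, Future<String>> e : pending.entrySet()) {
        try {
          merged.put(e.getKey(), e.getValue().get());
        } catch (InterruptedException ex) {
          Thread.currentThread().interrupt();
          throw new RuntimeException("interrupted while merging " + e.getKey(), ex);
        } catch (ExecutionException ex) {
          throw new RuntimeException("merge failed for " + e.getKey(), ex.getCause());
        }
      }
      return merged;
    } finally {
      pool.shutdown();
    }
  }
}
```

The thread count is left as a parameter because the useful degree of 
parallelism depends on the destination filesystem; the sketch makes no claim 
about what value the patch uses.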

Contributing a patch for this.



