Abhay Yadav created HADOOP-18739:
------------------------------------
Summary: Parallelize concatenation of distcp chunks of separate files in CopyCommitter
Key: HADOOP-18739
URL: https://issues.apache.org/jira/browse/HADOOP-18739
Project: Hadoop Common
Issue Type: Improvement
Components: tools/distcp
Reporter: Abhay Yadav
When DistCp copies a directory containing large files that are each split into
multiple chunks, CopyCommitter processes the files one at a time: for each file,
it collects that file's chunks and concatenates them before moving on to the
next file. Because the chunks of separate files are independent of each other,
the per-file concatenations can run in parallel. With this improvement we save
2-3 minutes when copying a 100 GB directory containing 20 files of 5 GB each.
I am contributing a patch for this.
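
To illustrate the idea, here is a minimal, self-contained sketch. It is not the patch itself and does not use the Hadoop API. The class name ParallelConcatSketch, the methods concatChunks and concatAll, and the use of strings as stand-ins for chunk files are all assumptions made for illustration. Each file's chunks are still joined in order, but one task per file is submitted to a thread pool so that separate files are handled concurrently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch only; this is not the actual CopyCommitter code. */
class ParallelConcatSketch {

    /** Stand-in for concatenating one file's chunks, in their original order. */
    static String concatChunks(List<String> chunks) {
        // Assumption: a real committer would invoke the file system's
        // concat operation on the chunk paths here instead of joining strings.
        StringBuilder sb = new StringBuilder();
        for (String chunk : chunks) {
            sb.append(chunk);
        }
        return sb.toString();
    }

    /** Runs one concatenation task per file on a fixed-size thread pool. */
    static Map<String, String> concatAll(Map<String, List<String>> chunksByFile,
                                         int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit every file first so the tasks can overlap.
            Map<String, CompletableFuture<String>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, List<String>> e : chunksByFile.entrySet()) {
                List<String> chunks = new ArrayList<>(e.getValue());
                pending.put(e.getKey(),
                        CompletableFuture.supplyAsync(() -> concatChunks(chunks), pool));
            }
            // Wait for all tasks; join() rethrows a task failure unchecked.
            Map<String, String> result = new LinkedHashMap<>();
            for (Map.Entry<String, CompletableFuture<String>> e : pending.entrySet()) {
                result.put(e.getKey(), e.getValue().join());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> input = new LinkedHashMap<>();
        input.put("fileA", Arrays.asList("a0", "a1", "a2"));
        input.put("fileB", Arrays.asList("b0", "b1"));
        System.out.println(concatAll(input, 2));
    }
}
```

Parallelism in this sketch is across files only. The chunks of a single file are still concatenated sequentially and in order, which matches the scope described above. The pool size is a free parameter here; how the real patch sizes or configures its pool is not stated in this issue.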