[ https://issues.apache.org/jira/browse/HADOOP-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Omkar Aradhya K S updated HADOOP-14407:
---------------------------------------
Description: Currently, the RetriableFileCopyCommand has a fixed copy
buffer size of just 8KB. In our performance tests, larger buffer sizes gave
up to a ~3x performance boost. Hence, this change makes the copy buffer size
configurable via the new parameter <copybuffersize>.
(was: The minimum unit of work for a distcp task is a file. We have files that
are greater than 1 TB with a block size of 1 GB. If we use distcp to copy
these files, the tasks either take a very long time or eventually fail. A
better way for distcp would be to copy all the source blocks in parallel, and
then stitch the blocks back into files at the destination via the HDFS Concat
API (HDFS-222))
> DistCp - Introduce a configurable copy buffer size
> --------------------------------------------------
>
> Key: HADOOP-14407
> URL: https://issues.apache.org/jira/browse/HADOOP-14407
> Project: Hadoop Common
> Issue Type: Improvement
> Components: tools/distcp
> Affects Versions: 2.9.0
> Reporter: Omkar Aradhya K S
> Assignee: Yongjun Zhang
> Fix For: 2.9.0, 3.0.0-alpha3
>
>
> Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just
> 8KB. In our performance tests, larger buffer sizes gave up to a ~3x
> performance boost. Hence, this change makes the copy buffer size
> configurable via the new parameter <copybuffersize>.
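
For illustration only, here is a minimal sketch of a stream copy loop whose
buffer size is supplied by the caller instead of being fixed at 8KB. It is not
the actual RetriableFileCopyCommand code: the class name BufferedCopySketch,
the method copy, and the constant DEFAULT_COPY_BUFFER_SIZE are hypothetical
names chosen for this example, and it uses only java.io rather than any
Hadoop API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch; not the DistCp implementation.
public class BufferedCopySketch {
    // Fallback matching the 8KB fixed size mentioned in the description.
    static final int DEFAULT_COPY_BUFFER_SIZE = 8 * 1024;

    // Copies all bytes from in to out using a buffer of bufferSize bytes.
    // A non-positive bufferSize falls back to the 8KB default.
    // Returns the total number of bytes copied.
    static long copy(InputStream in, OutputStream out, int bufferSize)
            throws IOException {
        int size = bufferSize > 0 ? bufferSize : DEFAULT_COPY_BUFFER_SIZE;
        byte[] buffer = new byte[size];
        long total = 0;
        int read;
        // Each read() fills at most one buffer, so a larger buffer means
        // fewer read/write calls for the same amount of data.
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100_000];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Copy with a 64KB buffer instead of the 8KB default.
        long copied = copy(new ByteArrayInputStream(data), sink, 64 * 1024);
        System.out.println("Copied bytes: " + copied);
    }
}
```

In a sketch like this, the value passed as bufferSize would come from the
<copybuffersize> parameter. How much a larger buffer helps depends on the
underlying streams, so the ~3x figure above should be read as the reporter's
measurement, not as something this example reproduces.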