Zheng Shao created HADOOP-13975:
-----------------------------------
Summary: Allow DistCp to use MultiThreadedMapper
Key: HADOOP-13975
URL: https://issues.apache.org/jira/browse/HADOOP-13975
Project: Hadoop Common
Issue Type: New Feature
Components: tools/distcp
Affects Versions: 3.0.0-alpha1
Reporter: Zheng Shao
Assignee: Zheng Shao
Priority: Minor
Although DistCp lets users control parallelism through the number of mappers,
it is sometimes desirable to run fewer mappers with more threads per mapper.
Because DistCp is network bound (by throughput or, more often, by the latency
of creating connections and of opening, reading, writing, and closing files),
running several threads per mapper lets one thread's network waits overlap
with another thread's work, which can make each mapper much more efficient.
Threads in the same mapper can also share resources, which saves memory and
reduces the number of connections to the NameNode.
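
To illustrate the idea, here is a minimal sketch of one worker process copying
its share of files with a fixed pool of threads. It is an assumption-laden
analogue, not the proposed DistCp change: it uses only the Java standard
library on local files, and the names ThreadedCopySketch and copyAll are
hypothetical. It does not use Hadoop's MultithreadedMapper or any DistCp
class.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for a single mapper that copies its file list with several threads. */
public class ThreadedCopySketch {

    /**
     * Copies every source file into targetDir using numThreads worker threads.
     * Returns the number of files copied; the first failed copy surfaces as an
     * ExecutionException.
     */
    public static int copyAll(List<Path> sources, Path targetDir, int numThreads)
            throws IOException, InterruptedException, ExecutionException {
        Files.createDirectories(targetDir);
        // One pool per "mapper": all threads live in the same JVM and share its memory.
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<Path>> pending = new ArrayList<>();
            for (Path src : sources) {
                // Each copy is an independent task, so slow I/O on one file
                // does not block the others.
                Callable<Path> task = () -> Files.copy(
                        src,
                        targetDir.resolve(src.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING);
                pending.add(pool.submit(task));
            }
            int copied = 0;
            for (Future<Path> result : pending) {
                result.get(); // wait for completion and propagate any failure
                copied++;
            }
            return copied;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path srcDir = Files.createTempDirectory("sketch-src");
        Path dstDir = Files.createTempDirectory("sketch-dst");
        List<Path> sources = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            sources.add(Files.write(srcDir.resolve("part-" + i), ("data " + i).getBytes()));
        }
        int copied = copyAll(sources, dstDir, 4);
        System.out.println("copied " + copied + " files with 4 threads");
    }
}
```

In this sketch the thread count plays the role that a threads-per-mapper
setting would play in DistCp: raising it increases the number of concurrent
copies without starting another process.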