Wei-Chiu Chuang created HDFS-14574:
--------------------------------------

             Summary: [distcp] Add ability to increase the replication factor 
for fileList.seq
                 Key: HDFS-14574
                 URL: https://issues.apache.org/jira/browse/HDFS-14574
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: distcp
            Reporter: Wei-Chiu Chuang


distcp creates fileList.seq with default replication factor = 3.

For large clusters running distcp jobs with thousands of mappers, 3 replicas 
of the file listing are not enough, because the DataNodes that hold them 
easily hit the maximum number of xceivers.
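
A rough way to see the pressure on xceivers is to estimate how many mappers 
read the listing from each replica-holding DataNode at once. The sketch below 
is a back-of-the-envelope estimate only. It assumes that all mappers open 
fileList.seq at about the same time and that reads spread evenly across the 
replicas; the class and method names are hypothetical and not part of distcp.

```java
// Hypothetical helper, not part of Hadoop: estimates concurrent readers of the
// file listing per DataNode, assuming an even spread across replicas.
final class ListingReaderLoad {

    private ListingReaderLoad() {
    }

    /**
     * Returns ceil(mappers / replication), i.e. the approximate number of
     * mappers that read the listing from each DataNode holding a replica.
     */
    static long readersPerDataNode(long mappers, int replication) {
        if (mappers < 0 || replication < 1) {
            throw new IllegalArgumentException(
                "mappers must be >= 0 and replication must be >= 1");
        }
        // Integer ceiling division without floating point.
        return (mappers + replication - 1) / replication;
    }
}
```

With illustrative numbers, 6000 mappers and 3 replicas give about 2000 
concurrent readers per DataNode, while 30 replicas give about 200. For 
comparison, dfs.datanode.max.transfer.threads defaults to 4096, and that 
budget is shared with all other traffic on the DataNode.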

It looks like we can add a distcp option and use it to set the replication 
factor when creating the sequence file writer: 
[https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java#L517-L521]
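
As a minimal sketch of the idea above, the helper below resolves the 
replication factor to use for the listing file from a user-supplied value. 
Everything named here is an assumption for illustration: the option key 
"distcp.listing.replication", the class ListingReplication, and the policy of 
never going below the default or above a caller-supplied ceiling are all 
hypothetical and not existing distcp behavior.

```java
// Hypothetical helper, not part of Hadoop: picks the replication factor for
// the distcp file listing from an optional user setting.
final class ListingReplication {

    // Assumed option name for illustration only; not an existing distcp key.
    static final String OPTION_KEY = "distcp.listing.replication";

    private ListingReplication() {
    }

    /**
     * @param configured         raw option value, may be null or malformed
     * @param defaultReplication value used when the option is unset or invalid
     * @param maxReplication     upper bound the caller is willing to allow
     */
    static short resolve(String configured, short defaultReplication,
                         short maxReplication) {
        int requested = defaultReplication;
        if (configured != null && !configured.trim().isEmpty()) {
            try {
                requested = Integer.parseInt(configured.trim());
            } catch (NumberFormatException e) {
                // Malformed input: fall back to the default.
                requested = defaultReplication;
            }
        }
        // Only ever increase relative to the default, then apply the ceiling.
        int atLeastDefault = Math.max(requested, defaultReplication);
        return (short) Math.min(atLeastDefault, maxReplication);
    }

    // Assumed wiring inside the code that builds the writer (needs Hadoop on
    // the classpath, so it is shown as a comment only):
    //
    //   short r = ListingReplication.resolve(
    //       conf.get(ListingReplication.OPTION_KEY), (short) 3, (short) 512);
    //   SequenceFile.createWriter(conf,
    //       SequenceFile.Writer.file(pathToListFile),
    //       ...,
    //       SequenceFile.Writer.replication(r));
}
```

SequenceFile.Writer.replication(short) is an existing writer option in Hadoop, 
so the change would mostly be a matter of plumbing the new setting through to 
the writer.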



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
