[ https://issues.apache.org/jira/browse/HADOOP-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618539#action_12618539 ]
Tsz Wo (Nicholas), SZE commented on HADOOP-3873:
------------------------------------------------

> In most cases, the total size to be copied can be determined up front, before
> the copying begins, no?

Yes, you are right that we can pre-compute the lists of files to be copied and impose whatever constraints we want. The new option automates that pre-computation. DistCp currently computes a list of files before copying. I am planning to change the computation so that the list satisfies the file/size limit constraints.

> What might be better is a mechanism to stop a DistCp job. E.g., one could
> provide a "stop" file name. When this is non-null, copying will stop as soon
> as the named file exists. Might that meet the need here?

This is a good idea for stopping a DistCp job nicely. Let me see whether it could solve the backup use case described above.

> DistCp should have an option for limiting the number of files/bytes being copied
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-3873
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3873
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: tools/distcp
>            Reporter: Tsz Wo (Nicholas), SZE
>
> A single DistCp command may potentially copy a huge number of files/bytes.
> In such a case, DistCp will run for a long time and there is no way to stop
> it nicely. It would be good if DistCp had an option to limit the number of
> files/bytes being copied. Once the limit is reached, DistCp will terminate
> and return success. All files copied are guaranteed to be good and there are
> no partially copied files.
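
To make the proposed pre-computation concrete, here is a minimal sketch of trimming a file list so that it satisfies both a file-count limit and a byte limit before any copying starts. It is written in Python for brevity and is not DistCp code: `FileEntry`, `limit_copy_list`, `max_files`, and `max_bytes` are hypothetical names chosen for illustration. The sketch also assumes the list is cut at the first file that would exceed a limit, which is only one possible policy.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class FileEntry(NamedTuple):
    """Hypothetical record for one source file: its path and size in bytes."""
    path: str
    size: int


def limit_copy_list(
    files: Iterable[FileEntry],
    max_files: Optional[int] = None,
    max_bytes: Optional[int] = None,
) -> Tuple[List[FileEntry], int]:
    """Return the longest prefix of `files` that fits both limits, plus its total size.

    Only whole files are selected, so no file is ever partially copied.
    A limit of None means "no limit".
    """
    selected: List[FileEntry] = []
    total = 0
    for entry in files:
        # Stop once the file-count limit has been reached.
        if max_files is not None and len(selected) >= max_files:
            break
        # Stop before a file that would push the total past the byte limit.
        if max_bytes is not None and total + entry.size > max_bytes:
            break
        selected.append(entry)
        total += entry.size
    return selected, total


if __name__ == "__main__":
    candidates = [
        FileEntry("/src/a", 100),
        FileEntry("/src/b", 250),
        FileEntry("/src/c", 400),
    ]
    chosen, copied_bytes = limit_copy_list(candidates, max_files=10, max_bytes=500)
    print([f.path for f in chosen], copied_bytes)
```

Because the limits are applied while the list is being built, a job that copies exactly the selected files can finish normally and report success, and a later run can pick up the remaining files. An alternative policy would skip an oversized file and keep scanning for smaller ones. That would use the byte budget more fully, but it would change the order in which files are copied.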