[ https://issues.apache.org/jira/browse/MAPREDUCE-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473271#comment-13473271 ]
Robert Joseph Evans commented on MAPREDUCE-4568:
------------------------------------------------

Adding a true duplicate (the exact same file multiple times) to the dist cache will not result in an error under YARN. The MR client will just dedupe them before submitting the request to YARN. The issue arises when different files both map to the same key in the dist cache map (the key is the name of the symlink created in the working directory of the task/container). That is when an exception is thrown under 2.0.

> Throw "early" exception when duplicate files or archives are found in distributed cache
> ---------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4568
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4568
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Mohammad Kamrul Islam
>            Assignee: Arun C Murthy
>
> According to #MAPREDUCE-4549, Hadoop 2.x throws an exception if duplicates are found
> in cacheFiles or cacheArchives. The exception is thrown during job submission.
> This JIRA is to throw the exception ==early==, when the entry is first added to the
> Distributed Cache through addCacheFile or addFileToClassPath.
> It will help the client decide whether to fail fast or continue without the
> duplicated entries.
> Alternatively, Hadoop could provide a knob that lets the user choose whether to
> throw an error (the upcoming behavior) or silently ignore duplicates (the old behavior).
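
To make the distinction in the comment concrete, here is a minimal, self-contained sketch of the two cases: an exact duplicate that is silently deduped, and two different files that collide on the same symlink name. It does not use or reproduce the Hadoop code. The class `DistCacheDuplicateCheck`, the helpers `linkName` and `addAll`, and the example URIs are all hypothetical, and the rule that the symlink name is the URI fragment if present, otherwise the last path component, is an assumption made for illustration.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DistCacheDuplicateCheck {

    // ASSUMPTION (illustration only): the symlink name is the URI fragment
    // if one is given, otherwise the last component of the path.
    static String linkName(URI uri) {
        String fragment = uri.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = uri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Builds a map keyed by symlink name. An identical URI added twice is
    // dropped; a different URI that maps to an existing key is rejected.
    static Map<String, URI> addAll(List<URI> entries) {
        Map<String, URI> byLinkName = new LinkedHashMap<>();
        for (URI uri : entries) {
            String name = linkName(uri);
            URI existing = byLinkName.putIfAbsent(name, uri);
            if (existing != null && !existing.equals(uri)) {
                throw new IllegalArgumentException(
                    "Entries " + existing + " and " + uri
                    + " both map to symlink name '" + name + "'");
            }
        }
        return byLinkName;
    }

    public static void main(String[] args) {
        URI a = URI.create("hdfs://nn/libs/a/util.jar");
        URI b = URI.create("hdfs://nn/libs/b/util.jar");

        // True duplicate: the same URI twice leaves a single entry.
        System.out.println(addAll(Arrays.asList(a, a)).size());

        // Different files, same symlink name: rejected as soon as it is added.
        try {
            addAll(Arrays.asList(a, b));
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Running the check at the point where an entry is added, rather than at submission time, is the "early" behavior the issue asks for; a caller could catch the exception and decide whether to fail fast or skip the conflicting entry.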