[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473271#comment-13473271
 ] 

Robert Joseph Evans commented on MAPREDUCE-4568:
------------------------------------------------

Adding a true duplicate (the exact same file multiple times) to the dist cache will 
not result in an error under YARN.  The MR client will just dedupe them before 
submitting the request to YARN.  The issue arises when there are different files 
that both map to the same key in the dist cache map (the key is the name 
of the symlink created in the working directory of the task/container).  That 
is when it will throw an exception under 2.0.
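
To make the distinction concrete, here is a minimal sketch of the two cases. It 
is not the actual MR client code: the class DistCacheKeys and its methods 
linkName and resolve are hypothetical, and it assumes the symlink name is the 
URI fragment when one is given and the last path component otherwise.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Hadoop's implementation.
class DistCacheKeys {

    // Assumed rule: the symlink name is the URI fragment if present,
    // otherwise the last component of the path.
    static String linkName(URI uri) {
        String fragment = uri.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = uri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Exact duplicates are silently collapsed; different URIs that map to
    // the same symlink name are rejected.
    static Map<String, URI> resolve(List<URI> entries) {
        Map<String, URI> byLink = new LinkedHashMap<>();
        for (URI uri : entries) {
            String key = linkName(uri);
            URI previous = byLink.putIfAbsent(key, uri);
            if (previous != null && !previous.equals(uri)) {
                throw new IllegalArgumentException(
                    "Conflicting dist cache entries for link '" + key + "': "
                    + previous + " and " + uri);
            }
        }
        return byLink;
    }

    public static void main(String[] args) {
        // Same file twice: deduped, no error.
        System.out.println(resolve(Arrays.asList(
            URI.create("hdfs://nn/a/lib.jar"),
            URI.create("hdfs://nn/a/lib.jar"))));

        // Different files, same symlink name: rejected.
        try {
            resolve(Arrays.asList(
                URI.create("hdfs://nn/a/lib.jar"),
                URI.create("hdfs://nn/b/lib.jar")));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under these assumptions, giving one of the conflicting entries a distinct 
fragment (for example hdfs://nn/b/lib.jar#lib2.jar) would give it a different 
key and avoid the conflict.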
                
> Throw "early" exception when duplicate files or archives are found in 
> distributed cache
> ---------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4568
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4568
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Mohammad Kamrul Islam
>            Assignee: Arun C Murthy
>
> According to #MAPREDUCE-4549, Hadoop 2.x throws an exception if duplicates are 
> found in cacheFiles or cacheArchives. The exception is thrown during job submission.
> This JIRA is to throw the exception ==early==, when the entry is first added to the 
> Distributed Cache through addCacheFile or addFileToClassPath.
> This will help the client decide whether to fail fast or continue without the 
> duplicated entries.
> Alternatively, Hadoop could provide a knob that lets the user choose whether to 
> throw an error (the coming behavior) or silently ignore duplicates (the old behavior).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
