[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713706#comment-13713706
 ] 

Robert Joseph Evans commented on MAPREDUCE-4493:
------------------------------------------------

[~ojoshi] By opening the symbolic link in the current working directory.

Prior to YARN, the default behavior was not to create symlinks in the current 
working directory pointing to the items in the distributed cache.  If you 
wanted links, you had to explicitly turn that option on and provide the name 
of the symlink you wanted.  The only way to get to files without symlinks was 
to call getLocalCacheFiles and getCacheFiles.  In YARN, every file gets a 
symlink.  The name of the file/directory will be the name of the 
symlink.  However, a name collision is possible, for example if I wanted both 
hdfs://foo/bar.zip and hdfs://bar/bar.zip.  In 1.0 both of these would have 
been downloaded and accessible through the deprecated APIs, but in YARN a 
warning is output and only one of them is downloaded.  Also, because 
of the way these APIs were written, the mapper code may not know that only one 
of them was downloaded, so it will fail to find the missing one and blow 
up.  That is why I deprecated them: to nudge people to always use the 
symlinks, so the behavior is always consistent.
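
To make the naming rule and the collision case concrete, here is a minimal 
sketch in plain Java.  It does not use any Hadoop classes.  The class and 
method names (CacheLinkNames, linkName, collisions, open) are hypothetical, 
and the rule it encodes (use the URI fragment if one is given, otherwise the 
last path component) is only my reading of the behavior described in this 
issue, not the actual localization code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop.
final class CacheLinkNames {

    // Assumed rule: the fragment names the symlink; without a fragment,
    // the last component of the path is used.
    static String linkName(URI cacheUri) {
        String fragment = cacheUri.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = cacheUri.getPath();
        // Drop trailing slashes so a directory URI still yields a name.
        while (path.length() > 1 && path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Returns only the link names that more than one cache URI maps to.
    static Map<String, List<URI>> collisions(List<URI> cacheUris) {
        Map<String, List<URI>> byName = new LinkedHashMap<>();
        for (URI uri : cacheUris) {
            byName.computeIfAbsent(linkName(uri), k -> new ArrayList<>()).add(uri);
        }
        byName.values().removeIf(uris -> uris.size() < 2);
        return byName;
    }

    // Inside a task: open the entry through its symlink, resolved against
    // the current working directory.
    static InputStream open(URI cacheUri) throws IOException {
        return Files.newInputStream(Paths.get(linkName(cacheUri)));
    }

    public static void main(String[] args) {
        List<URI> uris = Arrays.asList(
                URI.create("hdfs://foo/bar.zip"),
                URI.create("hdfs://bar/bar.zip"));
        // Both URIs map to the link name bar.zip, so they collide.
        System.out.println(collisions(uris));

        List<URI> renamed = Arrays.asList(
                URI.create("hdfs://foo/bar.zip#foo-bar.zip"),
                URI.create("hdfs://bar/bar.zip"));
        // Distinct fragments give distinct link names, so nothing collides.
        System.out.println(collisions(renamed));
    }
}
```

Running a check like collisions() at job-submission time would surface the 
duplicate before a task fails trying to open an entry that was never 
localized.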
                
> Distributed Cache Compatibility Issues
> --------------------------------------
>
>                 Key: MAPREDUCE-4493
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4493
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.3, 2.0.0-alpha, 3.0.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Critical
>             Fix For: 0.23.3, 2.0.2-alpha
>
>         Attachments: MR-4493.txt, MR-4493.txt, MR-4493.txt
>
>
> The distributed cache does not work like it does in 1.0.
> mapreduce.job.cache.symlink.create is completely ignored and symlinks are 
> always created no matter what.  Files and archives without a fragment will 
> also have symlinks created.
> If two cache archives or cache files happen to have the same name, or the 
> same symlink fragment, only the last one in the list is localized.
> The localCacheArchives and LocalCacheFiles are not set correctly when these 
> duplicates happen, causing off-by-one (or more) errors for anyone trying to 
> use them.
> The reality is that use of symlinking is so common currently that these 
> incompatibilities are not that likely to show up, but we still need to fix 
> them.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
