[ https://issues.apache.org/jira/browse/MAPREDUCE-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713706#comment-13713706 ]
Robert Joseph Evans commented on MAPREDUCE-4493:
------------------------------------------------
[~ojoshi] By opening the symbolic link in the current working directory.
Prior to YARN, the default behavior was not to create symlinks in the current
working directory pointing to the items in the distributed cache. If you
wanted links, you had to turn that option on explicitly and provide the name
of each symlink you wanted. The only way to get to files without symlinks was
to call getLocalCacheFiles and getCacheFiles. In YARN, every file gets a
symlink, and the name of the file or directory is used as the name of the
symlink. However, a name collision is possible, for example if I ask for both
hdfs://foo/bar.zip and hdfs://bar/bar.zip. In 1.0 both of these would have
been downloaded and accessible through the now-deprecated APIs, but in YARN a
warning is output and only one of them is downloaded. Also, because of the
way these APIs were written, the mapper code may not know that only one of
them was downloaded, so it will fail to find the missing one and blow up.
That is why I deprecated them: to nudge people to always use the symlinks, so
the behavior is always consistent.
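
To make the collision case concrete, here is a rough sketch in plain Java (no
Hadoop dependency). It assumes the link name is the URI fragment when one is
given and the last path component otherwise. CacheLinkNames, linkName and
collisions are hypothetical names made up for this illustration, not Hadoop
APIs, and the real localization code may differ in its details.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: this class and its methods are hypothetical helpers,
// not part of Hadoop.
final class CacheLinkNames {

    // Assumed naming rule: use the URI fragment when one is given,
    // otherwise fall back to the last component of the path.
    // Expects hierarchical URIs such as hdfs://host/dir/file.
    static String linkName(URI uri) {
        String fragment = uri.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = uri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Groups cache URIs by link name and keeps only the names that are
    // claimed by more than one URI.
    static Map<String, List<URI>> collisions(List<URI> cacheUris) {
        Map<String, List<URI>> byName = new LinkedHashMap<>();
        for (URI uri : cacheUris) {
            byName.computeIfAbsent(linkName(uri), k -> new ArrayList<>()).add(uri);
        }
        byName.values().removeIf(uris -> uris.size() < 2);
        return byName;
    }

    public static void main(String[] args) {
        List<URI> cacheUris = Arrays.asList(
            URI.create("hdfs://foo/bar.zip"),
            URI.create("hdfs://bar/bar.zip"));
        // Report every link name that more than one cache entry maps to.
        collisions(cacheUris).forEach((name, uris) ->
            System.out.println("collision on " + name + ": " + uris));
    }
}
```

Under the same assumption, giving one of the two entries its own fragment
(for example hdfs://bar/bar.zip#bar2.zip) yields distinct link names, and the
task can then open each one by its relative name in the working directory.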
> Distributed Cache Compatibility Issues
> --------------------------------------
>
> Key: MAPREDUCE-4493
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4493
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 0.23.3, 2.0.0-alpha, 3.0.0
> Reporter: Robert Joseph Evans
> Assignee: Robert Joseph Evans
> Priority: Critical
> Fix For: 0.23.3, 2.0.2-alpha
>
> Attachments: MR-4493.txt, MR-4493.txt, MR-4493.txt
>
>
> The distributed cache does not work like it does in 1.0.
> mapreduce.job.cache.symlink.create is completely ignored and symlinks are
> always created no matter what. Files and archives without a fragment will
> also have symlinks created.
> If two cache archives or cache files happen to have the same name, or the
> same symlink fragment, only the last one in the list is localized.
> The localCacheArchives and LocalCacheFiles are not set correctly when these
> duplicates happen, causing off-by-one (or more) errors for anyone trying to
> use them.
> The reality is that use of symlinking is so common currently that these
> incompatibilities are not that likely to show up, but we still need to fix
> them.