[ https://issues.apache.org/jira/browse/MAPREDUCE-6344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518200#comment-14518200 ]
Preston Koprivica commented on MAPREDUCE-6344:
----------------------------------------------
I'm absolutely fine with demoting this to an improvement; I'll defer to the
experts on the severity and type of this issue.
However, I do think that determinism is important in this instance and
warrants at least a minor change. In general, I agree that we should not be
packaging the same class multiple times, but I disagree that the effort of
ordering is futile. In fact, Hadoop does this quite often, and it even allows
users to change the ordering when they want to (for example,
MRJobConfig.MAPREDUCE_JOB_USER_CLASSPATH_FIRST).
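To see why ordering matters when a class is packaged twice, a minimal sketch
helps: a class loader scanning an ordered classpath returns the first entry
that contains the requested class, so the position of each JAR decides which
version wins. The `userClasspathFirst` flag below only loosely mimics the
precedence that MRJobConfig.MAPREDUCE_JOB_USER_CLASSPATH_FIRST controls (user
entries ahead of framework entries). The JAR names, class names, and helper
methods are hypothetical, and the code does not call any Hadoop API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class FirstMatchClasspath {

    // Hypothetical model: each classpath entry (a JAR name) maps to the classes it contains.
    // LinkedHashMap preserves insertion order, which stands in for classpath order.
    static Map<String, Set<String>> buildClasspath(Map<String, Set<String>> framework,
                                                   Map<String, Set<String>> user,
                                                   boolean userClasspathFirst) {
        Map<String, Set<String>> ordered = new LinkedHashMap<>();
        if (userClasspathFirst) {
            ordered.putAll(user);
            ordered.putAll(framework);
        } else {
            ordered.putAll(framework);
            ordered.putAll(user);
        }
        return ordered;
    }

    // Scans entries in order, like a class loader: the first entry containing the class wins.
    static Optional<String> resolve(Map<String, Set<String>> classpath, String className) {
        for (Map.Entry<String, Set<String>> entry : classpath.entrySet()) {
            if (entry.getValue().contains(className)) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String cls = "com.example.Util";
        Map<String, Set<String>> framework = new LinkedHashMap<>();
        framework.put("framework-lib-1.0.jar", Set.of(cls));
        Map<String, Set<String>> user = new LinkedHashMap<>();
        user.put("user-lib-2.0.jar", Set.of(cls));

        // Same two JARs, different order, different class actually loaded.
        System.out.println(resolve(buildClasspath(framework, user, false), cls).orElse("not found"));
        System.out.println(resolve(buildClasspath(framework, user, true), cls).orElse("not found"));
    }
}
```

With the same two JARs on the classpath, the first call resolves
`com.example.Util` from `framework-lib-1.0.jar` and the second from
`user-lib-2.0.jar`; if the order itself is unspecified, so is the result.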
Correct me if I'm wrong, but it appears that ordering is already preserved for
"non jar" files. This change would simply ask that ordering also be preserved
for *all* files in the DistributedCache, not just "non jars".
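A minimal sketch of the requested behavior, assuming the cache entries are
available as a list of localized link names in the order they were added to
the DistributedCache: every entry, JAR or not, is listed explicitly in that
order, and the "$PWD/*" wildcard is appended last so that anything else
linked into the container's working directory stays on the classpath. The
helper name, the "$PWD/" prefix form, and the ":" separator are assumptions
for illustration; this is not the actual MRApps code.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class OrderedCacheClasspath {

    // Hypothetical helper: lists every cache entry explicitly, in DistributedCache order,
    // then appends the "$PWD/*" wildcard as a catch-all.
    static String buildClasspath(List<String> cacheLinkNames, String separator) {
        // LinkedHashSet drops duplicate link names while keeping first-seen order.
        Set<String> entries = new LinkedHashSet<>();
        for (String name : cacheLinkNames) {
            entries.add("$PWD/" + name);
        }
        // The wildcard still picks up anything else in the working directory; a JAR that
        // is also listed explicitly above is found at its earlier, deterministic position first.
        entries.add("$PWD/*");
        return String.join(separator, entries);
    }

    public static void main(String[] args) {
        List<String> cache = List.of("app-core.jar", "conf", "app-legacy.jar", "app-core.jar");
        // Prints: $PWD/app-core.jar:$PWD/conf:$PWD/app-legacy.jar:$PWD/*
        System.out.println(buildClasspath(cache, ":"));
    }
}
```

Keeping the wildcard at the end preserves the existing behavior for files
that are not tracked in the DistributedCache, while the explicit entries
ahead of it make the precedence among cached JARs deterministic.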
> Inconsistent classpath/classloading from DistributedCache archives
> ------------------------------------------------------------------
>
> Key: MAPREDUCE-6344
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6344
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.5.0, 2.6.0, 2.7.0, 2.5.1, 2.5.2
> Reporter: Preston Koprivica
>
> We recently upgraded to MRv2 on YARN and have been noticing very inconsistent
> classloading between the job submission client and the tasks as they start
> up.
> I've tracked the issue to this method:
> https://github.com/apache/hadoop/blob/release-2.5.0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java#L264
> It appears that the classpath is simply "wildcarded". According to the Java SE
> 7 and 8 docs, the order in which the JAR files matched by a wildcard are
> enumerated is not specified and may differ from moment to moment [1][2]. This
> is a problem for applications that rely on strict ordering, which the MRv1
> DistributedCache used to provide.
> I'm unable to track down everything that is linked or placed into the $PWD
> of the container, but assuming we can't account for all of it, a simple
> solution could be to explicitly enumerate the files in the DistributedCache,
> similar to the "non jar" case [3], and then append the "*" for passivity
> (backward compatibility).
> [1] http://docs.oracle.com/javase/7/docs/technotes/tools/windows/classpath.htm
> [2] http://docs.oracle.com/javase/8/docs/technotes/tools/windows/classpath.html#A1100762
> [3] https://github.com/apache/hadoop/blob/release-2.5.0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java#L270
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)