[
https://issues.apache.org/jira/browse/MAPREDUCE-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705850#comment-14705850
]
Vinod Kumar Vavilapalli commented on MAPREDUCE-6454:
----------------------------------------------------
I debugged this offline with [~djp].
Some offline notes:
- The original problem was tracked at MAPREDUCE-5490 for Hadoop 1, which
didn't go in. Essentially, things like Hive running as part of Oozie actions do
not get the distributed-cache files in their classpath.
- In Hadoop 1, the MapReduce child wouldn't set either CLASSPATH or
HADOOP_CLASSPATH to include the distributed-cache files.
- In Hadoop 2, the situation got better: CLASSPATH is set correctly to include
the distributed-cache files, but HADOOP_CLASSPATH is not. Unfortunately, the
hadoop scripts don't respect a CLASSPATH that is already set, so we have to
explicitly set HADOOP_CLASSPATH to also point to the distributed-cache files.
The solution for this is to have MapReduce add the distributed-cache files to
HADOOP_CLASSPATH in addition to CLASSPATH.
Junping Du, the patch looks good.
- We are not setting this for the AM, but that sounds okay.
- We are making sure that any HADOOP_CLASSPATH already set is inherited
instead of replaced - good!
- We are only adding the distributed-cache entries, and skipping the MR
framework etc. - good again!
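
To make the last two points concrete, here is a minimal sketch of the intended
behaviour. It is not the code from the patch: the class name
HadoopClasspathSketch, the method withCacheEntries, and the choice to put the
inherited value ahead of the new entries are all assumptions made for
illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not taken from the MAPREDUCE-6454 patch.
final class HadoopClasspathSketch {

    private HadoopClasspathSketch() {}

    /**
     * Builds a HADOOP_CLASSPATH value from the value already present in the
     * environment (may be null or empty) and the distributed-cache entries.
     * Framework jars are deliberately not part of the input.
     * Assumption: the inherited value is kept first, then the cache entries.
     */
    static String withCacheEntries(String existing, List<String> cacheEntries) {
        List<String> parts = new ArrayList<>();
        // Inherit whatever was already set instead of replacing it.
        if (existing != null && !existing.isEmpty()) {
            parts.add(existing);
        }
        // Append only the distributed-cache entries, skipping blanks.
        for (String entry : cacheEntries) {
            if (entry != null && !entry.isEmpty()) {
                parts.add(entry);
            }
        }
        // Join with the platform path separator (":" on Unix-like systems).
        return String.join(File.pathSeparator, parts);
    }
}
```

A caller would pass in System.getenv("HADOOP_CLASSPATH") as the first argument
and put the result into the environment of the child task, so that a hadoop
script launched from that task sees the distributed-cache jars.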
The patch looks good to me. Been reviewing this offline. Will check this in if
Jenkins says okay.
> MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.
> ----------------------------------------------------------------------------
>
> Key: MAPREDUCE-6454
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6454
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Junping Du
> Assignee: Junping Du
> Priority: Critical
> Attachments: MAPREDUCE-6454-v2.1.patch, MAPREDUCE-6454-v2.patch,
> MAPREDUCE-6454-v3.1.patch, MAPREDUCE-6454-v3.patch, MAPREDUCE-6454.patch
>
>
> We already set lib jars on distributed-cache to CLASSPATH. However, in some
> corner cases (like: MR local mode, Hive Map side local join, etc.), we need
> these jars on HADOOP_CLASSPATH so hadoop scripts can take it in launching
> runjar process.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)