[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705850#comment-14705850
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-6454:
----------------------------------------------------

I debugged this offline with [~djp].

Some offline notes
 - The original problem was tracked at MAPREDUCE-5490 for Hadoop 1, but that 
fix didn't go in. Essentially, things like Hive running as part of Oozie 
actions do not get the distributed-cache files in their classpath.
 - In Hadoop 1, the MapReduce child didn't add the distributed-cache files to 
either CLASSPATH or HADOOP_CLASSPATH.
 - In Hadoop 2, the situation got better: CLASSPATH is set correctly to include 
the distributed-cache files, but HADOOP_CLASSPATH is not. Unfortunately, the 
hadoop scripts don't respect a CLASSPATH that is already set, so we have to 
explicitly set HADOOP_CLASSPATH to also point to the distributed-cache files.

The solution for this is to have MapReduce add the distributed-cache files to 
HADOOP_CLASSPATH in addition to CLASSPATH.

Junping Du, the patch looks good.
 - We are not setting this for the AM, but that sounds okay.
 - We are making sure that any HADOOP_CLASSPATH already set is inherited 
instead of replaced - good!
 - We are only putting in the distributed-cache entries, and skipping the MR 
framework etc. - good again!
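
For readers following along, the inherit-and-append behaviour in the points 
above can be sketched roughly as below. This is only an illustration, not the 
code in the attached patch: the class name, the addCacheEntries helper, the 
environment map and the example jar paths are all hypothetical, and the path 
separator is passed in rather than taken from the platform.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; NOT the code from the MAPREDUCE-6454 patch.
public class HadoopClasspathSketch {

    // Hypothetical helper: appends distributed-cache entries to whatever
    // HADOOP_CLASSPATH the task environment already carries, so an inherited
    // value is extended instead of replaced.
    public static void addCacheEntries(Map<String, String> env,
                                       List<String> cacheEntries,
                                       String separator) {
        if (cacheEntries.isEmpty()) {
            return; // nothing to add, leave the environment untouched
        }
        String added = String.join(separator, cacheEntries);
        String existing = env.get("HADOOP_CLASSPATH");
        if (existing == null || existing.isEmpty()) {
            env.put("HADOOP_CLASSPATH", added);
        } else {
            env.put("HADOOP_CLASSPATH", existing + separator + added);
        }
    }

    public static void main(String[] args) {
        // Assumed starting environment with a value set by the user or admin.
        Map<String, String> env = new LinkedHashMap<>();
        env.put("HADOOP_CLASSPATH", "/opt/site/extra.jar");

        // Only distributed-cache entries are passed in; framework jars are
        // deliberately left out, matching the last point above.
        addCacheEntries(env, Arrays.asList("job-libs/udf.jar", "job-libs/serde.jar"), ":");

        // prints /opt/site/extra.jar:job-libs/udf.jar:job-libs/serde.jar
        System.out.println(env.get("HADOOP_CLASSPATH"));
    }
}
```

Appending after the inherited value keeps any entries the user configured 
ahead of the job's own jars; whether the actual patch orders them this way is 
not something these notes state.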

The patch looks good to me. I have been reviewing it offline and will check it 
in if Jenkins says okay.

> MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6454
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6454
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: MAPREDUCE-6454-v2.1.patch, MAPREDUCE-6454-v2.patch, 
> MAPREDUCE-6454-v3.1.patch, MAPREDUCE-6454-v3.patch, MAPREDUCE-6454.patch
>
>
> We already set lib jars on distributed-cache to CLASSPATH. However, in some 
> corner cases (like: MR local mode, Hive Map side local join, etc.), we need 
> these jars on HADOOP_CLASSPATH so hadoop scripts can take it in launching 
> runjar process.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
