[ 
https://issues.apache.org/jira/browse/OOZIE-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530304#comment-16530304
 ] 

Daniel Dai commented on OOZIE-3300:
-----------------------------------

Excluding mapreduce.job.cache.files and mapreduce.job.cache.archives from 
pig.properties would be the ideal solution. Even without the error, shipping 
all sharelib jars is unnecessary and hurts performance. Pig would pick the 
jars that need to be shipped to the backend. I am not sure how disruptive this 
would be for Oozie, as some Oozie users might rely on the current behavior and 
not register their jars in Pig. cc [~rohini]
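
For illustration only, the exclusion could look roughly like the sketch below. 
It is not the attached patch and not Oozie's actual code: the class name 
PigPropertiesFilter and the method withoutDistributedCache are hypothetical, 
and it assumes the properties are available as a java.util.Properties object 
before pig.properties is written.

```java
import java.util.Properties;
import java.util.Set;

// Hypothetical helper, not part of Oozie: drops the distributed cache
// entries before the remaining properties are written to pig.properties.
public class PigPropertiesFilter {

    // The two keys named in this issue.
    static final Set<String> EXCLUDED_KEYS = Set.of(
            "mapreduce.job.cache.files",
            "mapreduce.job.cache.archives");

    // Returns a copy of the given properties without the excluded keys.
    static Properties withoutDistributedCache(Properties source) {
        Properties filtered = new Properties();
        for (String name : source.stringPropertyNames()) {
            if (!EXCLUDED_KEYS.contains(name)) {
                filtered.setProperty(name, source.getProperty(name));
            }
        }
        return filtered;
    }
}
```

With a filter like this, jars that a script relies on would have to be 
registered in Pig explicitly, which is the compatibility concern raised above.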

> Hadoop 3 Pig (with Tez) action shall not use given distributed cache
> --------------------------------------------------------------------
>
>                 Key: OOZIE-3300
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3300
>             Project: Oozie
>          Issue Type: Bug
>          Components: action
>    Affects Versions: 4.3.1
>         Environment: Oozie-4.3.1
> Hadoop-3
>            Reporter: Denes Bodo
>            Assignee: Denes Bodo
>            Priority: Critical
>              Labels: usability
>         Attachments: OOZIE-3300_001.patch
>
>
> When I run my Pig action and set -x Tez to use Tez as the execution engine, I 
> get the following error:
> {code:java}
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.mapred.InvalidJobConfException: cache file 
> (mapreduce.job.cache.files) scheme: "hdfs" host: "mycluster" port: -1 file: 
> "/tmp/temp257132233/hive-exec-3.0.0.3.0.0.0-1469.jar" conflicts with cache 
> file (mapreduce.job.cache.files)
>  
> hdfs://mycluster/user/oozie/share/lib/lib_20180611041054/pig/hive-exec-3.0.0.3.0.0.0-1469.jar
> {code}
> or
> {code:java}
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.mapred.InvalidJobConfException: cache file 
> (mapreduce.job.cache.files) scheme: "hdfs" host: "ns1" port: -1 file: 
> "/tmp/temp36150863/joda-time-2.9.6.jar" conflicts with cache file 
> (mapreduce.job.cache.files) 
> hdfs://ns1/user/oozie/share/lib/lib_20180612003013/pig/joda-time-2.9.6.jar
> {code}
> When I exclude the conflicting jar, Pig cannot start due to a class not 
> found exception.
> I think a workaround or solution could be to exclude 
> *mapreduce.job.cache.files* and *mapreduce.job.cache.archives* from the 
> pig.properties file.
>  
> It is a somewhat scary change because I am not sure whether a user's job 
> needs a specific driver or implementation to be on the distributed cache.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
