[
https://issues.apache.org/jira/browse/OOZIE-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426513#comment-15426513
]
Georgi Ivanov commented on OOZIE-2643:
--------------------------------------
Repro:
# create test table
create table test_json_serde (a string) ROW FORMAT SERDE
'org.apache.hive.hcatalog.data.JsonSerDe';
# sample query
$ cat query.hql
select count(*) from default.test_json_serde;
# sample workflow
<workflow-app name="tez-json-serde-test" xmlns="uri:oozie:workflow:0.4">
  <start to="hive-tez"/>
  <action name="hive-tez">
    <hive xmlns="uri:oozie:hive-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <job-xml>/user/oozie/share/conf/hive-site.xml</job-xml>
      <configuration>
        <property>
          <name>mapred.job.queue.name</name>
          <value>${queueName}</value>
        </property>
        <property>
          <name>oozie.use.system.libpath</name>
          <value>true</value>
        </property>
        <property>
          <name>oozie.action.sharelib.for.hive</name>
          <value>hive,hcatalog</value>
        </property>
        <property>
          <name>hive.execution.engine</name>
          <value>tez</value>
        </property>
      </configuration>
      <script>/user/givanov/json-serde/query.hql</script>
      <file>/user/oozie/share/conf/tez-site.xml</file>
    </hive>
    <ok to="end"/>
    <error to="kill"/>
  </action>
  <kill name="kill">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
hive-site.xml and tez-site.xml are the standard global ones.
# a sample job.properties
nameNode=<namenode_address>
jobTracker=<jobtracker_address>
oozie.wf.application.path=<namenode_address>/user/givanov/json-serde
oozie.use.system.libpath=true
queueName=default
When we run it:
oozie job -config job.properties -run
The Hive action starts successfully, but the Tez session fails with the
ClassNotFoundException from the issue description. Using MR works fine.
It looks like Oozie does not localize the sharelib jars for the Tez session.
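As a sketch only, the tez.aux.uris workaround from the issue description would slot into the repro workflow's <configuration> block roughly as follows. The sharelib path is the one given in the issue description and is an assumption about the cluster layout; the actual hcatalog sharelib directory depends on the installation.

```xml
<configuration>
  <!-- Other properties from the repro workflow (queue name, sharelib
       settings) are unchanged and omitted here for brevity. -->
  <property>
    <name>hive.execution.engine</name>
    <value>tez</value>
  </property>
  <!-- Workaround from the issue description: point Tez at the hcatalog
       sharelib so the JsonSerDe jars are available to the Tez session.
       The path below is an assumption; adjust it to the real sharelib
       location on the cluster. -->
  <property>
    <name>tez.aux.uris</name>
    <value>${nameNode}/user/oozie/share/lib/hcatalog/</value>
  </property>
</configuration>
```

This only works around the missing jars for this one action; it does not change how Oozie localizes sharelib jars for Tez sessions in general.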
> Hive on Tez via Oozie fails with ClassNotFound Exception against tables with
> Hcatalog Json Serde
> ------------------------------------------------------------------------------------------------
>
> Key: OOZIE-2643
> URL: https://issues.apache.org/jira/browse/OOZIE-2643
> Project: Oozie
> Issue Type: Bug
> Reporter: Georgi Ivanov
>
> If we create an Oozie workflow with a Hive action that uses the Tez execution
> engine and references a table using the HCatalog JSON SerDe, Oozie does not
> properly localize the sharelib jars for the Tez session. It localizes them
> for the Hive action, but they do not propagate to Tez. Using Hive on MR works
> fine; the problem occurs only with Oozie and Tez. The workflow throws a
> ClassNotFoundException:
> 2016-08-18 08:35:00,337 [ERROR] [TezChild] |tez.TezProcessor|: java.lang.RuntimeException: Map operator initialization failed
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:265)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: Class org.apache.hive.hcatalog.data.JsonSerDe not found
>     at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:347)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:382)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:227)
>     ... 15 more
> Caused by: java.lang.ClassNotFoundException: Class org.apache.hive.hcatalog.data.JsonSerDe not found
>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
>     at org.apache.hadoop.hive.ql.plan.PartitionDesc.getDeserializer(PartitionDesc.java:143)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:313)
>     ... 17 more
> The only workaround that I've found so far is to add the hcatalog sharelib to
> tez.aux.uris as a configuration property inside workflow.xml:
> <property>
>   <name>tez.aux.uris</name>
>   <value>${nameNode}/user/oozie/share/lib/hcatalog/</value>
> </property>