> On Jan. 23, 2015, 2:05 a.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java, line 220
> > <https://reviews.apache.org/r/30107/diff/4/?file=829689#file829689line220>
> >
> >     So, this is the code that adds the jars to the classpath of the remote driver?
> >
> >     I'm wondering why these jars are necessary in order to deserialize SparkWork.
> 
> chengxiang li wrote:
>     Same as the previous comment: SparkWork contains MapWork/ReduceWork, which contain the operator tree, and UTFFOperator needs to load classes from the added jar.
> 
> Xuefu Zhang wrote:
>     Sorry, but which operator? UTFFOperator? I couldn't find it in the Hive source.
Sorry, as you can see from the error log in the JIRA, the class from the added jar is referenced by UDTFOperator:

org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: de.bankmark.bigbench.queries.q10.SentimentUDF
Serialization trace:
genericUDTF (org.apache.hadoop.hive.ql.plan.UDTFDesc)
conf (org.apache.hadoop.hive.ql.exec.UDTFOperator)
childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
childOperators (org.apache.hadoop.hive.ql.exec.FilterOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)


- chengxiang


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30107/#review69329
-----------------------------------------------------------


On Jan. 22, 2015, 9:23 a.m., chengxiang li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30107/
> -----------------------------------------------------------
> 
> (Updated Jan. 22, 2015, 9:23 a.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9410
>     https://issues.apache.org/jira/browse/HIVE-9410
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The RemoteDriver does not contain the added jars in its classpath, so it fails to deserialize SparkWork with a ClassNotFoundException. For Hive on MR, when a jar is added through the Hive CLI, Hive adds the jar to the CLI classpath (through the thread context classloader) and to the distributed cache as well. Compared to Hive on MR, Hive on Spark has an extra RemoteDriver component, so we should add the added jars to its classpath as well.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java d7cb111 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 30a00a7 
>   spark-client/src/main/java/org/apache/hive/spark/client/JobContext.java 00aa4ec 
>   spark-client/src/main/java/org/apache/hive/spark/client/JobContextImpl.java 1eb3ff2 
>   spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 5f9be65 
>   spark-client/src/main/java/org/apache/hive/spark/client/SparkClientUtilities.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/30107/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> chengxiang li
> 
>
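
[Editor's note] The fix described above extends the RemoteDriver's classpath with the added jars via the thread context classloader, so that Kryo can resolve UDTF classes while deserializing SparkWork. The following is a minimal, self-contained sketch of that idea — the class and method names here are illustrative, not the actual SparkClientUtilities API from the patch:

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

public final class AddJarSketch {

  // Sketch: wrap the current thread context classloader in a new
  // URLClassLoader that also sees the added jar, then install it as
  // the context classloader. Kryo (and other reflective lookups on
  // this thread) will then be able to resolve classes from the jar.
  static ClassLoader addJarToContextLoader(File jar) throws Exception {
    ClassLoader parent = Thread.currentThread().getContextClassLoader();
    if (parent == null) {
      parent = AddJarSketch.class.getClassLoader();
    }
    URLClassLoader extended =
        new URLClassLoader(new URL[] { jar.toURI().toURL() }, parent);
    Thread.currentThread().setContextClassLoader(extended);
    return extended;
  }

  public static void main(String[] args) throws Exception {
    // Use an empty temp file in place of a real downloaded jar;
    // this only demonstrates the classloader wiring.
    File fakeJar = File.createTempFile("added", ".jar");
    fakeJar.deleteOnExit();
    ClassLoader loader = addJarToContextLoader(fakeJar);
    System.out.println(
        Thread.currentThread().getContextClassLoader() == loader);
  }
}
```

Note that each call layers a new URLClassLoader on top of the previous one; the real implementation would typically deduplicate already-added jar URLs before creating a new loader.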