> On January 23, 2015, 2:05 a.m., Xuefu Zhang wrote:
> > I'm wondering what's the story for Hive CLI. Hive CLI can add jars from 
> > local file system. Would this work for Hive on Spark?

Hive CLI adds jars to the classpath dynamically, the same way this patch does 
for RemoteDriver: it updates the thread context classloader to include the 
paths of the added jars. For Hive on Spark, Hive CLI stays the same; the issue 
is that RemoteDriver does not add these jars to its classpath, so a 
NoClassFound error comes out when the RemoteDriver side needs a related class.


> On January 23, 2015, 2:05 a.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java, line 367
> > <https://reviews.apache.org/r/30107/diff/4/?file=829688#file829688line367>
> >
> >     Callers of getBaseWork() will add the jars to the classpath. Why this 
> > is necessary? Who are the callers? Any side-effect?

The reason we need to do this is that getBaseWork() generates 
MapWork/ReduceWork, which contain Hive operators, and an operator such as 
UDTFOperator may reference classes from an added jar that need to be loaded. 
To load added jars dynamically, we need to reset the thread context 
classloader. As mentioned in the previous change summary, unlike HiveCLI, 
there are two threads on the RemoteDriver side that may need to load added 
jars, and for the akka thread there is no proper cut-in point to add jars to 
the classpath.
The side effect is that many HiveCLI threads may have to check and update 
their classloaders unnecessarily.
Another possible solution is to update the system classloader for RemoteDriver 
dynamically, which must be done in a quite hacky way, such as:

        URLClassLoader sysloader =
                (URLClassLoader) ClassLoader.getSystemClassLoader();

        try {
            // jarUrl is the java.net.URL of the jar to add
            Method method =
                URLClassLoader.class.getDeclaredMethod("addURL", URL.class);
            method.setAccessible(true);
            method.invoke(sysloader, jarUrl);
        } catch (Throwable t) {
            t.printStackTrace();
            throw new IOException("Error, could not add URL to system classloader");
        }

Which one do you prefer?


> On January 23, 2015, 2:05 a.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java,
> >  line 220
> > <https://reviews.apache.org/r/30107/diff/4/?file=829689#file829689line220>
> >
> >     So, this is the code that adds the jars to the classpath of the remote 
> > driver?
> >     
> >     I'm wondering why these jars are necessary in order to deserialize 
> > SparkWork.

Same as the previous comment: SparkWork contains MapWork/ReduceWork, which 
contain the operator tree, and UDTFOperator needs to load classes from the 
added jars.


- chengxiang


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30107/#review69329
-----------------------------------------------------------


On January 22, 2015, 9:23 a.m., chengxiang li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30107/
> -----------------------------------------------------------
> 
> (Updated January 22, 2015, 9:23 a.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9410
>     https://issues.apache.org/jira/browse/HIVE-9410
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The RemoteDriver does not contain the added jars in its classpath, so it 
> fails to deserialize SparkWork due to NoClassFoundException. For Hive on MR, 
> when jars are added through the Hive CLI, Hive adds them to the CLI 
> classpath (through the thread context classloader) and to the distributed 
> cache as well. Compared to Hive on MR, Hive on Spark has an extra 
> RemoteDriver component, so we should add the added jars to its classpath as 
> well.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java d7cb111 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 
> 30a00a7 
>   spark-client/src/main/java/org/apache/hive/spark/client/JobContext.java 
> 00aa4ec 
>   spark-client/src/main/java/org/apache/hive/spark/client/JobContextImpl.java 
> 1eb3ff2 
>   
> spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 
> 5f9be65 
>   
> spark-client/src/main/java/org/apache/hive/spark/client/SparkClientUtilities.java
>  PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/30107/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> chengxiang li
> 
>
