[ 
https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14876858#comment-14876858
 ] 

Ratandeep Ratti commented on HIVE-11878:
----------------------------------------

bq.  if we had previously loaded a class with the previous classloader, and now 
load the class again with the current classloader, would there be any potential 
effects here? 

The two class objects will definitely be different. I'll check whether we 
compare class objects anywhere in the code. Some effects that come to mind: 
1. {{o instanceof c}} evaluates to false if c was loaded by classloader u1 and 
o is an instance of the same class as loaded by another classloader u2.
2. Casting fails with a ClassCastException (same reasoning as above).
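
To make the two effects above concrete, here is a minimal sketch. Everything in it is hypothetical and not Hive code: {{Payload}} stands in for class c, and {{IsolatingLoader}} is a toy loader that defines {{Payload}} itself instead of delegating, so two instances behave like u1 and u2. It assumes the compiled class bytes are readable as a resource through the parent loader.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo; none of these names come from Hive.
class ClassIdentityDemo {

    // Stand-in for class c, e.g. a UDF class shipped in a registered jar.
    public static class Payload {
        public Payload() { }
    }

    // Defines Payload itself rather than delegating to the parent, so two
    // instances behave like two unrelated classloaders u1 and u2.
    static class IsolatingLoader extends ClassLoader {
        private static final String TARGET = Payload.class.getName();

        IsolatingLoader() {
            super(ClassIdentityDemo.class.getClassLoader());
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve)
                throws ClassNotFoundException {
            if (!TARGET.equals(name)) {
                return super.loadClass(name, resolve); // normal parent-first path
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> loaded = findLoadedClass(name);
                if (loaded == null) {
                    byte[] bytes = readClassBytes(name);
                    loaded = defineClass(name, bytes, 0, bytes.length);
                }
                return loaded;
            }
        }

        // Assumption: the .class file is visible as a resource of the parent.
        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(resource)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                return in.readAllBytes();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    private static Class<?> load(ClassLoader loader) {
        try {
            return loader.loadClass(Payload.class.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    private static Object instantiate(Class<?> type) {
        try {
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Same class name, two defining loaders: are the Class objects identical?
    static boolean sameClassObject() {
        return load(new IsolatingLoader()) == load(new IsolatingLoader());
    }

    // Dynamic form of "o instanceof c" with o and c coming from different loaders.
    static boolean instanceOfAcrossLoaders() {
        Object o = instantiate(load(new IsolatingLoader()));
        Class<?> c = load(new IsolatingLoader());
        return c.isInstance(o);
    }

    // Dynamic form of "(c) o" with o and c coming from different loaders.
    static boolean castFails() {
        Object o = instantiate(load(new IsolatingLoader()));
        try {
            load(new IsolatingLoader()).cast(o);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("same Class object: " + sameClassObject());
        System.out.println("instanceof across loaders: " + instanceOfAcrossLoaders());
        System.out.println("cast fails: " + castFails());
    }
}
```

The sketch uses {{Class#isInstance}} and {{Class#cast}} because the second class object is only available reflectively; they follow the same rules as the {{instanceof}} operator and a cast expression.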

[~jdere], [~ashutoshc], I'd also like your opinion on approach 3, mentioned 
above: instead of creating a new classloader for every jar, we add jars to 
the same classloader using the {{addURL}} method. We extend URLClassLoader 
and widen the visibility of {{addURL}} from protected to public. This 
sidesteps the potential problems we are discussing here. Deleting jars in 
{{org.apache.hadoop.hive.ql.exec.Utilities#removeFromClassPath}} can work 
exactly as before, except that it creates an instance of this subclass rather 
than a plain URLClassLoader, and sets it as both the thread context 
classloader and the Hadoop Configuration classloader.

One way to think about approach 3 is that it is exactly like what is currently 
being done, except that we register all the jars at once. I haven't 
implemented approach 3 yet; I wanted to get some opinions on it before 
proceeding further.
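
A rough sketch of what the subclass in approach 3 could look like. The class and method names ({{AddableURLClassLoader}}, {{RegisterJarsDemo}}, {{registerJar}}) and the jar paths are placeholders I made up for illustration; this is not the patch.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;

// Hypothetical name; a sketch of the subclass described in approach 3.
class AddableURLClassLoader extends URLClassLoader {

    AddableURLClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    // URLClassLoader#addURL is protected; the override widens it to public.
    @Override
    public void addURL(URL url) {
        super.addURL(url);
    }
}

class RegisterJarsDemo {

    // Appends a jar to an existing loader instead of wrapping it in a new one.
    static void registerJar(AddableURLClassLoader loader, String jarPath) {
        try {
            loader.addURL(Paths.get(jarPath).toUri().toURL());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad jar path: " + jarPath, e);
        }
    }

    // Registers two placeholder jars and reports how many URLs one loader holds.
    static int registeredJarCount() {
        AddableURLClassLoader loader = new AddableURLClassLoader(
                new URL[0], RegisterJarsDemo.class.getClassLoader());
        registerJar(loader, "j1.jar"); // placeholder paths; files need not exist
        registerJar(loader, "j2.jar");
        return loader.getURLs().length;
    }

    public static void main(String[] args) {
        AddableURLClassLoader loader = new AddableURLClassLoader(
                new URL[0], Thread.currentThread().getContextClassLoader());
        // Set once; later jars go into this same loader via addURL.
        Thread.currentThread().setContextClassLoader(loader);
        registerJar(loader, "j1.jar");
        registerJar(loader, "j2.jar");
        System.out.println("URLs on the single loader: " + loader.getURLs().length);
    }
}
```

Since every jar ends up on the search path of one loader, a class defined from j1 and a class defined from j2 share the same defining loader, which is the property approach 3 relies on.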

> ClassNotFoundException can possibly  occur if multiple jars are registered 
> one at a time in Hive
> ------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-11878
>                 URL: https://issues.apache.org/jira/browse/HIVE-11878
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.2.1
>            Reporter: Ratandeep Ratti
>            Assignee: Ratandeep Ratti
>              Labels: URLClassLoader
>         Attachments: HIVE-11878.patch, HIVE-11878_qtest.patch
>
>
> When we register a jar on the Hive console, Hive creates a fresh 
> URLClassLoader which includes the path of the jar being registered and 
> all the jar paths of the parent classloader. The parent classloader is the 
> current ThreadContextClassLoader. Once the URLClassLoader is created, Hive 
> sets it as the current ThreadContextClassLoader.
> So if we register multiple jars in Hive, there will be multiple 
> URLClassLoaders created, each classloader including the jars from its parent 
> and the one extra jar to be registered. The last URLClassLoader created will 
> end up as the current ThreadContextClassLoader. (See details: 
> org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath)
> Here is an example in which the above strategy can lead to a ClassNotFoundException.
> We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class 
> *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, 
> the URLClassLoader *u1* is created and also set as the 
> ThreadContextClassLoader. We register *j2* next, the new URLClassLoader 
> created will be *u2* with *u1* as parent and *u2* becomes the new 
> ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* 
> whereas *u1* only has paths to *j1* (For details see: 
> org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath).
> Now when we register class *c1* under a temporary function in Hive, we load 
> the class using {code} Class.forName("c1", true, 
> Thread.currentThread().getContextClassLoader()) {code} . The 
> current ThreadContextClassLoader is *u2*, and it has the path to the class 
> *c1*, but note that class-loaders work by delegating to the parent 
> class-loader first. In this case class *c1* will be found and *defined* by 
> class-loader *u1*.
> Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say 
> initialize) is called in *c1*, which references the class *c2*, *c2* will not 
> be found, since the class-loader used to search for *c2* will be *u1* (the 
> class-loader of the referencing class is used to resolve the classes it 
> references), and *u1* only has the path to *j1*.
> I've added a qtest to demonstrate the problem. Please see the attached patch.
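
The scenario in the description can be traced with a toy model. This is a simplification under stated assumptions: {{ToyLoader}} and {{DelegationDemo}} are made-up names, the loaders only track which class names are on their path, and no real JVM class loading takes place. It only mirrors the parent-first lookup order for u1 and u2.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy model of a classloader: a name, a parent, and the classes on its path.
class ToyLoader {
    final String name;
    final ToyLoader parent;
    final Set<String> classesOnPath;

    ToyLoader(String name, ToyLoader parent, String... classesOnPath) {
        this.name = name;
        this.parent = parent;
        this.classesOnPath = new HashSet<>(Arrays.asList(classesOnPath));
    }

    // Parent-first lookup: returns the loader that would define the class,
    // or null if no loader in the chain can find it.
    ToyLoader definerOf(String className) {
        if (parent != null) {
            ToyLoader fromParent = parent.definerOf(className);
            if (fromParent != null) {
                return fromParent;
            }
        }
        return classesOnPath.contains(className) ? this : null;
    }
}

class DelegationDemo {

    // u1 sees only j1 (class c1); u2 sees j1 and j2 (c1 and c2), parent u1.
    static ToyLoader buildU2() {
        ToyLoader u1 = new ToyLoader("u1", null, "c1");
        return new ToyLoader("u2", u1, "c1", "c2");
    }

    // Name of the loader that defines className when asked through u2.
    static String definerName(String className) {
        ToyLoader definer = buildU2().definerOf(className);
        return definer == null ? "none" : definer.name;
    }

    // c1's references are resolved through c1's own defining loader.
    static boolean c1CanResolveC2() {
        ToyLoader definerOfC1 = buildU2().definerOf("c1");
        return definerOfC1.definerOf("c2") != null;
    }

    public static void main(String[] args) {
        System.out.println("c1 defined by: " + definerName("c1"));
        System.out.println("c2 defined by: " + definerName("c2"));
        System.out.println("c1 can resolve c2: " + c1CanResolveC2());
    }
}
```

In this model *c1* is defined by *u1* even though the lookup started at *u2*, and a lookup for *c2* starting from *u1* finds nothing, which corresponds to the ClassNotFoundException described above.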



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
