[
https://issues.apache.org/jira/browse/OOZIE-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606696#comment-15606696
]
Peter Bacsko commented on OOZIE-2714:
-------------------------------------
Hi Robert
1. Yes, it definitely introduces some overhead because we have to open every
jar and check them. Having this feature off by default is probably a good
thing, but we might encourage users to try this out at least once to make sure
their classpath is clean.
2. I can imagine that two different classes are compatible at the API level and
only the class versions are different. Another scenario is that their public
API is almost the same but one method differs between them. I would simply say
that if two classes are different in their binary representation it should mean
that they're candidates for potential conflict and we signal an error, even if
a particular method call could complete without errors.
3. Custom classloader - I think it depends on how the custom loader works. If
it loads a distinct set of jars in an isolated fashion (which is normally the
case) and only asks ours (the parent) to load the rest, then it's probably fine -
but yeah, we have to think about this.
In general, as long as it can be switched off completely, I think we are good.
There is a tricky thing that I found out recently: a lot of libs are added from
multiple folders from different hadoop projects (like hadoop-yarn, hadoop-hdfs,
hadoop-whatever). But those jars are the same (hopefully :)). This probably
necessitates the compare-based approach (unless we exclude certain
paths/path-pairs from being checked).
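To make the compare-based idea concrete, here is a minimal sketch (not Oozie code). It assumes Java 9+ for readAllBytes(); the class name DuplicateResourceCheck and both method names are made up for illustration. It reads one resource from every jar that contains it and reports only the jars whose copy differs from the first one found, so the same jar pulled in from several hadoop folders does not count as a conflict.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.jar.JarFile;
import java.util.zip.ZipEntry;

// Hypothetical helper, for illustration only.
class DuplicateResourceCheck {

    /** Reads the resource from every jar that contains it; jars without the entry are skipped. */
    static Map<String, byte[]> readCopies(String resource, List<Path> jars) {
        Map<String, byte[]> copies = new LinkedHashMap<>();
        for (Path jar : jars) {
            try (JarFile jarFile = new JarFile(jar.toFile())) {
                ZipEntry entry = jarFile.getEntry(resource);
                if (entry == null) {
                    continue;
                }
                try (InputStream in = jarFile.getInputStream(entry)) {
                    copies.put(jar.toString(), in.readAllBytes());
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return copies;
    }

    /** Returns the jars whose copy differs from the first copy found; an empty list means no conflict. */
    static List<String> differingJars(Map<String, byte[]> copies) {
        List<String> differing = new ArrayList<>();
        byte[] reference = null;
        for (Map.Entry<String, byte[]> copy : copies.entrySet()) {
            if (reference == null) {
                reference = copy.getValue();
            } else if (!Arrays.equals(reference, copy.getValue())) {
                differing.add(copy.getKey());
            }
        }
        return differing;
    }
}
```

Calling differingJars(readCopies("com/google/common/base/Objects.class", jars)) would then list only the jars that carry a different binary of that class. Excluding certain paths or path pairs would be an extra filter on the jar list before the scan.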
> Detect conflicting resources during class loading
> -------------------------------------------------
>
> Key: OOZIE-2714
> URL: https://issues.apache.org/jira/browse/OOZIE-2714
> Project: Oozie
> Issue Type: New Feature
> Components: core
> Reporter: Peter Bacsko
>
> There are a bunch of issues in Oozie which are related to class loading.
> The main problem is that the classpath is constructed in a way which is very
> specific to Oozie:
> - Hadoop lib jars
> - Sharelib jars
> - User-defined jars
> Sometimes there is a conflict between sharelib and hadoop lib versions. Also,
> users can add their own jars which sometimes contain a different version of
> popular libraries such as Guava, Apache commons, etc.
> We should be able to detect these conflicts and print an exact error message so
> that Oozie users can take appropriate actions to resolve the problem.
> A possible approach is the following:
> * start the execution of an action on a different thread
> * replace the thread's context classloader with a classloader which can
> detect conflicts
> * when the JVM invokes the {{loadClass()}} method of the classloader, it
> scans through the jars (which are available as {{URLClassPath}} objects). If
> it finds the given resource in at least two jars, it can do different things
> depending on the setup:
> ** throws an error immediately, mentioning the conflicting jars (this is
> probably too strict - but still an option)
> ** loads the two resources into byte arrays and compares them - it only
> throws an error if there is a difference
> ** compares the jars but only emits an error message if there is a conflict
> ** something else (user defined action?)
> Implementing such a classloader is not difficult and would greatly enhance
> the supportability of Oozie. It could work in multiple modes depending on the
> setup - perhaps being able to control it from a workflow config is desirable.
> If there's any problem, we should be able to turn it off completely, too.
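The approach in the description could be sketched roughly as follows. This is only an illustration under several assumptions, not the eventual Oozie implementation: the class ConflictDetectingClassLoader, its Mode and Outcome enums, and the evaluate() method are hypothetical names; the check is placed in findClass() rather than loadClass() so that it only runs for classes the parent could not load; and it relies on URLClassLoader.findResources() instead of touching URLClassPath directly. Java 9+ is assumed for readAllBytes().

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical classloader, for illustration only.
class ConflictDetectingClassLoader extends URLClassLoader {

    enum Mode { OFF, FAIL_ON_DUPLICATE, FAIL_ON_DIFFERENCE, WARN_ON_DIFFERENCE }

    enum Outcome { OK, WARN, FAIL }

    private final Mode mode;

    ConflictDetectingClassLoader(URL[] urls, ClassLoader parent, Mode mode) {
        super(urls, parent);
        this.mode = mode;
    }

    /** Decision logic only: maps the copies found (location -> bytes) to an outcome for the given mode. */
    static Outcome evaluate(Mode mode, Map<String, byte[]> copies) {
        if (mode == Mode.OFF || copies.size() < 2) {
            return Outcome.OK;
        }
        if (mode == Mode.FAIL_ON_DUPLICATE) {
            return Outcome.FAIL;
        }
        byte[] first = copies.values().iterator().next();
        boolean differ = copies.values().stream().anyMatch(bytes -> !Arrays.equals(first, bytes));
        if (!differ) {
            return Outcome.OK;
        }
        return mode == Mode.FAIL_ON_DIFFERENCE ? Outcome.FAIL : Outcome.WARN;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        if (mode != Mode.OFF) {
            Map<String, byte[]> copies = readCopies(name.replace('.', '/') + ".class");
            Outcome outcome = evaluate(mode, copies);
            String message = "Conflicting copies of " + name + " in " + copies.keySet();
            if (outcome == Outcome.FAIL) {
                throw new LinkageError(message);
            }
            if (outcome == Outcome.WARN) {
                System.err.println("WARN: " + message);
            }
        }
        return super.findClass(name);
    }

    /** Reads every copy of the resource found on this loader's own URLs. */
    private Map<String, byte[]> readCopies(String resource) {
        Map<String, byte[]> copies = new LinkedHashMap<>();
        try {
            for (URL url : Collections.list(findResources(resource))) {
                try (InputStream in = url.openStream()) {
                    copies.put(url.toString(), in.readAllBytes());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return copies;
    }
}
```

The thread that runs the action would get this loader through Thread.setContextClassLoader() before it starts. With Mode.OFF the loader behaves like a plain URLClassLoader, which covers the "turn it off completely" requirement.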