[
https://issues.apache.org/jira/browse/LIVY-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rajat Khandelwal updated LIVY-880:
----------------------------------
Summary: Loading third-party jar in the interpreter gives no errors when
there is a conflicting class (was: Loading third-party jar in the interpreter
is buggy)
> Loading third-party jar in the interpreter gives no errors when there is a
> conflicting class
> --------------------------------------------------------------------------------------------
>
> Key: LIVY-880
> URL: https://issues.apache.org/jira/browse/LIVY-880
> Project: Livy
> Issue Type: Bug
> Reporter: Rajat Khandelwal
> Priority: Major
>
> Livy's mechanism for loading third-party jars into the interpreter is
> incorrect, especially when there is a conflicting class in the third-party
> jar.
>
> By third-party jars, I mean the jars you supply while creating a session:
>
> {"name":"session-name", "kind":"spark", "jars":["hdfs://path/to/jar/1.jar"]}
>
> Now when we have a conflict (the scenario where the jar is a fat jar that
> bundles e.g. some older hadoop libs, or older jackson libs), we run into a
> problem, and the problem manifests in a weird way.
>
> Let's say your jar has a class named `com.path.SomeClass`.
>
> This is what I have observed:
> * create session goes through
> * you're not able to import things from your jar: running code like
> `import com.path.SomeClass` fails with `error: object path is not a member
> of package com`
> * But you are able to load classes from the jar by running code like
> `Thread.currentThread.getContextClassLoader.loadClass("com.path.SomeClass")`
> Essentially the classloaders are messed up: you can load a class by
> reflection, but the REPL has no idea the class is on the classpath.
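The two observations above can be reproduced in miniature. The sketch below (not Livy code) shows a reflective load through the context classloader succeeding, using `java.lang.String` as a stand-in for a class shipped in the session jar; in the broken Livy session, the same call succeeds while the corresponding `import` fails to compile.

```scala
// Sketch of the reflective load described above: the context
// classloader can resolve a class even when the REPL compiler cannot.
// java.lang.String stands in for a class shipped in the session jar.
object ReflectiveLoad {
  def main(args: Array[String]): Unit = {
    val cls = Thread.currentThread.getContextClassLoader
      .loadClass("java.lang.String")
    println(cls.getName)
  }
}
```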
> I have seen more reports of this problem on Jira, Google Groups,
> Stack Overflow, etc. Mentioning a few:
> *
> https://stackoverflow.com/questions/65654752/getting-import-error-while-executing-statements-via-livy-sessions-with-emr
> *
> https://community.cloudera.com/t5/Support-Questions/How-to-import-External-Libraries-for-Livy-Interpreter-using/td-p/171812
> *
> https://community.cloudera.com/t5/Support-Questions/Livy-Spark-Rest-Jar-submission-interactive-session/td-p/302924
> * https://groups.google.com/a/cloudera.org/g/hue-user/c/wR6d7gR_Avs
> *
> https://community.cloudera.com/t5/Community-Articles/Added-external-package-to-livy-causes-quot-console-25-quot/ta-p/245802
> * https://issues.apache.org/jira/browse/LIVY-857
> There is no definitive answer there. People have suggested these things:
> 1. Adding the jar to the Livy installation, inside repl-jars
> 2. Adding the jar to Livy's rsc-jars
> 3. Adding the jar to the hadoop installation on all nodes and using spark
> 4. Using packages (group:artifact:version), not jars
> We tried all of these: the first two didn't work for us, the third did. But
> the third mechanism is not ideal, because you're treating a third-party jar
> as a library jar (equivalent to hadoop/spark jars), and that is not always
> feasible on prod systems.
> The fourth mechanism is not always feasible either, as Livy only lets you
> specify packages and not their repository locations.
> Now, digging deeper, we figured out the cause and a potential solution.
> Livy uses the Scala interpreter under the hood. Relevant classes:
> [ILoop](https://github.com/scala/scala/blob/a05d71a1ea33b265015794f71d12020d3f7ddd1f/src/repl/scala/tools/nsc/interpreter/ILoop.scala#L646-L701)
> and
> [IMain](https://github.com/scala/scala/blob/a05d71a1ea33b265015794f71d12020d3f7ddd1f/src/repl/scala/tools/nsc/interpreter/IMain.scala#L251)
> If you look at the first link, you'll see there are two methods in `ILoop`,
> both of which are wrappers around `intp.addUrlsToClassPath`. The first
> wrapper, `addClasspath`, is deprecated; the second wrapper, `require`, is
> recommended.
> The `require` method does extra checks on the jar before actually calling
> `intp.addUrlsToClassPath`. The checks are just for class-conflict. If there
> is any class in the required-jars that conflicts with already loaded classes,
> it won't be loaded. The Scala REPL's class path is a bit fragile and does
> not allow the same class to be defined in multiple jars. So they work
> around this issue by exposing the `require` interface on the command line.
> By using `require`, the user learns what the conflict is and can take
> corrective action. If we bypass `require` (which is what is happening in
> Livy's REPL code), we get into this state where you can load classes through
> reflection but you can't import them.
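To make the conflict check concrete, here is a hedged sketch of the *kind* of check `require` performs (this is illustrative, not Scala's or Livy's actual implementation): scan a jar's class entries and report any that already resolve on the current classloader, i.e. would conflict if the jar were added.

```scala
import java.io.FileOutputStream
import java.nio.file.Files
import java.util.jar.{JarEntry, JarFile, JarOutputStream}
import scala.jdk.CollectionConverters._

object ConflictCheck {
  // Report jar class entries that already resolve on the current
  // classloader -- the kind of conflict check ILoop.require performs
  // before calling intp.addUrlsToClassPath (sketch only).
  def conflictingClasses(jarPath: String): List[String] = {
    val loader = Thread.currentThread.getContextClassLoader
    val jar = new JarFile(jarPath)
    try {
      jar.entries().asScala
        .map(_.getName)
        .filter(n => n.endsWith(".class") && !n.contains("$"))
        .map(_.stripSuffix(".class").replace('/', '.'))
        .filter { name =>
          try { loader.loadClass(name); true } // already present => conflict
          catch { case _: Throwable => false }
        }
        .toList
    } finally jar.close()
  }

  def main(args: Array[String]): Unit = {
    // Build a throwaway jar that shadows java.lang.String and also
    // ships a genuinely new class name.
    val tmp = Files.createTempFile("fat", ".jar").toFile
    val out = new JarOutputStream(new FileOutputStream(tmp))
    Seq("java/lang/String.class", "com/path/SomeClass.class").foreach { e =>
      out.putNextEntry(new JarEntry(e)); out.closeEntry()
    }
    out.close()
    println(conflictingClasses(tmp.getPath).mkString(","))
  }
}
```

Running this reports only the shadowed class; a `require`-style check would refuse the jar (or at least warn) instead of silently adding it the way Livy's direct `addUrlsToClassPath` path does.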
> Now, for anyone looking for a workaround: clean up your jar and make sure
> it has as few conflicts as possible with hadoop/scala/spark libraries. If
> your lib depends on these, mark them `provided` and don't bundle them in
> your jar.
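For an sbt build, the workaround looks roughly like the fragment below (a sketch: versions are placeholders, and it assumes a fat jar assembled with something like sbt-assembly). `provided` keeps the dependency on the compile classpath but out of the assembled jar, so the cluster's own copies are used at runtime.

```scala
// build.sbt fragment (sketch): keep platform libraries out of the fat
// jar so they cannot conflict with the cluster's hadoop/spark jars.
// Versions below are placeholders, not recommendations.
libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"    % "3.3.0" % "provided",
  "org.apache.hadoop" %  "hadoop-client" % "3.3.4" % "provided"
)
```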
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)