[jira] [Commented] (HADOOP-10893) isolated classloader on the client side

Sangjin Lee (JIRA) Wed, 13 Aug 2014 17:11:24 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096331#comment-14096331
 ]


Sangjin Lee commented on HADOOP-10893:
--------------------------------------

Thanks for the review Jason! It's helpful as always.

bq. HADOOP_USE_CLIENT_CLASSLOADER would be a better name to keep within the 
HADOOP_* shell variable namespace and is consistent with 
HADOOP_USER_CLASSPATH_FIRST. Similarly HADOOP_CLIENT_SYSTEM_CLASSES.

Sounds good. I'll make the change.

bq. ApplicationClassLoader was marked Public, so I'm wondering if we should 
leave a deprecated, trivial derivation of the new class location just in case 
someone referenced it?

I did not notice that it was marked public. I'll recreate a deprecated 
extending class in its current location.

bq. What was the rationale behind the Splitter change which seems unrelated?

If possible, I wanted to avoid having a dependency from this classloader class 
to another library unless it's really necessary. Splitter was coming from 
guava. :) In theory it should be OK even if ApplicationClassLoader used a guava 
class. It would be loaded by the system classloader anyway, and it would not 
interfere with the ApplicationClassLoader's ability to load a new version of 
the class for the user.

However, it was more of a call to minimize the external dependency from 
ApplicationClassLoader. I believe the current version (using String.split()) is 
equivalent and using the Splitter is not needed, but I'd be open to reversing 
it.

{quote}
Would be nice if we could somehow tie the default system classes defined in 
RunJar with the default for the job classloader so we don't have to remember to 
change it in two places going forward. Unfortunately the job classloader one is 
encoded in mapred-default.xml, so I don't know of a good way to do this 
offhand. Any ideas?
{quote}

I struggled with that decision a bit. As you mentioned, if you want to override 
the defaults, you'd need to do it in two places if you use it for the client 
and for the tasks as well (and for the vast majority of the cases I would 
imagine that is the case).

At least I feel that it would be better if at least the default is in one 
place. In that sense, how about having the default in ApplicationClassLoader 
itself? You still need to override it in two places, but it feels like an 
improvement over the current version.

{quote}
The doc comments in hadoop-config.sh should mention the client system classes 
variable, how to use it, and potentially even its default value. I know, I 
know. Yet another place to update if it changes, but users will likely have 
easy access to the config comment and not the java/javadoc for RunJar. Or maybe 
the default should already be in hadoop-config.sh with a hardcoded, last-resort 
fallback in RunJar if not set in hadoop-config.sh? Anyway we should at least 
mention the ability to specify the system classes.
{quote}

I agree it would be good to document the usage of the system classes env 
variable. I'll add the comment to hadoop-config.sh. See above for where to 
define the default and let me know what you think.

{quote}
Would be nice if we could have a unit test to verify the functionality is 
working going forward. Maybe a unit test that writes out some app code in a 
jar, has RunJar run it with the client classloader, and the app code verifies 
it has appropriate classpath semantics? Thinking something along the lines of 
how TestApplicationClassloader works but verifying RunJar setup the 
classloaders properly.
{quote}

Let me look into a unit test for this that involves RunJar. Do you happen to 
know of an existing test that writes out classes/jars off the top of your head?

{quote}
Nit: Not thrilled to see that the variable just has to be defined to anything, 
although I see HADOOP_USER_CLASSPATH_FIRST set a precedent for it. Leads to 
unexpected behavior if a user sees something like 
HADOOP_USER_CLASSPATH_FIRST=true and tries HADOOP_USER_CLASSPATH_FIRST=false. 
Not a must-fix, but it'd be nice to only accept expected values for the 
variable. A shell func to sanity-check a boolean env would be helpful, maybe 
something to tackle in a followup JIRA.
{quote}

Yes I stumbled on that as well, and it struck me as an odd behavior. I think 
I'll file a separate JIRA to tackle that issue...

> isolated classloader on the client side
> ---------------------------------------
>
>                 Key: HADOOP-10893
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10893
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: util
>    Affects Versions: 2.4.0
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: HADOOP-10893.patch, HADOOP-10893.patch, 
> classloader-test.tar.gz
>
>
> We have the job classloader on the mapreduce tasks that run on the cluster. 
> It has a benefit of being able to isolate class space for user code and avoid 
> version clashes.
> Although it occurs less often, version clashes do occur on the client JVM. It 
> would be good to introduce an isolated classloader on the client side as well 
> to address this. A natural point to introduce this may be through RunJar, as 
> that's how most of hadoop jobs are run.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-10893) isolated classloader on the client side

Reply via email to