[ 
https://issues.apache.org/jira/browse/HADOOP-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096280#comment-14096280
 ] 

Jason Lowe commented on HADOOP-10893:
-------------------------------------

Thanks, Sangjin!  Some comments on the patch:

HADOOP_USE_CLIENT_CLASSLOADER would be a better name to keep within the 
HADOOP_* shell variable namespace and is consistent with 
HADOOP_USER_CLASSPATH_FIRST.  Similarly HADOOP_CLIENT_SYSTEM_CLASSES.

ApplicationClassLoader was marked Public, so I'm wondering if we should leave a 
deprecated, trivial derivation of the new class location just in case someone 
referenced it?

What was the rationale behind the Splitter change which seems unrelated?

Would be nice if we could somehow tie the default system classes defined in 
RunJar with the default for the job classloader so we don't have to remember to 
change it in two places going forward.  Unfortunately the job classloader one 
is encoded in mapred-default.xml, so I don't know of a good way to do this 
offhand.  Any ideas?

The doc comments in hadoop-config.sh should mention the client system classes 
variable, how to use it, and potentially even its default value.  I know, I 
know.  Yet another place to update if it changes, but users will likely have 
easy access to the config comment and not the java/javadoc for RunJar.  Or 
maybe the default should already be in hadoop-config.sh with a hardcoded, 
last-resort fallback in RunJar if not set in hadoop-config.sh?  Anyway we 
should at least mention the ability to specify the system classes.

Would be nice if we could have a unit test to verify the functionality is 
working going forward.  Maybe a unit test that writes out some app code in a 
jar, has RunJar run it with the client classloader, and the app code verifies 
it has appropriate classpath semantics?  Thinking something along the lines of 
how TestApplicationClassloader works but verifying RunJar setup the 
classloaders properly.

Nit: Not thrilled to see that the variable just has to be defined to anything, 
although I see HADOOP_USER_CLASSPATH_FIRST set a precedent for it.  Leads to 
unexpected behavior if a user sees something like 
HADOOP_USER_CLASSPATH_FIRST=true and tries HADOOP_USER_CLASSPATH_FIRST=false.  
Not a must-fix, but it'd be nice to only accept expected values for the 
variable.  A shell func to sanity-check a boolean env would be helpful, maybe 
something to tackle in a followup JIRA.


> isolated classloader on the client side
> ---------------------------------------
>
>                 Key: HADOOP-10893
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10893
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: util
>    Affects Versions: 2.4.0
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: HADOOP-10893.patch, HADOOP-10893.patch, 
> classloader-test.tar.gz
>
>
> We have the job classloader on the mapreduce tasks that run on the cluster. 
> It has a benefit of being able to isolate class space for user code and avoid 
> version clashes.
> Although it occurs less often, version clashes do occur on the client JVM. It 
> would be good to introduce an isolated classloader on the client side as well 
> to address this. A natural point to introduce this may be through RunJar, as 
> that's how most of hadoop jobs are run.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to