[
https://issues.apache.org/jira/browse/HADOOP-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096280#comment-14096280
]
Jason Lowe commented on HADOOP-10893:
-------------------------------------
Thanks, Sangjin! Some comments on the patch:
HADOOP_USE_CLIENT_CLASSLOADER would be a better name to keep within the
HADOOP_* shell variable namespace and is consistent with
HADOOP_USER_CLASSPATH_FIRST. Similarly HADOOP_CLIENT_SYSTEM_CLASSES.
ApplicationClassLoader was marked Public, so I'm wondering if we should leave a
deprecated, trivial derivation of the new class location just in case someone
referenced it?
What was the rationale behind the Splitter change which seems unrelated?
Would be nice if we could somehow tie the default system classes defined in
RunJar with the default for the job classloader so we don't have to remember to
change it in two places going forward. Unfortunately the job classloader one
is encoded in mapred-default.xml, so I don't know of a good way to do this
offhand. Any ideas?
The doc comments in hadoop-config.sh should mention the client system classes
variable, how to use it, and potentially even its default value. I know, I
know. Yet another place to update if it changes, but users will likely have
easy access to the config comment and not the java/javadoc for RunJar. Or
maybe the default should already be in hadoop-config.sh with a hardcoded,
last-resort fallback in RunJar if not set in hadoop-config.sh? Anyway we
should at least mention the ability to specify the system classes.
Would be nice if we could have a unit test to verify the functionality is
working going forward. Maybe a unit test that writes out some app code in a
jar, has RunJar run it with the client classloader, and the app code verifies
it has appropriate classpath semantics? Thinking something along the lines of
how TestApplicationClassloader works but verifying RunJar setup the
classloaders properly.
Nit: Not thrilled to see that the variable just has to be defined to anything,
although I see HADOOP_USER_CLASSPATH_FIRST set a precedent for it. Leads to
unexpected behavior if a user sees something like
HADOOP_USER_CLASSPATH_FIRST=true and tries HADOOP_USER_CLASSPATH_FIRST=false.
Not a must-fix, but it'd be nice to only accept expected values for the
variable. A shell func to sanity-check a boolean env would be helpful, maybe
something to tackle in a followup JIRA.
> isolated classloader on the client side
> ---------------------------------------
>
> Key: HADOOP-10893
> URL: https://issues.apache.org/jira/browse/HADOOP-10893
> Project: Hadoop Common
> Issue Type: New Feature
> Components: util
> Affects Versions: 2.4.0
> Reporter: Sangjin Lee
> Assignee: Sangjin Lee
> Attachments: HADOOP-10893.patch, HADOOP-10893.patch,
> classloader-test.tar.gz
>
>
> We have the job classloader on the mapreduce tasks that run on the cluster.
> It has a benefit of being able to isolate class space for user code and avoid
> version clashes.
> Although it occurs less often, version clashes do occur on the client JVM. It
> would be good to introduce an isolated classloader on the client side as well
> to address this. A natural point to introduce this may be through RunJar, as
> that's how most of hadoop jobs are run.
--
This message was sent by Atlassian JIRA
(v6.2#6252)