[
https://issues.apache.org/jira/browse/HADOOP-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096331#comment-14096331
]
Sangjin Lee commented on HADOOP-10893:
--------------------------------------
Thanks for the review Jason! It's helpful as always.
bq. HADOOP_USE_CLIENT_CLASSLOADER would be a better name to keep within the
HADOOP_* shell variable namespace and is consistent with
HADOOP_USER_CLASSPATH_FIRST. Similarly HADOOP_CLIENT_SYSTEM_CLASSES.
Sounds good. I'll make the change.
bq. ApplicationClassLoader was marked Public, so I'm wondering if we should
leave a deprecated, trivial derivation of the new class location just in case
someone referenced it?
I did not notice that it was marked public. I'll recreate a deprecated
extending class in its current location.
bq. What was the rationale behind the Splitter change which seems unrelated?
If possible, I wanted to avoid having a dependency from this classloader class
to another library unless it's really necessary. Splitter was coming from
guava. :) In theory it should be OK even if ApplicationClassLoader used a guava
class. It would be loaded by the system classloader anyway, and it would not
interfere with the ApplicationClassLoader's ability to load a new version of
the class for the user.
However, it was more of a call to minimize the external dependency from
ApplicationClassLoader. I believe the current version (using String.split()) is
equivalent and using the Splitter is not needed, but I'd be open to reversing
it.
{quote}
Would be nice if we could somehow tie the default system classes defined in
RunJar with the default for the job classloader so we don't have to remember to
change it in two places going forward. Unfortunately the job classloader one is
encoded in mapred-default.xml, so I don't know of a good way to do this
offhand. Any ideas?
{quote}
I struggled with that decision a bit. As you mentioned, if you want to override
the defaults, you'd need to do it in two places if you use it for the client
and for the tasks as well (and for the vast majority of the cases I would
imagine that is the case).
At least I feel that it would be better if at least the default is in one
place. In that sense, how about having the default in ApplicationClassLoader
itself? You still need to override it in two places, but it feels like an
improvement over the current version.
{quote}
The doc comments in hadoop-config.sh should mention the client system classes
variable, how to use it, and potentially even its default value. I know, I
know. Yet another place to update if it changes, but users will likely have
easy access to the config comment and not the java/javadoc for RunJar. Or maybe
the default should already be in hadoop-config.sh with a hardcoded, last-resort
fallback in RunJar if not set in hadoop-config.sh? Anyway we should at least
mention the ability to specify the system classes.
{quote}
I agree it would be good to document the usage of the system classes env
variable. I'll add the comment to hadoop-config.sh. See above for where to
define the default and let me know what you think.
{quote}
Would be nice if we could have a unit test to verify the functionality is
working going forward. Maybe a unit test that writes out some app code in a
jar, has RunJar run it with the client classloader, and the app code verifies
it has appropriate classpath semantics? Thinking something along the lines of
how TestApplicationClassloader works but verifying RunJar setup the
classloaders properly.
{quote}
Let me look into a unit test for this that involves RunJar. Do you happen to
know of an existing test that writes out classes/jars off the top of your head?
{quote}
Nit: Not thrilled to see that the variable just has to be defined to anything,
although I see HADOOP_USER_CLASSPATH_FIRST set a precedent for it. Leads to
unexpected behavior if a user sees something like
HADOOP_USER_CLASSPATH_FIRST=true and tries HADOOP_USER_CLASSPATH_FIRST=false.
Not a must-fix, but it'd be nice to only accept expected values for the
variable. A shell func to sanity-check a boolean env would be helpful, maybe
something to tackle in a followup JIRA.
{quote}
Yes I stumbled on that as well, and it struck me as an odd behavior. I think
I'll file a separate JIRA to tackle that issue...
> isolated classloader on the client side
> ---------------------------------------
>
> Key: HADOOP-10893
> URL: https://issues.apache.org/jira/browse/HADOOP-10893
> Project: Hadoop Common
> Issue Type: New Feature
> Components: util
> Affects Versions: 2.4.0
> Reporter: Sangjin Lee
> Assignee: Sangjin Lee
> Attachments: HADOOP-10893.patch, HADOOP-10893.patch,
> classloader-test.tar.gz
>
>
> We have the job classloader on the mapreduce tasks that run on the cluster.
> It has a benefit of being able to isolate class space for user code and avoid
> version clashes.
> Although it occurs less often, version clashes do occur on the client JVM. It
> would be good to introduce an isolated classloader on the client side as well
> to address this. A natural point to introduce this may be through RunJar, as
> that's how most of hadoop jobs are run.
--
This message was sent by Atlassian JIRA
(v6.2#6252)