Sometimes, when running Hadoop jobs with the 'hadoop jar' command, there
are issues with the classloader. I presume these are caused by classes
that are loaded BEFORE the command's main is invoked. For example, when
relying on MapWritable in the command, it is not possible to use a
class that is not in the default idToClassMap. MapWritable.class is
loaded before the user job is unpacked, and therefore its classloader
will not be able to find custom classes. (At least, not the classes that
are on the classpath of RunJar's classloader.)
I could not find any issues or discussion about this, so I assume it is
a somewhat obscure issue (please correct me if I'm wrong). Anyway, I
would like to hear what you think of this and perhaps discuss a possible
solution, such as spawning the command in a new JVM. MapWritable, or
rather AbstractMapWritable, uses a Class.forName(className) construction;
maybe this can be changed so that it uses the classloader of the current
thread instead of that of its own class. (Will this work?)
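To make the idea concrete, here is a minimal sketch of what such a lookup
could look like. It is not the actual AbstractMapWritable code; the class
and method names (ContextClassLoading, loadClass) are hypothetical. It
assumes that the thread running the command has its context classloader
set to the one that can see the unpacked job jar, and it falls back to the
defining classloader when no context classloader is set.

```java
final class ContextClassLoading {

    private ContextClassLoading() {
    }

    /**
     * Hypothetical replacement for a bare Class.forName(className) call.
     * Resolves the class through the current thread's context classloader,
     * falling back to the loader that defined this class if none is set.
     */
    static Class<?> loadClass(String className) throws ClassNotFoundException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // No context classloader set: use this class's own loader,
            // which is what the one-argument Class.forName would use.
            loader = ContextClassLoading.class.getClassLoader();
        }
        // The three-argument form lets the caller choose the classloader.
        return Class.forName(className, true, loader);
    }

    public static void main(String[] args) throws Exception {
        // A JDK class is used here only so the sketch runs without a job jar.
        Class<?> clazz = loadClass("java.lang.String");
        System.out.println("Loaded " + clazz.getName());
    }
}
```

Whether this helps in practice depends on the context classloader actually
being the one that knows about the job's classes at the time the lookup
happens, which I have not verified.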
A workaround for now is to explicitly put the jar itself on the
classpath, i.e. 'env HADOOP_CLASSPATH=myJar hadoop jar myJar'.
- RunJar classloader issues Ferdy Galema