Sometimes when running Hadoop jobs with the 'hadoop jar' command there are classloader issues. I presume these are caused by classes that are loaded BEFORE the command's main is invoked. For example, when relying on MapWritable in the command, it is not possible to use a class that is not in the default idToClassMap: MapWritable.class is loaded before the user job is unpacked, so its classloader will not be able to find custom classes (at least not classes that are only on the RunJar classloader's classpath).
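
To make this concrete, here is a hedged sketch (the class and key names are made up, not from any real job) of the kind of code that hits the problem when it runs inside a job submitted with 'hadoop jar':

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    import org.apache.hadoop.io.MapWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;

    // Hypothetical value type; any Writable missing from
    // AbstractMapWritable's default idToClassMap triggers the same issue.
    public class MyCustomWritable implements Writable {
      private int value;

      public void write(DataOutput out) throws IOException {
        out.writeInt(value);
      }

      public void readFields(DataInput in) throws IOException {
        value = in.readInt();
      }

      public static void main(String[] args) {
        MapWritable map = new MapWritable();
        // Serializing this map records the class name; deserializing it
        // later does a Class.forName("MyCustomWritable") from inside
        // AbstractMapWritable, whose classloader was created before RunJar
        // unpacked the job jar, so a ClassNotFoundException is thrown.
        map.put(new Text("count"), new MyCustomWritable());
      }
    }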

I could not find any issues or discussion about this, so I assume it is a somewhat obscure problem (please correct me if I'm wrong). Anyway, I would like to hear what you think of this and perhaps discuss a possible solution, such as spawning the command in a new JVM. MapWritable, or rather AbstractMapWritable, uses a Class.forName(className) construction; maybe this can be changed so that it uses the classloader of the current thread instead of the one that loaded the class itself. A sketch of that change follows below. (Will this work?)
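
For reference, a minimal sketch of what that change might look like in the deserialization path; the surrounding code is paraphrased, not quoted from the Hadoop source:

    // Inside something like AbstractMapWritable.readFields(DataInput in):
    String className = in.readUTF();

    // Today (roughly): Class.forName(className) resolves against the
    // classloader that loaded AbstractMapWritable itself, which was set
    // up before the job jar was unpacked.

    // Proposed alternative: resolve via the current thread's context
    // classloader, which RunJar could point at the unpacked job jar
    // before invoking the job's main().
    Class<?> clazz = Class.forName(className, true,
        Thread.currentThread().getContextClassLoader());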

A workaround for now is to explicitly put the jar itself on the classpath, i.e. 'env HADOOP_CLASSPATH=myJar hadoop jar myJar'.
