This applies to hadoop-0.20-append and, I believe, any other hadoop-0.20.*
release and later.

We run Hadoop tasks within an embedded runtime (z2-environment) that has
its own class loading hierarchy (not like OSGi, but OSGi should exhibit
the same problem). The actual Mapper (Reducer, etc.) is a generic
implementation that delegates to a Mapper (Reducer, etc.) in some
component loaded by a child loader of the system class loader.
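To illustrate the delegation step, here is a minimal self-contained sketch; the class and method names are ours for illustration, not Hadoop's or z2's. The point is that the concrete Mapper/Reducer implementation is resolved through the component's class loader, not the system class path:

```java
// Hypothetical sketch (names are ours, not Hadoop's or z2's): the generic
// task resolves the concrete Mapper/Reducer implementation through the
// component's class loader instead of the system class path.
public class DelegateResolver {
    private final ClassLoader componentLoader;

    public DelegateResolver(ClassLoader componentLoader) {
        this.componentLoader = componentLoader;
    }

    // Load and instantiate the delegate by name; only the component
    // loader (a child of the system loader) can see component classes.
    public Object newDelegate(String className) throws Exception {
        Class<?> clazz = Class.forName(className, true, componentLoader);
        return clazz.getDeclaredConstructor().newInstance();
    }
}
```

Any lookup that bypasses this loader (e.g. one that defaults to the system class path) cannot see the component's classes, which is exactly what happens with the input format below.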

When using custom input format implementations, those will not be found
unless they are present on the system class path.

The reason is that MapTask and ReduceTask do not use the right class
loader when retrieving the input format class (see
MapTask.runNewMapper()). Both use the class loader of
taskContext.getConfiguration(), which is not set appropriately at that
point in time.

We fixed that by

a) ...having the generic mappers/reducers implement Configurable,
b) ...calling Configuration.setClassLoader during setConf calls on those, and
c) ...inserting

taskContext.getConfiguration().setClassLoader(job.getClassLoader());

in MapTask / ReduceTask before retrieving the input format class.
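For context, here is a self-contained sketch of the mechanism behind the fix. MiniConf is our stand-in, not Hadoop's Configuration, though setClassLoader/getClassByName mirror that API: class lookup goes through an explicit class loader field, which defaults to the loader that loaded the configuration class itself, so component classes stay invisible until someone installs the component loader:

```java
// Stand-in for org.apache.hadoop.conf.Configuration (MiniConf is our
// name; setClassLoader/getClassByName mirror the real API). Lookups
// resolve against the configured loader, which defaults to the loader
// that loaded this class - typically the system class loader. Without a
// setClassLoader call, classes known only to a child loader are not found.
public class MiniConf {
    private ClassLoader classLoader = MiniConf.class.getClassLoader();

    public void setClassLoader(ClassLoader loader) {
        this.classLoader = loader;
    }

    public ClassLoader getClassLoader() {
        return classLoader;
    }

    // Mirrors Configuration.getClassByName(String): resolve through the
    // configured loader, not the caller's loader.
    public Class<?> getClassByName(String name) throws ClassNotFoundException {
        return Class.forName(name, true, classLoader);
    }
}
```

With this in mind, step c) amounts to copying the job's loader onto the task context's configuration before the input format class is looked up, so that getClassByName resolves against the component hierarchy.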

I suggest including

taskContext.getConfiguration().setClassLoader(job.getClassLoader());

at MapTask#574 (branch-0.20-append) and ReduceTask#555, respectively, so
that mappers/reducers can use the configuration object to set the class
loader used when retrieving classes in the task context.

That is, unless somebody has a better fix, of course, or there is some
misunderstanding on my side.

Has this issue been identified before? (I didn't find a match - but there
are so many issues currently.)

Thanks,
  Henning
