Gabriel Reid created CRUNCH-539:
-----------------------------------

             Summary: Use of TupleWritable.setConf fails in mapper/reducer
                 Key: CRUNCH-539
                 URL: https://issues.apache.org/jira/browse/CRUNCH-539
             Project: Crunch
          Issue Type: Bug
    Affects Versions: 0.12.0
            Reporter: Gabriel Reid


In more recent versions of Hadoop 2 (at least), the implicit call to 
TupleWritable.setConf that occurs whenever TupleWritables are used fails with a 
ClassNotFoundException for (ironically) the TupleWritable class itself.
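
For context, the setConf call is implicit because Hadoop instantiates 
Configurable objects via ReflectionUtils.newInstance, which invokes setConf 
automatically; Hadoop's Writable deserialization creates value instances 
through this path. A minimal sketch of that mechanism (ConfigurableThing is a 
hypothetical stand-in, not Crunch code):
{code}
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;

public class ImplicitSetConfSketch {

  // Hypothetical stand-in for a Configurable Writable such as TupleWritable.
  public static class ConfigurableThing implements Configurable {
    private Configuration conf;

    @Override
    public void setConf(Configuration conf) {
      // In TupleWritable's case, this is where the work that triggers the
      // ClassNotFoundException takes place.
      this.conf = conf;
    }

    @Override
    public Configuration getConf() {
      return conf;
    }
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // newInstance sees that ConfigurableThing implements Configurable and
    // calls setConf(conf) on the freshly created instance.
    ConfigurableThing thing =
        ReflectionUtils.newInstance(ConfigurableThing.class, conf);
    System.out.println(thing.getConf() != null); // prints "true"
  }
}
{code}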

This appears to be due to the way that ObjectInputStream resolves classes in 
its [resolveClass 
method|https://docs.oracle.com/javase/7/docs/api/java/io/ObjectInputStream.html#resolveClass(java.io.ObjectStreamClass)],
 together with the way that the context classloader is set within a Hadoop 
mapper or reducer. The default resolveClass implementation looks up classes 
via the most recent user-defined classloader on the call stack, not via the 
thread context classloader, so classes that are only visible through the 
context classloader (such as those in the job jar) can't be resolved.
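
For reference, the usual workaround for this kind of mismatch (shown here as a 
sketch, not as the actual Crunch fix) is to deserialize with an 
ObjectInputStream subclass whose resolveClass first consults the thread 
context classloader:
{code}
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectStreamClass;

// ObjectInputStream that prefers the thread context classloader, which is
// the loader that can see the job jar inside a Hadoop mapper or reducer.
public class ContextClassLoaderObjectInputStream extends ObjectInputStream {

  public ContextClassLoaderObjectInputStream(InputStream in) throws IOException {
    super(in);
  }

  @Override
  protected Class<?> resolveClass(ObjectStreamClass desc)
      throws IOException, ClassNotFoundException {
    ClassLoader contextLoader = Thread.currentThread().getContextClassLoader();
    if (contextLoader != null) {
      try {
        return Class.forName(desc.getName(), false, contextLoader);
      } catch (ClassNotFoundException e) {
        // Fall through to the default stack-walking resolution.
      }
    }
    return super.resolveClass(desc);
  }
}
{code}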

This is similar to PIG-2532.

This can be reproduced with the local job tracker (at least) in Hadoop 2.7.0, 
but it can't be reproduced in Crunch integration tests (due to how classloading 
is set up there). It appears that this issue is only present in Crunch 0.12.

The following code within a simple pipeline will cause this issue to occur:
{code}
PTable<String, Integer> yearTemperatures = ... /* Writable-based PTable */
PTable<String, Integer> maxTemps = yearTemperatures
    .groupByKey()
    .combineValues(Aggregators.MAX_INTS())
    .top(1); // line that causes the error
{code}
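
For completeness, a self-contained sketch of such a pipeline (the input 
format, paths, and parsing DoFn are hypothetical; only the 
groupByKey/combineValues/top chain is the part that matters):
{code}
import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.PCollection;
import org.apache.crunch.PTable;
import org.apache.crunch.Pair;
import org.apache.crunch.Pipeline;
import org.apache.crunch.fn.Aggregators;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.types.writable.Writables;

public class TopReproduction {
  public static void main(String[] args) {
    Pipeline pipeline = new MRPipeline(TopReproduction.class);
    // Hypothetical input: one "year<TAB>temperature" record per line.
    PCollection<String> lines = pipeline.readTextFile(args[0]);
    PTable<String, Integer> yearTemperatures = lines.parallelDo(
        new DoFn<String, Pair<String, Integer>>() {
          @Override
          public void process(String line, Emitter<Pair<String, Integer>> emitter) {
            String[] fields = line.split("\t");
            emitter.emit(Pair.of(fields[0], Integer.valueOf(fields[1])));
          }
        },
        Writables.tableOf(Writables.strings(), Writables.ints()));
    PTable<String, Integer> maxTemps = yearTemperatures
        .groupByKey()
        .combineValues(Aggregators.MAX_INTS())
        .top(1); // triggers the ClassNotFoundException described above
    pipeline.writeTextFile(maxTemps, args[1]);
    pipeline.done();
  }
}
{code}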


