[ https://issues.apache.org/jira/browse/CRUNCH-539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabriel Reid updated CRUNCH-539:
--------------------------------
    Attachment: CRUNCH-539.patch

Patch to resolve this. Instead of using ObjectOutputStream and ObjectInputStream to serialize/deserialize the BiMap of codes to WritableComparable classes, the map is serialized as simple text (which also makes it easier to debug).

Tested this in the local job tracker in Hadoop 2.7.0 (also with custom WritableComparables registered) in code that is pretty much the same as that in the original mail thread (see http://s.apache.org/1jc).

It isn't possible (or it's at least non-trivial) to add a real test of this in the Crunch integration tests, due to it depending on loading classes that aren't available to the root class loader.

> Use of TupleWritable.setConf fails in mapper/reducer
> ----------------------------------------------------
>
>                 Key: CRUNCH-539
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-539
>             Project: Crunch
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: Gabriel Reid
>         Attachments: CRUNCH-539.patch
>
>
> In (at least) more recent versions of Hadoop 2, the implicit call to
> TupleWritable.setConf that happens when using TupleWritables fails with a
> ClassNotFoundException for (ironically) the TupleWritable class.
> This appears to be due to the way that ObjectInputStream resolves classes in
> its [resolveClass
> method|https://docs.oracle.com/javase/7/docs/api/java/io/ObjectInputStream.html#resolveClass(java.io.ObjectStreamClass)],
> together with the way that the context classloader is set within a Hadoop
> mapper or reducer.
> This is similar to PIG-2532.
> This can be reproduced in the local job tracker (at least) in Hadoop 2.7.0,
> but it can't be reproduced in Crunch integration tests (due to classloading
> setup). It appears that this issue is only present in Crunch 0.12.
> The following code within a simple pipeline will cause this issue to occur:
> {code}
> PTable<String, Integer> yearTemperatures = ... /* Writable-based PTable */
> PTable<String, Integer> maxTemps = yearTemperatures
>         .groupByKey()
>         .combineValues(Aggregators.MAX_INTS())
>         .top(1); //LINE THAT CAUSES THE ERROR
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
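
For readers who want a concrete picture of the approach described in the comment above (plain text instead of Java object serialization), here is a minimal sketch. It is not the code from CRUNCH-539.patch. The class name CodeMapTextCodec, the "code=className;..." format, and the explicit ClassLoader parameter are all assumptions made for illustration, and a plain java.util.Map stands in for the Guava BiMap so that the example has no dependencies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Crunch): stores a code-to-class map as
 * text such as "1=java.lang.String;2=java.lang.Integer".
 */
final class CodeMapTextCodec {

  private CodeMapTextCodec() {}

  /** Writes each entry as "code=fully.qualified.ClassName", joined by ';'. */
  static String encode(Map<Integer, Class<?>> codes) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<Integer, Class<?>> e : codes.entrySet()) {
      if (sb.length() > 0) {
        sb.append(';');
      }
      // Assumes ordinary (non-array) classes, whose names contain no ';' or '='.
      sb.append(e.getKey()).append('=').append(e.getValue().getName());
    }
    return sb.toString();
  }

  /**
   * Parses the text form and loads each class by name through the given
   * loader, so the caller decides which class loader resolves the names.
   */
  static Map<Integer, Class<?>> decode(String text, ClassLoader loader) {
    Map<Integer, Class<?>> codes = new LinkedHashMap<>();
    if (text == null || text.isEmpty()) {
      return codes;
    }
    for (String entry : text.split(";")) {
      int eq = entry.indexOf('=');
      if (eq < 0) {
        throw new IllegalArgumentException("Malformed entry: " + entry);
      }
      int code = Integer.parseInt(entry.substring(0, eq));
      String className = entry.substring(eq + 1);
      try {
        codes.put(code, Class.forName(className, true, loader));
      } catch (ClassNotFoundException ex) {
        throw new IllegalArgumentException("Cannot load " + className, ex);
      }
    }
    return codes;
  }
}
```

Inside a mapper or reducer, a caller could pass Thread.currentThread().getContextClassLoader() as the loader. Because the names are resolved with an explicit Class.forName call, the lookup no longer goes through ObjectInputStream.resolveClass, and the stored value stays readable when inspecting a job configuration.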