MultipleOutputs should use newer Hadoop serialization interface since 0.19
--------------------------------------------------------------------------

                 Key: HADOOP-5167
                 URL: https://issues.apache.org/jira/browse/HADOOP-5167
             Project: Hadoop Core
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.19.0
         Environment: Environment-independent issue
            Reporter: Mikhail Yakshin


We have a system based on Hadoop 0.18 / Cascading 0.8.1, and I'm now trying to
port it to Hadoop 0.19 / Cascading 1.0. The first serious problem I've run into
is that we use MultipleOutputs extensively in jobs that work with sequence
files storing Cascading's Tuples.

Since Cascading 0.9, Tuple no longer implements WritableComparable; instead it
relies on Hadoop's generic serialization interface and framework. However, in
Hadoop 0.19, MultipleOutputs still requires named-output key classes to
implement the older WritableComparable interface. Thus, trying to do something
like:

{noformat}
MultipleOutputs.addNamedOutput(conf, "output-name",
MySpecialMultiSplitOutputFormat.class, Tuple.class, Tuple.class);
mos = new MultipleOutputs(conf);
...
mos.getCollector("output-name", reporter).collect(tuple1, tuple2);
{noformat} 

yields an error:

{noformat}
java.lang.RuntimeException: java.lang.RuntimeException: class cascading.tuple.Tuple not org.apache.hadoop.io.WritableComparable
       at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:752)
       at org.apache.hadoop.mapred.lib.MultipleOutputs.getNamedOutputKeyClass(MultipleOutputs.java:252)
       at org.apache.hadoop.mapred.lib.MultipleOutputs$InternalFileOutputFormat.getRecordWriter(MultipleOutputs.java:556)
       at org.apache.hadoop.mapred.lib.MultipleOutputs.getRecordWriter(MultipleOutputs.java:425)
       at org.apache.hadoop.mapred.lib.MultipleOutputs.getCollector(MultipleOutputs.java:511)
       at org.apache.hadoop.mapred.lib.MultipleOutputs.getCollector(MultipleOutputs.java:476)
       at my.namespace.MyReducer.reduce(MyReducer.java:xxx)
{noformat}
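The failure comes from a type check made when MultipleOutputs reads the named output's key class back from the job configuration: `Configuration.getClass` is asked for a class that must implement `WritableComparable`, and `Tuple` no longer does. The following dependency-free Java sketch models that check. The stand-in types (`WritableComparableLike`, `TextLike`, `TupleLike`), the helper `getClassChecked`, the property names, and the use of a plain map instead of `JobConf` are all hypothetical simplifications; this illustrates the mechanism and is not Hadoop's actual code.

```java
import java.util.HashMap;
import java.util.Map;

public class InterfaceCheckSketch {

    // Hypothetical stand-in for org.apache.hadoop.io.WritableComparable.
    interface WritableComparableLike {}

    // Hypothetical key type that satisfies the check (think Text or LongWritable).
    static class TextLike implements WritableComparableLike {}

    // Hypothetical key type that does not: like cascading.tuple.Tuple since
    // Cascading 0.9, it would be handled by a pluggable serialization instead.
    static class TupleLike {}

    // Models reading a class back from the configuration with a required
    // interface: a missing entry yields null, and a class that does not
    // implement the interface is rejected with a RuntimeException.
    static <U> Class<? extends U> getClassChecked(Map<String, Class<?>> conf,
                                                  String name, Class<U> xface) {
        Class<?> theClass = conf.get(name);
        if (theClass == null) {
            return null;
        }
        if (!xface.isAssignableFrom(theClass)) {
            throw new RuntimeException("class " + theClass.getName() + " not " + xface.getName());
        }
        return theClass.asSubclass(xface);
    }

    public static void main(String[] args) {
        // Simplification: a real JobConf stores class names as strings.
        Map<String, Class<?>> conf = new HashMap<>();
        conf.put("text-output.key", TextLike.class);   // hypothetical property names
        conf.put("tuple-output.key", TupleLike.class);

        // Accepted: TextLike implements the required interface.
        System.out.println(
            getClassChecked(conf, "text-output.key", WritableComparableLike.class).getSimpleName());

        // Rejected, analogous to the stack trace above.
        try {
            getClassChecked(conf, "tuple-output.key", WritableComparableLike.class);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` prints `TextLike`, followed by a `class ... not ...` message of the same shape as the first line of the trace.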

As I understand it, MultipleOutputs should eventually be ported to Hadoop's
more generic serialization framework.
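As a rough illustration of that direction, the sketch below picks a serializer by asking each registered serialization whether it accepts a class, in the spirit of Hadoop's `org.apache.hadoop.io.serializer.Serialization.accept(Class)` and `SerializationFactory`, rather than demanding `WritableComparable` up front. It is dependency-free, and every type in it (`SimpleSerialization`, `WritableLike`, `TupleLike`, and the two serialization classes) is a hypothetical simplification; it is not the actual MultipleOutputs fix.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.List;

public class SerializationLookupSketch {

    // Hypothetical, simplified analogue of Hadoop's Serialization interface:
    // it declares which classes it accepts and knows how to write them.
    interface SimpleSerialization {
        boolean accept(Class<?> c);
        void write(Object value, DataOutputStream out) throws IOException;
    }

    // Hypothetical stand-in for a Writable type: it writes itself.
    interface WritableLike {
        void write(DataOutputStream out) throws IOException;
    }

    // Hypothetical stand-in for a non-Writable type such as a Cascading Tuple.
    static final class TupleLike {
        final List<String> fields;
        TupleLike(List<String> fields) { this.fields = fields; }
    }

    static final class WritableLikeSerialization implements SimpleSerialization {
        public boolean accept(Class<?> c) { return WritableLike.class.isAssignableFrom(c); }
        public void write(Object value, DataOutputStream out) throws IOException {
            ((WritableLike) value).write(out);
        }
    }

    // Writes a field count followed by each field as modified UTF-8.
    static final class TupleLikeSerialization implements SimpleSerialization {
        public boolean accept(Class<?> c) { return TupleLike.class.isAssignableFrom(c); }
        public void write(Object value, DataOutputStream out) throws IOException {
            TupleLike t = (TupleLike) value;
            out.writeInt(t.fields.size());
            for (String f : t.fields) {
                out.writeUTF(f);
            }
        }
    }

    // Return the first registered serialization that accepts the class.
    static SimpleSerialization find(List<SimpleSerialization> registered, Class<?> c) {
        for (SimpleSerialization s : registered) {
            if (s.accept(c)) {
                return s;
            }
        }
        throw new IllegalArgumentException("no serialization accepts " + c.getName());
    }

    static byte[] serialize(List<SimpleSerialization> registered, Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            find(registered, value.getClass()).write(value, out);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        List<SimpleSerialization> registered =
            List.of(new WritableLikeSerialization(), new TupleLikeSerialization());
        byte[] data = serialize(registered, new TupleLike(List.of("a", "b")));
        System.out.println(data.length + " bytes written for a TupleLike");
    }
}
```

The design point is that the output path only needs *some* registered serialization to accept the key and value classes, so a type like Tuple that plugs into the framework would be written instead of rejected.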
