[jira] [Resolved] (MAPREDUCE-190) MultipleOutputs should use newer Hadoop serialization interface since 0.19

Allen Wittenauer (JIRA) Mon, 21 Jul 2014 12:41:26 -0700

     [ 
https://issues.apache.org/jira/browse/MAPREDUCE-190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Allen Wittenauer resolved MAPREDUCE-190.
----------------------------------------

    Resolution: Incomplete

I'm going to close this out as stale.

> MultipleOutputs should use newer Hadoop serialization interface since 0.19
> --------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-190
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-190
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>         Environment: Environment-independent issue
>            Reporter: Mikhail Yakshin
>
> We have a system based on Hadoop 0.18 / Cascading 0.8.1, and I'm now trying to 
> port it to Hadoop 0.19 / Cascading 1.0. The first serious problem I've run 
> into is that our jobs make extensive use of MultipleOutputs with 
> sequence files that store Cascading Tuples.
> Since Cascading 0.9, Tuple no longer implements WritableComparable and instead 
> uses Hadoop's generic serialization interface and framework. However, in Hadoop 
> 0.19, MultipleOutputs still requires the older WritableComparable interface. 
> Thus, trying to do something like:
> {noformat}
> MultipleOutputs.addNamedOutput(conf, "output-name",
> MySpecialMultiSplitOutputFormat.class, Tuple.class, Tuple.class);
> mos = new MultipleOutputs(conf);
> ...
> mos.getCollector("output-name", reporter).collect(tuple1, tuple2);
> {noformat} 
> yields an error:
> {noformat}
> java.lang.RuntimeException: java.lang.RuntimeException: class cascading.tuple.Tuple not org.apache.hadoop.io.WritableComparable
>        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:752)
>        at org.apache.hadoop.mapred.lib.MultipleOutputs.getNamedOutputKeyClass(MultipleOutputs.java:252)
>        at org.apache.hadoop.mapred.lib.MultipleOutputs$InternalFileOutputFormat.getRecordWriter(MultipleOutputs.java:556)
>        at org.apache.hadoop.mapred.lib.MultipleOutputs.getRecordWriter(MultipleOutputs.java:425)
>        at org.apache.hadoop.mapred.lib.MultipleOutputs.getCollector(MultipleOutputs.java:511)
>        at org.apache.hadoop.mapred.lib.MultipleOutputs.getCollector(MultipleOutputs.java:476)
>        at my.namespace.MyReducer.reduce(MyReducer.java:xxx)
> {noformat}
> As I understand it, MultipleOutputs should eventually be ported to use the 
> more generic Hadoop serialization framework.
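
The doubled RuntimeException at the top of the trace is consistent with a class lookup that checks the configured class against a required interface and then wraps any failure in a second exception. The following is a minimal sketch of that pattern under that assumption. Every type in it (LegacyComparable, LegacyKey, TupleStandIn, BoundedClassLookup) is a hypothetical stand-in, not a Hadoop or Cascading class, and it does not reproduce the actual MultipleOutputs code.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the legacy interface the lookup insists on (hypothetical name).
interface LegacyComparable {}

// Stand-in for a key type that still implements the legacy interface.
final class LegacyKey implements LegacyComparable {}

// Stand-in for a type handled by a separate serialization framework; it does
// not implement the legacy interface, as the report describes for Tuple.
final class TupleStandIn {}

// Hypothetical miniature of a configuration that maps names to classes.
class BoundedClassLookup {
    private final Map<String, Class<?>> classes = new HashMap<>();

    void setClass(String name, Class<?> cls) {
        classes.put(name, cls);
    }

    // Returns the class registered under `name`, or null if none is set.
    // Fails if the registered class does not implement `bound`.
    <U> Class<? extends U> getClass(String name, Class<U> bound) {
        try {
            Class<?> cls = classes.get(name);
            if (cls != null && !bound.isAssignableFrom(cls)) {
                throw new RuntimeException(cls + " not " + bound.getName());
            }
            return cls == null ? null : cls.asSubclass(bound);
        } catch (Exception e) {
            // Wrapping here is what produces the nested exception message.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        BoundedClassLookup conf = new BoundedClassLookup();

        // A key class that implements the legacy interface passes the check.
        conf.setClass("output-name.key", LegacyKey.class);
        System.out.println(conf.getClass("output-name.key", LegacyComparable.class));

        // A class that only supports the newer framework is rejected.
        conf.setClass("output-name.key", TupleStandIn.class);
        try {
            conf.getClass("output-name.key", LegacyComparable.class);
        } catch (RuntimeException e) {
            System.out.println(e);
        }
    }
}
```

In this sketch the rejection happens at lookup time, not at registration time, which would match the trace above: the failure surfaces in getCollector, not in addNamedOutput.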



--
This message was sent by Atlassian JIRA
(v6.2#6252)
