Reducer.reduce method's OutputCollector is too strict, it shouldn't need the key 
to be WritableComparable
--------------------------------------------------------------------------------------------------------

                 Key: HADOOP-1827
                 URL: https://issues.apache.org/jira/browse/HADOOP-1827
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.14.0
            Reporter: Arun C Murthy


The output of the {{Reducer}}'s reduce method is *not* sorted, hence the 
{{OutputCollector}} passed to it shouldn't require the *key* to be 
{{WritableComparable}}; passing a {{Writable}} should suffice.

Thus

{code: title=Reducer.java}
public interface Reducer<K2 extends WritableComparable, V2 extends Writable, 
                         K3 extends WritableComparable, V3 extends Writable> 
extends JobConfigurable, Closeable {

  void reduce(K2 key, Iterator<V2> values, OutputCollector<K3, V3> output, 
              Reporter reporter) 
  throws IOException;

}
{code}

should, technically, be:

{code: title=Reducer.java}
public interface Reducer<K2 extends WritableComparable, V2 extends Writable, 
                         K3 extends Writable, V3 extends Writable> 
extends JobConfigurable, Closeable {

  void reduce(K2 key, Iterator<V2> values, OutputCollector<K3, V3> output, 
              Reporter reporter) 
  throws IOException;

}
{code}
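
To make the difference concrete, here is a minimal, self-contained sketch of a reducer whose output key is only a {{Writable}}. It does not use the real Hadoop classes: the interfaces below are stripped-down stand-ins (no serialization methods, no {{Reporter}}, no {{IOException}}), and {{Word}}, {{Count}}, {{Label}} and {{SumReducer}} are hypothetical names invented for illustration. With the current bound on K3, {{Label}} would be rejected as an output key type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class RelaxedReducerSketch {
  // Stand-ins for the Hadoop types (assumption: simplified, no I/O methods).
  interface Writable {}
  interface WritableComparable extends Writable, Comparable<Object> {}

  interface OutputCollector<K extends Writable, V extends Writable> {
    void collect(K key, V value);
  }

  // Proposed bound: the output key K3 only has to be Writable.
  interface Reducer<K2 extends WritableComparable, V2 extends Writable,
                    K3 extends Writable, V3 extends Writable> {
    void reduce(K2 key, Iterator<V2> values, OutputCollector<K3, V3> output);
  }

  // Input key: sortable, because reduce input is sorted by key.
  static final class Word implements WritableComparable {
    final String text;
    Word(String text) { this.text = text; }
    public int compareTo(Object other) { return text.compareTo(((Word) other).text); }
  }

  static final class Count implements Writable {
    final int n;
    Count(int n) { this.n = n; }
  }

  // Output key with no ordering at all (hypothetical type).
  static final class Label implements Writable {
    final String text;
    Label(String text) { this.text = text; }
  }

  // Sums the values for a key and emits them under a non-comparable key.
  static final class SumReducer implements Reducer<Word, Count, Label, Count> {
    public void reduce(Word key, Iterator<Count> values,
                       OutputCollector<Label, Count> output) {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().n;
      }
      output.collect(new Label("total:" + key.text), new Count(sum));
    }
  }

  // Runs the reducer once and records what it emitted.
  static List<String> runOnce() {
    List<String> out = new ArrayList<>();
    Iterator<Count> values = Arrays.asList(new Count(1), new Count(2)).iterator();
    new SumReducer().reduce(new Word("a"), values,
        (k, v) -> out.add(k.text + "=" + v.n));
    return out;
  }

  public static void main(String[] args) {
    System.out.println(runOnce()); // prints [total:a=3]
  }
}
```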



Pros:
It removes an artificial limitation that forces applications to emit a 
<{{WritableComparable}}, {{Writable}}> pair rather than a <{{Writable}}, 
{{Writable}}> pair, thereby simplifying some applications (I ran into a few 
recently... admittedly trivial ones).

Cons:
1. We now need a separate {{Combiner}} interface, since the combiner's 
{{OutputCollector}} *needs* to be able to sort keys, hence requires a 
{{WritableComparable}} - same as the {{Mapper}}.
2. We need a separate {{SortableOutputCollector}} (for {{Mapper}}/{{Combiner}}) 
and a {{NonSortableOutputCollector}} (for {{Reducer}}).
3. Alas! As a consequence of (1) & (2), we can no longer use the same class as 
both a {{Reducer}} and a {{Combiner}}, a serious compatibility issue.
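
A rough sketch of what the split in (1) and (2) might look like, to show where the compatibility cost comes from. Only the names {{Combiner}}, {{SortableOutputCollector}} and {{NonSortableOutputCollector}} come from this issue; the method shapes, the stand-in {{Writable}} types and the {{Sum}} class are assumptions, and {{Reporter}} and {{IOException}} are omitted. In this sketch an existing class with a single reduce method no longer satisfies both interfaces as is: it has to grow a second overload, one per collector type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SplitCollectorSketch {
  // Stand-ins for the Hadoop types (assumption: simplified, no I/O methods).
  interface Writable {}
  interface WritableComparable extends Writable, Comparable<Object> {}

  // Collector for Mapper/Combiner output, which is sorted by key.
  interface SortableOutputCollector<K extends WritableComparable, V extends Writable> {
    void collect(K key, V value);
  }

  // Collector for Reducer output, which is not sorted.
  interface NonSortableOutputCollector<K extends Writable, V extends Writable> {
    void collect(K key, V value);
  }

  interface Combiner<K extends WritableComparable, V extends Writable> {
    void reduce(K key, Iterator<V> values, SortableOutputCollector<K, V> output);
  }

  interface Reducer<K2 extends WritableComparable, V2 extends Writable,
                    K3 extends Writable, V3 extends Writable> {
    void reduce(K2 key, Iterator<V2> values, NonSortableOutputCollector<K3, V3> output);
  }

  static final class Word implements WritableComparable {
    final String text;
    Word(String text) { this.text = text; }
    public int compareTo(Object other) { return text.compareTo(((Word) other).text); }
  }

  static final class Count implements Writable {
    final int n;
    Count(int n) { this.n = n; }
  }

  // One class serving both roles now needs two reduce overloads.
  static final class Sum implements Combiner<Word, Count>,
                                    Reducer<Word, Count, Word, Count> {
    private static Count total(Iterator<Count> values) {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().n;
      }
      return new Count(sum);
    }

    public void reduce(Word key, Iterator<Count> values,
                       SortableOutputCollector<Word, Count> output) {
      output.collect(key, total(values));
    }

    public void reduce(Word key, Iterator<Count> values,
                       NonSortableOutputCollector<Word, Count> output) {
      output.collect(key, total(values));
    }
  }

  // Uses the same instance first as a Combiner, then as a Reducer.
  static List<String> runBoth() {
    Sum sum = new Sum();
    List<String> out = new ArrayList<>();
    Combiner<Word, Count> asCombiner = sum;
    Reducer<Word, Count, Word, Count> asReducer = sum;
    asCombiner.reduce(new Word("a"),
        Arrays.asList(new Count(1), new Count(2)).iterator(),
        (k, v) -> out.add("combine " + k.text + "=" + v.n));
    asReducer.reduce(new Word("a"),
        Arrays.asList(new Count(3), new Count(4)).iterator(),
        (k, v) -> out.add("reduce " + k.text + "=" + v.n));
    return out;
  }

  public static void main(String[] args) {
    System.out.println(runBoth()); // prints [combine a=3, reduce a=7]
  }
}
```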



The purpose of this issue is two-fold:
1. Spark a discussion among folks, on both hadoop-dev & hadoop-users, to figure 
out whether this really is a problem, i.e. do folks really care about this 
anomaly in the existing {{Reducer}} interface? Also, is it worth the pain (see 
'Cons') to fix it?
2. Even if we decide to live with it, this issue could record for posterity why 
we love hadoop, warts and all. *smile*

Let's discuss...


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
