Reducer.reduce method's OutputCollector is too strict, it shouldn't need the key to be WritableComparable
---------------------------------------------------------------------------------------------------------

                 Key: HADOOP-1827
                 URL: https://issues.apache.org/jira/browse/HADOOP-1827
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.14.0
            Reporter: Arun C Murthy


The output of the {{Reducer}}'s reduce method is *not* sorted, hence the {{OutputCollector}} passed to it shouldn't require the *key* to be {{WritableComparable}}; passing a {{Writable}} should suffice.

Thus

{code: title=Reducer.java}
public interface Reducer<K2 extends WritableComparable, V2 extends Writable,
                         K3 extends WritableComparable, V3 extends Writable>
    extends JobConfigurable, Closeable {

  void reduce(K2 key, Iterator<V2> values,
              OutputCollector<K3, V3> output, Reporter reporter)
    throws IOException;
}
{code}

should, technically, be:

{code: title=Reducer.java}
public interface Reducer<K2 extends WritableComparable, V2 extends Writable,
                         K3 extends Writable, V3 extends Writable>
    extends JobConfigurable, Closeable {

  void reduce(K2 key, Iterator<V2> values,
              OutputCollector<K3, V3> output, Reporter reporter)
    throws IOException;
}
{code}

Pros:
It removes an artificial limitation that forces applications to emit a <{{WritableComparable}}, {{Writable}}> pair rather than a <{{Writable}}, {{Writable}}> pair, thereby simplifying some applications (I ran into a few recently... admittedly trivial ones).

Cons:
1. We now need a separate {{Combiner}} interface, since the combiner's {{OutputCollector}} *needs* to be able to sort keys and hence requires a {{WritableComparable}}, same as the {{Mapper}}.
2. We need a separate {{SortableOutputCollector}} (for {{Mapper}}/{{Combiner}}) and a {{NonSortableOutputCollector}} (for {{Reducer}}).
3. Alas! As a consequence of (1) & (2), we cannot use the same class as both a {{Reducer}} and a {{Combiner}} anymore, which is a serious compatibility issue.

The purpose of this issue is two-fold:
1. Spark a discussion among folks, both hadoop-dev & hadoop-users, to figure out whether this really is a problem, i.e. do folks really care about this anomaly in the existing {{Reducer}} interface? Also, is it worth the pain (@see 'Cons') to go fix it?
2. Even if we decide to live with it, this issue could record for posterity why we love hadoop, warts and all. *smile*

Let's discuss...
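
To make the 'Cons' a little more concrete, here is a rough, self-contained sketch of what the split might look like. It is only an illustration, not a proposed patch: {{Writable}} and {{WritableComparable}} are empty stand-ins for the real Hadoop types, the {{combine}} method name and the {{Word}}/{{Count}}/{{Label}} classes are hypothetical, and {{Reporter}}, {{JobConfigurable}}, {{Closeable}} and {{IOException}} are left out to keep it short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReducerSplitSketch {

  // Stand-ins for the Hadoop types (serialization methods omitted).
  interface Writable {}
  interface WritableComparable<T> extends Writable, Comparable<T> {}

  // Con (2): collector for Mapper/Combiner output, keys must be sortable.
  interface SortableOutputCollector<K extends WritableComparable<K>, V extends Writable> {
    void collect(K key, V value);
  }

  // Con (2): collector for Reducer output, keys only need to be Writable.
  interface NonSortableOutputCollector<K extends Writable, V extends Writable> {
    void collect(K key, V value);
  }

  // Con (1): a separate Combiner interface (method name is an assumption).
  interface Combiner<K2 extends WritableComparable<K2>, V2 extends Writable> {
    void combine(K2 key, Iterator<V2> values, SortableOutputCollector<K2, V2> output);
  }

  // The relaxed Reducer: the output key K3 is a plain Writable.
  interface Reducer<K2 extends WritableComparable<K2>, V2 extends Writable,
                    K3 extends Writable, V3 extends Writable> {
    void reduce(K2 key, Iterator<V2> values, NonSortableOutputCollector<K3, V3> output);
  }

  // Hypothetical sortable input key.
  static final class Word implements WritableComparable<Word> {
    final String text;
    Word(String text) { this.text = text; }
    @Override public int compareTo(Word other) { return text.compareTo(other.text); }
  }

  // Hypothetical value type.
  static final class Count implements Writable {
    final int n;
    Count(int n) { this.n = n; }
  }

  // Hypothetical output key that is Writable but NOT comparable.
  static final class Label implements Writable {
    final String text;
    Label(String text) { this.text = text; }
  }

  // Sums the values and emits them under a non-comparable key.
  static final class SumReducer implements Reducer<Word, Count, Label, Count> {
    @Override
    public void reduce(Word key, Iterator<Count> values,
                       NonSortableOutputCollector<Label, Count> output) {
      int sum = 0;
      while (values.hasNext()) sum += values.next().n;
      output.collect(new Label("total:" + key.text), new Count(sum));
    }
  }

  // Sums the values but must keep the sortable key type.
  static final class SumCombiner implements Combiner<Word, Count> {
    @Override
    public void combine(Word key, Iterator<Count> values,
                        SortableOutputCollector<Word, Count> output) {
      int sum = 0;
      while (values.hasNext()) sum += values.next().n;
      output.collect(key, new Count(sum));
    }
  }

  public static List<String> runReducer() {
    List<String> out = new ArrayList<>();
    new SumReducer().reduce(new Word("hadoop"),
        Arrays.asList(new Count(1), new Count(2)).iterator(),
        (k, v) -> out.add(k.text + "=" + v.n));
    return out;
  }

  public static List<String> runCombiner() {
    List<String> out = new ArrayList<>();
    new SumCombiner().combine(new Word("hadoop"),
        Arrays.asList(new Count(1), new Count(2)).iterator(),
        (k, v) -> out.add(k.text + "=" + v.n));
    return out;
  }

  public static void main(String[] args) {
    System.out.println(runReducer());   // [total:hadoop=3]
    System.out.println(runCombiner());  // [hadoop=3]
  }
}
```

Note that {{SumReducer}} and {{SumCombiner}} do the same summing but can no longer be one class plugged into both slots as-is; in this sketch a class wanting both roles would have to implement both interfaces explicitly, which is the cost described in (3).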