[jira] Commented: (AVRO-581) java: add reducer that separates keys and values when map output is pairs

Doug Cutting (JIRA) Thu, 17 Jun 2010 14:57:50 -0700

    [ https://issues.apache.org/jira/browse/AVRO-581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879957#action_12879957 ]


Doug Cutting commented on AVRO-581:
-----------------------------------

Philip asked what the Pair class might look like.  Here's a first pass:

{code}
public class Pair<K,V> implements SpecificRecord {
  private Schema schema;
  private K key;
  private V value;
  public Pair(K key, Schema keySchema, V value, Schema valueSchema) {
    this.schema = getPairSchema(keySchema, valueSchema);
    this.key = key;
    this.value = value;
  }
  private static final Map<Schema,Map<Schema,Schema>> SCHEMA_CACHE = 
    new WeakHashMap<Schema,Map<Schema,Schema>>();
  private static Schema getPairSchema(Schema key, Schema value) {
    synchronized (SCHEMA_CACHE) {
      ... see schema template in comment above....
    }
  }
  public K key() { return key; }
  public void key(K key) { this.key = key; }
  public V value() { return value; }
  public void value(V value) { this.value = value; }

  // IndexedRecord methods, used by reader/writers
  public Schema getSchema() { return schema; }
  public Object get(int i) {
     switch (i) {
       case 0: return key;
       case 1: return value;
       default: throw ...
     } 
  }
  @SuppressWarnings("unchecked")
  public void set(int i, Object o) {
     switch (i) {
       // cast to the type parameters, and break so cases do not fall through
       case 0: this.key = (K) o; break;
       case 1: this.value = (V) o; break;
       default: throw ...
     }
  }
}
{code}
I believe this would work as-is with SpecificDatumReader/Writer so long as K 
and V are specific data that conform to the schemas provided.  It would also 
work with GenericDatumWriter if K and V are generic data conforming to the 
schemas provided.  It would not work with GenericDatumReader, since that's 
hardwired to always create a GenericData.Record for a record schema.
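
The schema cache that the first pass leaves elided could be filled in roughly as follows.  This is only a sketch under stated assumptions: SchemaStub is a hypothetical stand-in for org.apache.avro.Schema that holds just the JSON text, PairSchemas is a made-up holder class, and the pair schema is assembled by substituting the key and value schemas into the template from the issue description.

```java
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical stand-in for org.apache.avro.Schema; holds only the JSON text. */
final class SchemaStub {
  final String json;
  SchemaStub(String json) { this.json = json; }
  @Override public String toString() { return json; }
}

/** Hypothetical holder for the cache; in the first pass this lives inside Pair. */
final class PairSchemas {
  // Two-level cache: key schema -> (value schema -> pair schema).
  // SchemaStub does not override equals, so lookups here are by identity.
  private static final Map<SchemaStub, Map<SchemaStub, SchemaStub>> SCHEMA_CACHE =
      new WeakHashMap<SchemaStub, Map<SchemaStub, SchemaStub>>();

  static SchemaStub getPairSchema(SchemaStub key, SchemaStub value) {
    synchronized (SCHEMA_CACHE) {
      Map<SchemaStub, SchemaStub> valueSchemas = SCHEMA_CACHE.get(key);
      if (valueSchemas == null) {
        valueSchemas = new WeakHashMap<SchemaStub, SchemaStub>();
        SCHEMA_CACHE.put(key, valueSchemas);
      }
      SchemaStub result = valueSchemas.get(value);
      if (result == null) {
        // Fill in the template: the value field is marked "order": "ignore".
        result = new SchemaStub(
            "{\"type\": \"record\", \"name\": \"org.apache.avro.mapred.Pair\", \"fields\":["
            + "{\"name\": \"key\", \"type\": " + key.json + "},"
            + "{\"name\": \"value\", \"order\": \"ignore\", \"type\": " + value.json + "}"
            + "]}");
        valueSchemas.put(value, result);
      }
      return result;
    }
  }
}
```

Calling getPairSchema twice with the same key and value stubs returns the same cached instance, so a Pair constructor built this way would not rebuild the schema for every record.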

When this is used for map output we can split each pair into its key and 
value when passing it to Hadoop, and then rely on Hadoop's reduce grouping 
support rather than layering our own as in AVRO-513.
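
To illustrate the splitting idea, here is a small sketch in plain Java, not Hadoop code.  KV is a hypothetical stand-in for the proposed Pair, and the TreeMap in SplitAndGroup.group only imitates the sort-and-group step that Hadoop would perform once keys and values are handed over separately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for the proposed Pair; not the Avro class. */
final class KV<K, V> {
  final K key;
  final V value;
  KV(K key, V value) { this.key = key; this.value = value; }
}

final class SplitAndGroup {
  /**
   * Splits each pair into key and value and collects the values per key,
   * in sorted key order.  The TreeMap stands in for Hadoop's grouping.
   */
  static <K extends Comparable<K>, V> Map<K, List<V>> group(List<KV<K, V>> mapOutput) {
    Map<K, List<V>> grouped = new TreeMap<K, List<V>>();
    for (KV<K, V> pair : mapOutput) {
      List<V> values = grouped.get(pair.key);
      if (values == null) {
        values = new ArrayList<V>();
        grouped.put(pair.key, values);
      }
      values.add(pair.value);
    }
    return grouped;
  }
}
```

Each entry of the returned map corresponds to one reduce call: the key, plus all values that were emitted with it.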

> java: add reducer that separates keys and values when map output is pairs
> -------------------------------------------------------------------------
>
>                 Key: AVRO-581
>                 URL: https://issues.apache.org/jira/browse/AVRO-581
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>             Fix For: 1.4.0
>
>
> We should add a Pair<K,V> class, implementing SpecificRecord, that combines 
> instances of two schemas (specific or generic).  Pairs would be compared by 
> key, ignoring value.  The template for its schema would be:
> {code}
> {"type": "record", "name": "org.apache.avro.mapred.Pair", "fields":[
>   {"name": "key", "type": <<insert key schema here>>},
>   {"name": "value", "order": "ignore", "type": <<insert value schema>>}
> ]}
> {code}
> When map outputs are instances of this class, a reducer may be used whose 
> reduce method is something like:
> public abstract void reduce(K key, Iterable<V> values);
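
A rough sketch of how such a reducer could be driven, assuming nothing beyond the description above: SimplePair, KeyOnlyComparator and SketchReducer are hypothetical names, and the run method is an in-memory illustration rather than anything Hadoop or Avro provides.  The comparator looks only at the key, mirroring "order": "ignore" on the value field.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for the proposed Pair. */
final class SimplePair<K, V> {
  final K key;
  final V value;
  SimplePair(K key, V value) { this.key = key; this.value = value; }
}

/** Compares pairs by key only; the value never affects the ordering. */
final class KeyOnlyComparator<K extends Comparable<K>, V>
    implements Comparator<SimplePair<K, V>> {
  public int compare(SimplePair<K, V> a, SimplePair<K, V> b) {
    return a.key.compareTo(b.key);
  }
}

abstract class SketchReducer<K extends Comparable<K>, V> {
  public abstract void reduce(K key, Iterable<V> values);

  /** Sorts the pairs by key and calls reduce once per run of equal keys. */
  public void run(List<SimplePair<K, V>> pairs) {
    List<SimplePair<K, V>> sorted = new ArrayList<SimplePair<K, V>>(pairs);
    Collections.sort(sorted, new KeyOnlyComparator<K, V>());
    int i = 0;
    while (i < sorted.size()) {
      K key = sorted.get(i).key;
      List<V> values = new ArrayList<V>();
      while (i < sorted.size() && sorted.get(i).key.compareTo(key) == 0) {
        values.add(sorted.get(i).value);
        i++;
      }
      reduce(key, values);
    }
  }
}
```

A subclass only implements reduce; for example, a summing reducer would add up the values it is handed for each key.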

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
