Setting sortComparatorClass lets you configure a RawComparator
implementation, which allows you to do more efficient comparisons at
the byte level. If you don't set it, Hadoop uses WritableComparator by
default. That implementation deserializes the bytes into key instances
using your readFields method and then calls compareTo to determine key
ordering (see the source of
org.apache.hadoop.io.WritableComparator.compare(byte[], int, int,
byte[], int, int)).
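
In rough terms the default path looks like the sketch below (paraphrased
from the WritableComparator source, not a verbatim copy; buffer, key1 and
key2 are reusable fields of the comparator):

  // Simplified sketch of WritableComparator.compare(byte[], ...):
  // deserialize both keys, then fall back to compareTo.
  public int compare(byte[] b1, int s1, int l1,
                     byte[] b2, int s2, int l2) {
    try {
      buffer.reset(b1, s1, l1);   // wrap the raw bytes of the first key
      key1.readFields(buffer);    // deserialize with your readFields
      buffer.reset(b2, s2, l2);
      key2.readFields(buffer);    // deserialize the second key
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
    return compare(key1, key2);   // ends up calling key1.compareTo(key2)
  }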

So if you don't need to be as efficient as possible, then delegating
to WritableComparator is probably fine.

Note that you can also register a RawComparator for your key class
with WritableComparator using a static block; look at the source of
Text for an example of this:

  /** A WritableComparator optimized for Text keys. */
  public static class Comparator extends WritableComparator {
    public Comparator() {
      super(Text.class);
    }

    public int compare(byte[] b1, int s1, int l1,
                       byte[] b2, int s2, int l2) {
      // Skip the variable-length int that encodes each string's byte length,
      // then compare the UTF-8 bytes directly.
      int n1 = WritableUtils.decodeVIntSize(b1[s1]);
      int n2 = WritableUtils.decodeVIntSize(b2[s2]);
      return compareBytes(b1, s1+n1, l1-n1, b2, s2+n2, l2-n2);
    }
  }

  static {
    // register this comparator
    WritableComparator.define(Text.class, new Comparator());
  }
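
For your own key the same pattern applies. Below is a hypothetical sketch
(MyPairKey and its encoding are made up for illustration), assuming the key
serializes exactly two longs, so the raw comparator can read them straight
out of the byte arrays with WritableComparator.readLong:

  import java.io.DataInput;
  import java.io.DataOutput;
  import java.io.IOException;

  import org.apache.hadoop.io.WritableComparable;
  import org.apache.hadoop.io.WritableComparator;

  public class MyPairKey implements WritableComparable<MyPairKey> {
    private long first;
    private long second;

    public void write(DataOutput out) throws IOException {
      out.writeLong(first);   // fixed-width encoding: 8 bytes + 8 bytes
      out.writeLong(second);
    }

    public void readFields(DataInput in) throws IOException {
      first = in.readLong();
      second = in.readLong();
    }

    public int compareTo(MyPairKey other) {
      if (first != other.first) {
        return first < other.first ? -1 : 1;
      }
      return second < other.second ? -1 : (second == other.second ? 0 : 1);
    }

    /** Compares the serialized longs without deserializing the keys. */
    public static class Comparator extends WritableComparator {
      public Comparator() {
        super(MyPairKey.class);
      }

      public int compare(byte[] b1, int s1, int l1,
                         byte[] b2, int s2, int l2) {
        long f1 = readLong(b1, s1);       // first long starts at the record offset
        long f2 = readLong(b2, s2);
        if (f1 != f2) {
          return f1 < f2 ? -1 : 1;
        }
        long g1 = readLong(b1, s1 + 8);   // second long is 8 bytes in
        long g2 = readLong(b2, s2 + 8);
        return g1 < g2 ? -1 : (g1 == g2 ? 0 : 1);
      }
    }

    static {
      // register the raw comparator so the framework uses it for this key
      WritableComparator.define(MyPairKey.class, new Comparator());
    }
  }

If you'd rather not rely on the static registration, you can also point the
job at the comparator explicitly with
job.setSortComparatorClass(MyPairKey.Comparator.class), which works because
WritableComparator implements RawComparator.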

Chris

On Tue, Mar 20, 2012 at 2:47 AM, Jane Wayne <[email protected]> wrote:
> Quick question:
>
> I have a key that already implements WritableComparable. This will be the
> intermediate key passed from the map to the reducer.
>
> Is it necessary to extend RawComparator and set it on
> Job.setSortComparatorClass(Class<? extends RawComparator> cls) ?
