Thanks, Chris!
On Tue, Mar 20, 2012 at 6:30 AM, Chris White <[email protected]> wrote:
> Setting sortComparatorClass lets you configure a RawComparator
> implementation, which allows you to do more efficient comparisons at
> the byte level. If you don't set it, Hadoop uses WritableComparator by
> default. That implementation deserializes the bytes into key instances
> using your readFields method and then calls compareTo to determine key
> ordering (see the source of
> org.apache.hadoop.io.WritableComparator.compare(byte[], int, int,
> byte[], int, int)).
>
> So if you don't need to be as efficient as possible, then delegating
> to WritableComparator is probably fine.
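>
> As a concrete (and purely illustrative) sketch, suppose a hypothetical
> key class MyKey whose serialized form is a single 4-byte int written
> with out.writeInt. A byte-level comparator for it could look like:
>
> public static class MyKeyComparator extends WritableComparator {
>   public MyKeyComparator() {
>     super(MyKey.class);
>   }
>
>   @Override
>   public int compare(byte[] b1, int s1, int l1,
>                      byte[] b2, int s2, int l2) {
>     // decode the ints straight from the serialized bytes; no MyKey
>     // instance is ever created (readInt is WritableComparator's helper)
>     int i1 = readInt(b1, s1);
>     int i2 = readInt(b2, s2);
>     return (i1 < i2) ? -1 : ((i1 == i2) ? 0 : 1);
>   }
> }
>
> // wired up on the job:
> job.setSortComparatorClass(MyKeyComparator.class);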
>
> Note that you can also register a RawComparator for your key class
> with WritableComparator using a static block; look at the source of
> Text for an example:
>
> /** A WritableComparator optimized for Text keys. */
> public static class Comparator extends WritableComparator {
>   public Comparator() {
>     super(Text.class);
>   }
>
>   @Override
>   public int compare(byte[] b1, int s1, int l1,
>                      byte[] b2, int s2, int l2) {
>     // skip each key's variable-length (vint) length prefix, then
>     // compare the remaining UTF-8 bytes lexicographically
>     int n1 = WritableUtils.decodeVIntSize(b1[s1]);
>     int n2 = WritableUtils.decodeVIntSize(b2[s2]);
>     return compareBytes(b1, s1 + n1, l1 - n1, b2, s2 + n2, l2 - n2);
>   }
> }
>
> static {
>   // register this comparator as the default for Text keys
>   WritableComparator.define(Text.class, new Comparator());
> }
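>
> Because that static block runs when the Text class is loaded, any job
> using Text keys picks up the optimized comparator automatically; no
> setSortComparatorClass call is needed in that case.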
>
> Chris
>
> On Tue, Mar 20, 2012 at 2:47 AM, Jane Wayne <[email protected]> wrote:
> > Quick question:
> >
> > I have a key that already implements WritableComparable. This will be
> > the intermediate key passed from the map phase to the reduce phase.
> >
> > Is it necessary to extend RawComparator and set it via
> > Job.setSortComparatorClass(Class<? extends RawComparator> cls)?
>