Re: Bulk import - does sort order of input data affect success rate?

Ryan Rawson Thu, 02 Apr 2009 15:53:58 -0700

hey,

sorted = slower, randomized = faster.


this is because if the input is sorted in natural key order, your writes tend
to hotspot on the same 1 or 2 regions at a time.
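A toy simulation of the hotspotting effect, under loudly stated assumptions: the table is modeled as 10 hypothetical regions, each owning a contiguous range of 1,000 integer row keys, and a "batch" is simply 100 consecutive writes. It does not use the HBase client. It only counts how many distinct regions each batch lands on when keys arrive sorted versus shuffled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.TreeMap;

public class HotspotDemo {
    // Hypothetical table: 10 regions, each owning a contiguous range of 1000 keys,
    // indexed by the region's start key (regions cover sorted, contiguous row ranges).
    static final TreeMap<Integer, Integer> REGIONS = new TreeMap<>();
    static {
        for (int r = 0; r < 10; r++) {
            REGIONS.put(r * 1000, r);
        }
    }

    // Region serving a row key: the one with the greatest start key <= key.
    static int regionFor(int key) {
        return REGIONS.floorEntry(key).getValue();
    }

    // Average number of distinct regions touched by each batch of consecutive writes.
    static double avgRegionsPerBatch(List<Integer> keys, int batchSize) {
        int batches = 0;
        int total = 0;
        for (int i = 0; i < keys.size(); i += batchSize) {
            Set<Integer> touched = new HashSet<>();
            for (int j = i; j < Math.min(i + batchSize, keys.size()); j++) {
                touched.add(regionFor(keys.get(j)));
            }
            total += touched.size();
            batches++;
        }
        return (double) total / batches;
    }

    public static void main(String[] args) {
        List<Integer> sorted = new ArrayList<>();
        for (int k = 0; k < 10_000; k++) {
            sorted.add(k);
        }
        List<Integer> shuffled = new ArrayList<>(sorted);
        Collections.shuffle(shuffled, new Random(42));

        // Sorted input: every batch of 100 falls inside one region, so this prints 1.00.
        System.out.printf("sorted:   %.2f regions per batch%n", avgRegionsPerBatch(sorted, 100));
        // Shuffled input: each batch spreads over nearly all 10 regions.
        System.out.printf("shuffled: %.2f regions per batch%n", avgRegionsPerBatch(shuffled, 100));
    }
}
```

With sorted keys a single region (and the server hosting it) absorbs each whole batch while the others sit idle; shuffling the input spreads the same writes across the table.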

I don't use TableOutputFormat; I commit directly from the map, with no
reduce. That seems to be the most performant solution.

have fun!


On Thu, Apr 2, 2009 at 1:36 PM, Stuart White <[email protected]> wrote:

> On Thu, Apr 2, 2009 at 3:30 PM, Ryan Rawson <[email protected]> wrote:
> > The last thing - success should not be a function of sort order.
> >
> > However, speed will be related.
>
> How?  Sorted = faster, or Sorted = slower?
>
> >
> > One thing I found I had to do was:
> >    private void doCommit(HTable t, BatchUpdate update) throws IOException {
> >      boolean committed = false;
> >      while (!committed) {
> >        try {
> >          t.commit(update);
> >          committed = true;
> >        } catch (RetriesExhaustedException e) {
> >          // DAMN, ignore: loop and retry the same update until it succeeds
> >        }
> >      }
> >    }
> >
>
> I'm running a mapred job, using TableOutputFormat to write the results
> to HBase.  For the code you've provided, was that for a custom output
> format?  Or a standalone (non-mapred) application?  I see the point
> you're making; I just don't understand where I'd put that code.
> Thanks!
>
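A minimal sketch of a bounded variant of the doCommit loop quoted above, assuming you would rather fail the task than spin forever. Commit and RetriesExhausted are hypothetical stand-ins for HTable.commit(BatchUpdate) and HBase's RetriesExhaustedException, so the example runs without the HBase client. The attempt limit and doubling backoff are additions, not part of the original loop. In the map-only setup Ryan describes, a helper like this would be called from map() once per record.

```java
import java.io.IOException;

public class BoundedRetry {
    // Hypothetical stand-in for one write such as HTable.commit(update).
    interface Commit {
        void run() throws IOException;
    }

    // Hypothetical stand-in for HBase's RetriesExhaustedException.
    static class RetriesExhausted extends IOException {
        RetriesExhausted(String msg) {
            super(msg);
        }
    }

    // Retries a commit that failed with RetriesExhausted, doubling the sleep between
    // attempts, and rethrows after maxAttempts. Returns the number of attempts used.
    static int commitWithRetry(Commit commit, int maxAttempts, long initialSleepMs)
            throws IOException, InterruptedException {
        long sleepMs = initialSleepMs;
        for (int attempt = 1; ; attempt++) {
            try {
                commit.run();
                return attempt;
            } catch (RetriesExhausted e) {
                if (attempt >= maxAttempts) {
                    throw e; // surface the failure so the task fails visibly
                }
                Thread.sleep(sleepMs);
                sleepMs *= 2; // give a busy region server time to catch up
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated table that rejects the first two commits.
        int[] calls = {0};
        Commit flaky = () -> {
            if (++calls[0] <= 2) {
                throw new RetriesExhausted("simulated busy region");
            }
        };
        int attempts = commitWithRetry(flaky, 5, 10);
        System.out.println("committed after " + attempts + " attempts"); // committed after 3 attempts
    }
}
```

Unlike the unbounded loop, a persistent failure here eventually propagates out of map(), so the framework can report or reschedule the task instead of it hanging.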
