I found that using HRegionPartitioner speeds things up on tables that are not new and have multiple regions per server. I might look into making an HServerPartitioner (one reduce per server), but that would lose performance if the server has many spare cores to use.
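
To make the difference between the two ideas concrete, here is a minimal sketch in plain Java, not the actual HBase classes. It assumes row keys are plain strings compared lexicographically and that each region is identified by its start key; the class name, method names, and the `serverOfRegion` array are all hypothetical.

```java
import java.util.TreeMap;

// Toy model only: real HBase row keys are byte arrays and region
// locations come from the cluster, not from a hand-built map.
final class PartitionerSketch {

    // One partition per region: pick the region whose start key is the
    // greatest start key <= rowKey. The first region must use "" as its
    // start key so that every row key has a match.
    static int regionPartition(TreeMap<String, Integer> regionsByStartKey, String rowKey) {
        return regionsByStartKey.floorEntry(rowKey).getValue();
    }

    // One partition per server: look up the region first, then map that
    // region to the (hypothetical) index of the server hosting it.
    static int serverPartition(TreeMap<String, Integer> regionsByStartKey,
                               int[] serverOfRegion, String rowKey) {
        return serverOfRegion[regionPartition(regionsByStartKey, rowKey)];
    }
}
```

With three regions on two servers, the per-region version gives three reducers and the per-server version gives two, which is where the spare-cores trade-off comes from.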

Billy

----- Original Message ----- From: "Ryan Rawson" <[email protected]>
Newsgroups: gmane.comp.java.hadoop.hbase.user
To: <hbase-user-7ArZoLwFLBtd/SJB6HiN2Ni2O/[email protected]>
Sent: Thursday, April 02, 2009 5:53 PM
Subject: Re: Bulk import - does sort order of input data affect success rate?


hey,

sorted = slower, randomized = faster.

this is because if it is sorted in natural key order, you tend to hotspot in
1 or 2 regions.
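
The hotspotting effect can be illustrated with a toy simulation. This is not HBase code: it assumes integer keys 0..N-1 split evenly across a fixed number of regions, and every name in it is hypothetical. It counts how many distinct regions each batch of consecutive writes touches.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Toy model: keys must be a permutation of 0..keys.size()-1.
final class HotspotSketch {

    // Region that holds a key when the key space is split into equal ranges.
    static int regionOf(int key, int numKeys, int numRegions) {
        return (int) ((long) key * numRegions / numKeys);
    }

    // Average number of distinct regions touched per window of consecutive writes.
    static double avgRegionsPerWindow(List<Integer> keys, int numRegions, int window) {
        int numKeys = keys.size();
        int windows = 0;
        long touched = 0;
        for (int start = 0; start + window <= numKeys; start += window) {
            Set<Integer> regions = new HashSet<>();
            for (int i = start; i < start + window; i++) {
                regions.add(regionOf(keys.get(i), numKeys, numRegions));
            }
            touched += regions.size();
            windows++;
        }
        return (double) touched / windows;
    }

    public static void main(String[] args) {
        int numKeys = 100_000, numRegions = 20, window = 1_000;
        List<Integer> sorted = new ArrayList<>();
        for (int k = 0; k < numKeys; k++) sorted.add(k);
        List<Integer> shuffled = new ArrayList<>(sorted);
        Collections.shuffle(shuffled, new Random(42));
        System.out.println("sorted:   " + avgRegionsPerWindow(sorted, numRegions, window));
        System.out.println("shuffled: " + avgRegionsPerWindow(shuffled, numRegions, window));
    }
}
```

In this model every window of sorted writes lands in a single region, while shuffled writes spread across most or all of them, so more region servers share the load at any given moment.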

I don't use table output format, I use direct commits from the map, no
reduce. That seems to be the most performant solution.

have fun!


On Thu, Apr 2, 2009 at 1:36 PM, Stuart White <[email protected]> wrote:

On Thu, Apr 2, 2009 at 3:30 PM, Ryan Rawson <[email protected]> wrote:
> The last thing - success should not be a function of sort order.
>
> However, speed will be related.

How?  Sorted = faster, or Sorted = slower?

>
> One thing I found I had to do was:
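> private void doCommit(HTable t, BatchUpdate update) throws IOException {
>      boolean committed = false;
>      // Keep retrying until the commit goes through. Note that this
>      // loops forever if the commit never succeeds.
>      while (!committed) {
>        try {
>          t.commit(update);
>          committed = true;
>        } catch (RetriesExhaustedException e) {
>          // DAMN, ignore and retry
>        }
>      }
>    }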
>
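
The loop quoted above retries without limit. A bounded variant with backoff might look like the sketch below. It is plain Java with no HBase dependency: the `Commit` interface is a hypothetical stand-in for the `t.commit(update)` call, and catching `IOException` broadly (rather than only the retries-exhausted case) is an assumption made to keep the example self-contained.

```java
import java.io.IOException;

final class BoundedRetrySketch {

    // Hypothetical stand-in for the t.commit(update) call in the quoted snippet.
    interface Commit {
        void run() throws IOException;
    }

    // Runs the commit, retrying on IOException up to maxAttempts times and
    // doubling the sleep between attempts. Returns the number of attempts
    // used; rethrows the last failure if every attempt fails.
    static int commitWithRetry(Commit commit, int maxAttempts, long initialBackoffMs)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                commit.run();
                return attempt;
            } catch (IOException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and surface the failure to the caller
                }
                Thread.sleep(backoffMs);
                backoffMs *= 2;
            }
        }
    }
}
```

In a map-only job, a call such as `commitWithRetry(() -> t.commit(update), 5, 100)` would sit where the direct commit happens in the map method; the limit of 5 attempts and the 100 ms starting delay are arbitrary illustration values.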

I'm running a mapred job, using TableOutputFormat to write the results
to HBase.  For the code you've provided, was that for a custom output
format?  Or a standalone (non-mapred) application?  I see the point
you're making, I just don't understand where I'd put that code.
Thanks!



