Re: Bulk import - does sort order of input data affect success rate?

Billy Pearson Sun, 05 Apr 2009 00:46:52 -0700

I found that using HRegionPartitioner on tables that are not new and have multiple regions per server speeds things up. I might look into making an HServerPartitioner (one reduce per server), but that would lose performance if the server has many spare cores to use.


Billy
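
To illustrate the idea of one reduce per region, here is a minimal sketch in plain Java. It is not HBase's HRegionPartitioner; the class name, method name, and start keys are hypothetical, and it compares keys as Java strings, whereas HBase orders row keys as bytes. It only shows the lookup: a row goes to the region with the greatest start key that is not larger than the row key.

```java
import java.util.Arrays;

public class RegionPartitionSketch {
    // Sorted region start keys (hypothetical); the first region starts at "".
    private final String[] startKeys;

    public RegionPartitionSketch(String... regionStartKeys) {
        startKeys = regionStartKeys.clone();
        Arrays.sort(startKeys);
    }

    // Returns the index of the region whose start key is the greatest
    // one that is less than or equal to rowKey.
    public int partitionFor(String rowKey) {
        int pos = Arrays.binarySearch(startKeys, rowKey);
        if (pos >= 0) {
            return pos;                 // rowKey is exactly a region start key
        }
        int insertionPoint = -pos - 1;  // index of the first start key > rowKey
        return Math.max(insertionPoint - 1, 0);
    }
}
```

With start keys "", "g", "n", "t", the key "apple" maps to partition 0, "monkey" to 1, and "zebra" to 3. A per-server variant would add one more lookup from region index to server, which is where the trade-off with spare cores comes in.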

----- Original Message -----
From: "Ryan Rawson" <[email protected]>
Newsgroups: gmane.comp.java.hadoop.hbase.user
To: <hbase-user-7ArZoLwFLBtd/SJB6HiN2Ni2O/[email protected]>
Sent: Thursday, April 02, 2009 5:53 PM
Subject: Re: Bulk import - does sort order of input data affect success rate?

hey,

sorted = slower, randomized = faster.

this is because if it is sorted in natural key order, you tend to hotspot in
1 or 2 regions.
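
The hotspot effect can be seen in a small simulation. This is only a sketch under made-up assumptions: 1000 integer keys split evenly into 10 regions of 100 keys each, with all class and method names hypothetical. It counts how many distinct regions the first N writes touch when the input is sorted versus shuffled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class HotspotSketch {
    // Hypothetical layout: keys 0..999, 10 regions of 100 consecutive keys.
    static int regionOf(int key) {
        return key / 100;
    }

    // Number of distinct regions touched by the first `window` writes.
    static int regionsTouched(List<Integer> keys, int window) {
        Set<Integer> regions = new HashSet<>();
        for (int i = 0; i < window && i < keys.size(); i++) {
            regions.add(regionOf(keys.get(i)));
        }
        return regions.size();
    }

    // Keys in natural (sorted) order.
    static List<Integer> sortedKeys() {
        List<Integer> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            keys.add(i);
        }
        return keys;
    }

    // The same keys in a pseudo-random order.
    static List<Integer> shuffledKeys(long seed) {
        List<Integer> keys = sortedKeys();
        Collections.shuffle(keys, new Random(seed));
        return keys;
    }
}
```

With sorted input, the first 100 writes all land in a single region, so one region server takes the whole load while the others sit idle. With shuffled input, the same 100 writes spread across several regions.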

I don't use table output format, I use direct commits from the map, no
reduce. That seems to be the most performant solution.

have fun!

On Thu, Apr 2, 2009 at 1:36 PM, Stuart White <[email protected]> wrote:

On Thu, Apr 2, 2009 at 3:30 PM, Ryan Rawson<[email protected]> wrote:

> The last thing - success should not be a function of sort order.
>
> However, speed will be related.

How?  Sorted = faster, or Sorted = slower?

>
> One thing I found I had to do was:
>
> private void doCommit(HTable t, BatchUpdate update) throws IOException {
>      boolean commited = false;
>      while (!commited) {
>        try {
>          t.commit(update);
>          commited = true;
>        } catch (RetriesExhaustedException e) {
>          // DAMN, ignore
>        }
>      }
>    }
>

I'm running a mapred job, using TableOutputFormat to write the results
to HBase.  For the code you've provided, was that for a custom output
format?  Or a standalone (non-mapred) application?  I see the point
you're making, I just don't understand where I'd put that code.
Thanks!
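
The retry loop quoted above spins forever if the commit never succeeds. A bounded variant might look like the sketch below. It is written against a plain functional interface rather than the HBase client, so `Commit`, `commitWithRetry`, `maxAttempts`, and `backoffMillis` are all hypothetical names, and it catches IOException broadly instead of one specific exception type. Wherever the commit is issued from (for example, directly from a map task), the call would be wrapped the same way.

```java
import java.io.IOException;

public class BoundedRetrySketch {
    // Stand-in for the real commit call, e.g. () -> t.commit(update).
    interface Commit {
        void run() throws IOException;
    }

    // Tries the commit up to maxAttempts times, sleeping a little longer
    // after each failure. Returns the attempt number that succeeded, or
    // rethrows the last IOException once the attempts are used up.
    static int commitWithRetry(Commit commit, int maxAttempts, long backoffMillis)
            throws IOException, InterruptedException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                commit.run();
                return attempt;
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis * attempt); // linear backoff
                }
            }
        }
        throw last != null ? last : new IOException("maxAttempts must be >= 1");
    }
}
```

Giving up after a fixed number of attempts lets the task fail visibly instead of hanging, at the cost of having to choose a limit that suits the cluster.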
