I found that using HRegionPartitioner on tables that are not new and have
multiple regions per server speeds things up. I might look into making an
HServerPartitioner (one reduce per server), but that would lose
performance if the server has many spare cores to use.
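To make the trade-off concrete, here is a pure-Java sketch of how a region-based and a server-based partitioner would route row keys. It is not HBase's actual HRegionPartitioner source, which looks up region locations through HTable; the region start keys, the region-to-server map, and the class and method names are all hypothetical.

```java
import java.util.Arrays;

public class PartitionerSketch {
    // Hypothetical region layout: sorted start keys ("" is the first region).
    static final String[] REGION_START_KEYS = {"", "d", "h", "m", "r", "w"};
    // Hypothetical assignment of each region (by index) to a region server.
    static final int[] REGION_TO_SERVER = {0, 0, 0, 1, 1, 1};

    // Index of the region whose [startKey, nextStartKey) range contains rowKey.
    static int regionIndex(String rowKey) {
        int pos = Arrays.binarySearch(REGION_START_KEYS, rowKey);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the containing
        // region is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    // Region-based idea: one reduce per region, folded if there are fewer reduces.
    static int regionPartition(String rowKey, int numReduces) {
        return regionIndex(rowKey) % numReduces;
    }

    // Server-based idea: one reduce per region server.
    static int serverPartition(String rowKey, int numReduces) {
        return REGION_TO_SERVER[regionIndex(rowKey)] % numReduces;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"apple", "kiwi", "zebra"}) {
            System.out.println(key + " -> region reduce " + regionPartition(key, 6)
                + ", server reduce " + serverPartition(key, 2));
        }
    }
}
```

With two servers the server-based scheme yields only two reduces, which is the lost parallelism mentioned above when each server has spare cores.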
Billy
----- Original Message -----
From: "Ryan Rawson" <[email protected]>
Newsgroups: gmane.comp.java.hadoop.hbase.user
To: <hbase-user-7ArZoLwFLBtd/SJB6HiN2Ni2O/[email protected]>
Sent: Thursday, April 02, 2009 5:53 PM
Subject: Re: Bulk import - does sort order of input data affect success rate?
hey,
sorted = slower, randomized = faster.
This is because if the input is sorted in natural key order, you tend to
hotspot on 1 or 2 regions.
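A small simulation can illustrate the hotspotting effect. This sketch assumes a hypothetical table of 100,000 rows split evenly into 10 regions, with row indices standing in for row keys in natural sort order, and counts how many distinct regions each batch of 1,000 consecutive writes touches.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class HotspotSketch {
    // Hypothetical table: rows 0..99999 split into 10 regions of 10,000 rows each.
    static final int ROWS = 100_000;
    static final int ROWS_PER_REGION = 10_000;

    static int regionOf(int row) {
        return row / ROWS_PER_REGION;
    }

    // Average number of distinct regions touched by each consecutive batch of writes.
    static double avgRegionsPerBatch(List<Integer> writeOrder, int batchSize) {
        int batches = 0;
        int total = 0;
        for (int start = 0; start < writeOrder.size(); start += batchSize) {
            Set<Integer> regions = new HashSet<>();
            int end = Math.min(start + batchSize, writeOrder.size());
            for (int i = start; i < end; i++) {
                regions.add(regionOf(writeOrder.get(i)));
            }
            total += regions.size();
            batches++;
        }
        return (double) total / batches;
    }

    public static void main(String[] args) {
        List<Integer> sorted = new ArrayList<>();
        for (int row = 0; row < ROWS; row++) {
            sorted.add(row);
        }
        List<Integer> shuffled = new ArrayList<>(sorted);
        Collections.shuffle(shuffled, new Random(42));

        System.out.printf("sorted:   %.2f regions per 1000 writes%n", avgRegionsPerBatch(sorted, 1000));
        System.out.printf("shuffled: %.2f regions per 1000 writes%n", avgRegionsPerBatch(shuffled, 1000));
    }
}
```

Sorted input keeps every batch inside a single region (one busy server at a time), while shuffled input spreads each batch across nearly all ten regions.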
I don't use TableOutputFormat; I use direct commits from the map, with no
reduce. That seems to be the most performant solution.
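The map-only, direct-commit pattern can be sketched in plain Java with stand-ins. In a real HBase 0.19 job the mapper would presumably open an HTable once (for example in configure()), call commit() on a BatchUpdate inside map(), and the job would set zero reduce tasks; here TableClient, DirectCommitMapper, and the tab-separated input format are hypothetical, chosen only to show where the commit (and a helper like doCommit) would live.

```java
import java.util.ArrayList;
import java.util.List;

public class DirectCommitSketch {
    // Hypothetical stand-in for a table client (HTable plays this role in HBase).
    interface TableClient {
        void commit(String row, String value);
    }

    // Mimics a map task's lifecycle: client opened once, one commit per record, no reduce.
    static class DirectCommitMapper {
        private final TableClient table;

        DirectCommitMapper(TableClient table) {
            this.table = table; // in a real job this would be opened in configure()
        }

        void map(String line) {
            // Hypothetical input format: "rowKey<TAB>value".
            String[] parts = line.split("\t", 2);
            table.commit(parts[0], parts[1]); // write straight to the table; nothing is emitted
        }
    }

    public static void main(String[] args) {
        List<String> committed = new ArrayList<>();
        DirectCommitMapper mapper =
            new DirectCommitMapper((row, value) -> committed.add(row + "=" + value));
        for (String line : new String[] {"row1\ta", "row2\tb"}) {
            mapper.map(line);
        }
        System.out.println(committed); // prints [row1=a, row2=b]
    }
}
```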
have fun!
On Thu, Apr 2, 2009 at 1:36 PM, Stuart White <[email protected]> wrote:
On Thu, Apr 2, 2009 at 3:30 PM, Ryan Rawson <[email protected]> wrote:
> The last thing - success should not be a function of sort order.
>
> However, speed will be related.
How? Sorted = faster, or Sorted = slower?
>
> One thing I found I had to do was:
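> import java.io.IOException;
> import org.apache.hadoop.hbase.client.HTable;
> import org.apache.hadoop.hbase.client.RetriesExhaustedException;
> import org.apache.hadoop.hbase.io.BatchUpdate;
>
> // Keep retrying the commit until it succeeds.
> private void doCommit(HTable t, BatchUpdate update) throws IOException {
>   boolean committed = false;
>   while (!committed) {
>     try {
>       t.commit(update);
>       committed = true;
>     } catch (RetriesExhaustedException e) {
>       // DAMN, ignore: the client gave up, so loop and try again
>     }
>   }
> }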
>
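The quoted loop retries forever, so a region that never comes back would hang the task. A bounded variant with exponential backoff is one alternative; this is a hypothetical sketch rather than the poster's code, with Commit standing in for t.commit(update) and the attempt limit and backoff values chosen arbitrarily.

```java
public class BoundedRetrySketch {
    // Hypothetical stand-in for the call that may fail (t.commit(update) above).
    interface Commit {
        void run() throws Exception;
    }

    // Retries up to maxAttempts times, sleeping twice as long after each failure,
    // then rethrows the last failure instead of looping forever.
    static int commitWithRetry(Commit commit, int maxAttempts, long initialBackoffMs)
            throws Exception {
        long backoff = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                commit.run();
                return attempt; // number of attempts it took
            } catch (Exception e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(backoff);
                backoff *= 2;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] failuresLeft = {2};
        int attempts = commitWithRetry(() -> {
            if (failuresLeft[0]-- > 0) {
                throw new RuntimeException("simulated RetriesExhaustedException");
            }
        }, 5, 10);
        System.out.println("committed after " + attempts + " attempts"); // committed after 3 attempts
    }
}
```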
I'm running a mapred job, using TableOutputFormat to write the results
to HBase. For the code you've provided, was that for a custom output
format? Or a standalone (non-mapred) application? I see the point
you're making; I just don't understand where I'd put that code.
Thanks!