Re: compaction does not reduce the number of regions

Stack Thu, 11 Feb 2010 10:24:22 -0800

On Thu, Feb 11, 2010 at 9:59 AM, Boris Aleksandrovsky
<[email protected]> wrote:
>  I noticed that compaction reduced the
> size of the region, but did not reduce the *number* of regions.
>


That's right.  Once a region has split, there is currently no going back
unless you run a manual merge of regions (see the Merge tool under util).
Currently you have to offline your table to merge, which we need to
fix: a table shouldn't have to be offline to merge, just as it does
not have to be offline for regions to split.
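
To illustrate the distinction, here is a toy Python model. It is not HBase
code: the Region class, the split/merge helpers, and the sizes are all
hypothetical, and it assumes a compaction simply rewrites one region's
storefiles into a single file. It only shows that compaction works inside
a region, so the region count changes on split or merge, never on compact.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Hypothetical stand-in for a region: a key range plus storefile sizes (MB)."""
    start_key: str
    end_key: str
    storefiles: list = field(default_factory=list)

    def size(self):
        return sum(self.storefiles)

    def compact(self, live_fraction=1.0):
        # Rewrite all storefiles into one, keeping only the live data.
        # The region itself (its key range) is untouched.
        self.storefiles = [self.size() * live_fraction]


def split(region, mid_key):
    # A split replaces one region with two daughters covering the same range.
    half = region.size() / 2
    return [Region(region.start_key, mid_key, [half]),
            Region(mid_key, region.end_key, [half])]


def merge(left, right):
    # A merge is the only step here that lowers the region count.
    assert left.end_key == right.start_key, "regions must be adjacent"
    return Region(left.start_key, right.end_key,
                  left.storefiles + right.storefiles)


table = split(Region("", "", [200.0, 100.0]), "m")
count_after_split = len(table)

for region in table:
    region.compact(live_fraction=0.5)  # assume half the data was deleted
count_after_compact = len(table)
sizes_after_compact = [region.size() for region in table]

table = [merge(table[0], table[1])]
count_after_merge = len(table)
```

In this sketch the regions shrink after compaction but there are still two
of them; only the explicit merge brings the count back to one.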


> My questions are:
> 1. How does scan performance (and also random read performance) relate
> to the number of regions in your experience? Perhaps there is some
> empirical data on the optimal region size / number of regions per region
> server combination?

I don't have empirical evidence, but I would guess that having your data
spread over more regions could improve random read performance
slightly, since there are likely fewer storefiles in the path of each
read, and that scans are perhaps a bit slower, as there are more region
transitions to ride over.

I'm sure there is some optimal number, but I have done no measurements
to figure it out.  (Also see back in the mailing list, where it's argued
that fewer regions per server is better because there is then less churn
on server failure.)
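
As a rough back-of-the-envelope aid for the churn argument, the sketch
below estimates how many regions each server carries, which is roughly how
many would need reassigning if one server failed. The function name and
all numbers are made up for illustration; it assumes regions are evenly
sized and evenly spread, which a real cluster will not guarantee.

```python
import math


def regions_per_server(total_data_mb, avg_region_mb, num_servers):
    """Return (total regions, regions per server) under an even-spread assumption."""
    total_regions = math.ceil(total_data_mb / avg_region_mb)
    per_server = math.ceil(total_regions / num_servers)
    return total_regions, per_server


# Hypothetical cluster: 1 TB of data on 10 region servers.
one_tb_mb = 1024 * 1024
small_regions = regions_per_server(one_tb_mb, 128, 10)
large_regions = regions_per_server(one_tb_mb, 512, 10)
```

With these made-up inputs, larger regions mean a quarter as many regions
to move when a server goes down, at the cost of each region being bigger.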

> 2. If performance suffers because there is a high number of small regions, is
> there a way to reduce the number of regions by merge or other means?
>
If you want to merge regions:

See
http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/util/Merge.html#main(java.lang.String[])

To see its usage, run:

./bin/hbase org.apache.hadoop.hbase.util.Merge

If you are concerned about performance, be sure to update to the head
of the 0.20 branch or patch your install with HBASE-2180.
St.Ack


>
> --
> Cheers,
>
> Boris
>
