On Thu, Feb 11, 2010 at 9:59 AM, Boris Aleksandrovsky <[email protected]> wrote:
> I noticed that compaction reduced the size of the region, but did not
> reduce the *number* of regions.

That's right. Once a region is made, there is currently no going back unless you run a manual merge of regions (see the Merge tool under util). Currently you have to offline your table to merge, which we need to fix: a table should not have to be offline to merge, just as it does not have to be offline for regions to split.

> My question is:
> 1. How does the scan performance (and also random read performance) relate
> to the number of regions in your experience? Perhaps there is some
> empirical data on the optimal region size / number of regions per region
> server combination?

I don't have empirical evidence, but I would guess that having your data spread over more regions could improve random read performance slightly, since there are likely fewer storefiles in line on each read, and that scans are perhaps a bit slower, as there are more region transitions to ride over. I'm sure there is some optimal number, but I have done no measurements to figure it out. (Also see back in the mailing list, where it's argued that fewer regions per server is better because there is then less churn on server failure.)

> 2. If performance suffers because there is a high number of small regions, is
> there a way to reduce the number of regions by merge or other means?

If you want to merge regions, see
http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/util/Merge.html#main(java.lang.String[])

To see its usage, run:

  ./bin/hbase org.apache.hadoop.hbase.util.Merge

If you are concerned about performance, be sure to update to the head of the 0.20 branch or patch your install with HBASE-2180.

St.Ack

> --
> Cheers,
>
> Boris
