Inline.

J-D
On Thu, Feb 11, 2010 at 9:59 AM, Boris Aleksandrovsky <[email protected]> wrote:

> Hi guys,
>
> We have a table which previously stored uncompressed data and which we
> changed to store GZ-compressed data. We performed a compaction on that
> table, which shrank its size three-fold. However, I noticed that compaction
> reduced the size of the regions, but did not reduce the *number* of regions.

HBase doesn't merge regions. There's a tool for that, but it has to be run while the cluster is offline.

> My questions are:
> 1. How is scan performance (and also random read performance) related
> to the number of regions in your experience? Perhaps there is some
> empirical data on the optimal region size / number of regions per region
> server combination?

It's a matter of RAM. If your data set fits in RAM, then block caching will help you a lot. A good number of regions per RS is between 100 and 1000, but that depends on many factors, so that number is probably meaningless for you ;)

> 2. If performance suffers because there is a high number of small regions,
> is there a way to reduce the number of regions by merge or other means?

As I said above: only with the offline merge tool. Performance shouldn't suffer if your clients are long-lived.

> --
> Cheers,
>
> Boris
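
To put numbers on the situation described in this thread, here is a minimal sketch of the arithmetic, not anything HBase itself provides. The helper name `region_stats` and all figures (300 GiB shrinking to 100 GiB, 1200 regions, 4 region servers) are hypothetical and chosen only for illustration. It shows that a three-fold compaction shrinks the average region size while the region count, and therefore the regions-per-RS figure, stays the same.

```python
GIB = 1024 ** 3

def region_stats(table_bytes, num_regions, num_region_servers):
    """Return (average region size in bytes, regions per region server).

    Hypothetical helper for back-of-the-envelope estimates only.
    """
    if num_regions <= 0 or num_region_servers <= 0:
        raise ValueError("region and server counts must be positive")
    avg_region_bytes = table_bytes / num_regions
    regions_per_rs = num_regions / num_region_servers
    return avg_region_bytes, regions_per_rs

# Assumed figures: compaction shrinks the table three-fold,
# but the number of regions does not change.
before = region_stats(300 * GIB, 1200, 4)
after = region_stats(100 * GIB, 1200, 4)

for label, (avg_bytes, per_rs) in (("before", before), ("after", after)):
    print(f"{label}: {avg_bytes / GIB:.3f} GiB per region, "
          f"{per_rs:.0f} regions per RS")
```

With these assumed inputs the regions-per-RS value is identical before and after compaction, so it can be compared directly against the rough 100 to 1000 range mentioned in the reply.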
