Inline.

J-D

On Thu, Feb 11, 2010 at 9:59 AM, Boris Aleksandrovsky <[email protected]> wrote:

> Hi guys,
>
> We have a table which stored previously uncompressed data which we changed
> to store GZ-compressed data. We performed a compaction on that table which
> shrank its size three-fold. However, I noticed that compaction reduced the
> size of the region, but did not reduce the *number* of regions.
>

HBase doesn't merge regions. There's a tool for that but it has to be run
while the cluster is offline.


>
> My questions are:
> 1. How is scan performance (and also random read performance) related
> to the number of regions in your experience? Perhaps there are some
> empirical data on the optimal region size / number of regions per region
> server combination?
>

It's a matter of RAM. If your data set fits in RAM then block caching will
help you a lot. A good number of regions per RS is between 100 and 1000, but
that depends on many factors so that number is probably meaningless for you
;)
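
A rough back-of-the-envelope check of the "fits in RAM" question could look
like the sketch below. Everything in it is an assumption for illustration: the
function name, the cluster numbers, and the cache fraction, which stands in
for whatever share of region server heap your block cache is configured to
use.

```python
GB = 1024 ** 3

def fits_in_block_cache(data_bytes, num_region_servers,
                        heap_bytes_per_server, cache_fraction):
    """Return True if the data could fit in the cluster-wide block cache.

    data_bytes is the size of the hot data as it would sit in the cache
    (assumption: use the uncompressed size if blocks are cached uncompressed).
    cache_fraction is the share of each region server's heap given to the
    block cache (hypothetical value, check your own configuration).
    """
    total_cache = num_region_servers * heap_bytes_per_server * cache_fraction
    return data_bytes <= total_cache

# Hypothetical cluster: 10 region servers, 4 GB heap each, 20% for block cache.
print(fits_in_block_cache(5 * GB, 10, 4 * GB, 0.2))
print(fits_in_block_cache(50 * GB, 10, 4 * GB, 0.2))
```

If the check fails by a wide margin, reads will mostly go to disk no matter
how the regions are laid out.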


> 2. If performance suffers because there is a high number of small regions,
> is there a way to reduce the number of regions by merge or other means?
>

See my answer above. Also, performance shouldn't suffer if your clients are
long-lived.


>
>
> --
> Cheers,
>
> Boris
>
