On Wed, Feb 9, 2011 at 12:59 AM, Zhou Shuaifeng <[email protected]> wrote:
> We have test a cluster which have more than 30,000 regions, max size of a
> region is 512MB. At this situation, data no more growing, but remove some old
> data and insert new, and regions will be more and more.
> This occupies too much heapsize, and will be more if regions cannot be
> merged. And it takes too long to make the table offline.
>
I've seen this before, where the region size chosen at the start turns out to be inappropriate -- or the initial config was missing LZO -- and then at the end of the loading, or during it, an adjustment needs to be made to keep an upper bound on region count. For example, in Zhou's case above, it sounds like the regions could have been bigger.

With 30k regions, we can't do manual merges. A script that surveys the table to pick out adjacent small regions and then merges them online seems like it would be useful.

We also need the ability to do online schema edits, setting region size and compression, without having to take down the table.

Would you mind making an issue, Zhou? It'd be of type umbrella since there is already an effort to do features such as online schema edits.

Thanks,
St.Ack

P.S. What version of HBase, Zhou? Did you have compression enabled?
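
To make the survey idea a bit more concrete, here is a rough sketch of just the selection step, in Python. It is not an existing HBase tool and it calls no HBase API: the `Region` record, the `plan_merges` function, and the `max_merged_mb` threshold are all hypothetical names for illustration. It assumes region boundaries and sizes have already been collected somehow, and that two regions are adjacent when one's end key equals the other's start key. The online merge itself is left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Region:
    """Hypothetical record for one surveyed region (not an HBase class)."""
    name: str
    start_key: bytes
    end_key: bytes   # an empty end key marks the last region of the table
    size_mb: int


def plan_merges(regions: List[Region], max_merged_mb: int) -> List[Tuple[str, str]]:
    """Pick pairs of adjacent regions whose combined size fits the bound.

    Each region is used in at most one pair per pass, so a single pass at
    most halves the region count; rerun the survey to merge further.
    """
    # The empty start key of the first region sorts before all other keys.
    ordered = sorted(regions, key=lambda r: r.start_key)
    plan = []
    i = 0
    while i < len(ordered) - 1:
        a, b = ordered[i], ordered[i + 1]
        adjacent = a.end_key == b.start_key
        if adjacent and a.size_mb + b.size_mb <= max_merged_mb:
            plan.append((a.name, b.name))
            i += 2  # both regions are consumed by this merge
        else:
            i += 1
    return plan


if __name__ == "__main__":
    survey = [
        Region("r1", b"", b"b", 100),
        Region("r2", b"b", b"d", 50),
        Region("r3", b"d", b"f", 400),
        Region("r4", b"f", b"", 200),
    ]
    # With a 512MB bound, only r1 and r2 are small enough to combine.
    print(plan_merges(survey, max_merged_mb=512))
```

Pairing greedily from the lowest key keeps the pass simple and easy to rerun; a smarter planner might prefer the smallest pairs first or fold runs of more than two small regions together.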
