The biggest legitimate reason to run a smaller region size is a data
set that is small (let's say 400MB) but highly accessed, where you want
a good spread of regions across your cluster.

Another is to run a larger region size if you have a huge table and
want to keep the absolute region count low. I am not 100% sold on this
yet.
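
For concreteness, here is a minimal sketch of tuning this per table,
assuming the 0.20-era client API and the per-table setMaxFileSize knob
(the cluster-wide equivalent is hbase.hregion.max.filesize, which
defaults to 256MB; the table name, family, and 64MB figure below are
made up for the example):

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;

  // Sketch: create a table whose regions split at ~64MB instead of
  // the 256MB default, so a small (~400MB) but hot data set ends up
  // spread over more regions and more region servers.
  public class SmallRegionTable {
    public static void main(String[] args) throws Exception {
      HTableDescriptor desc = new HTableDescriptor("hot_small_table");
      desc.addFamily(new HColumnDescriptor("d"));
      desc.setMaxFileSize(64 * 1024 * 1024L);  // per-table split threshold
      new HBaseAdmin(new HBaseConfiguration()).createTable(desc);
    }
  }

The same knob goes the other way for the huge-table case: set it to a
few GB and you hold the region count down.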

I have a patch that keeps write performance high on a heavily split
table by doing parallel puts. This has been proven to keep aggregate
performance really high, and I hope it will make 0.20.3.
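
The patch itself isn't shown here, but as a rough illustration of the
idea, a client-side sketch against the 0.20-era API might look like
the following (thread count, batch size, table and family names are
all made up):

  import java.util.ArrayList;
  import java.util.List;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import java.util.concurrent.TimeUnit;

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  // Illustrative only -- not the patch. Several threads, each with
  // its own HTable (HTable is not thread-safe), push batched puts
  // concurrently so a single busy or splitting region does not stall
  // the whole upload.
  public class ParallelPutExample {
    public static void main(String[] args) throws Exception {
      final int THREADS = 4;
      final int ROWS_PER_THREAD = 10000;
      ExecutorService pool = Executors.newFixedThreadPool(THREADS);
      for (int t = 0; t < THREADS; t++) {
        final int id = t;
        pool.submit(new Runnable() {
          public void run() {
            try {
              HTable table = new HTable(new HBaseConfiguration(), "test_table");
              table.setAutoFlush(false);  // buffer writes client-side
              List<Put> batch = new ArrayList<Put>();
              for (int i = 0; i < ROWS_PER_THREAD; i++) {
                Put p = new Put(Bytes.toBytes("row-" + id + "-" + i));
                p.add(Bytes.toBytes("d"), Bytes.toBytes("q"), Bytes.toBytes(i));
                batch.add(p);
                if (batch.size() >= 1000) {
                  table.put(batch);      // one batched round trip per 1000 rows
                  batch.clear();
                }
              }
              if (!batch.isEmpty()) table.put(batch);
              table.flushCommits();      // drain the write buffer
            } catch (Exception e) {
              e.printStackTrace();
            }
          }
        });
      }
      pool.shutdown();
      pool.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
    }
  }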

On Tue, Dec 22, 2009 at 2:31 PM, stack <[email protected]> wrote:
> On Tue, Dec 22, 2009 at 8:57 AM, Mark Vigeant
> <[email protected]>wrote:
>
>> J-D,
>>
>> I noticed that performance for uploading data into tables got a lot
>> better as I lowered the max file size -- but only up to a certain
>> point, past which performance began slowing down again.
>>
>>
> Tell us more.  What kind of size changes did you make?  How many regions
> were created?  Is the slowdown because the table is splitting all the
> time?  If you study the regionserver logs, can you make out what the
> regionservers are spending their time doing?
>
>
>
>> Is there a rule of thumb/formula/notion to rely on when setting this
>> parameter for optimal performance? Thanks!
>>
>>
> We have the most experience running the defaults.  Generally folks go up
> from the default size because they want to host more data in about the
> same number of regions.  Going down from the default I've not seen much of.
>
> St.Ack
>
