[
https://issues.apache.org/jira/browse/HBASE-18294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071856#comment-16071856
]
Anoop Sam John commented on HBASE-18294:
----------------------------------------
When the data size of a memstore is high (so it is the one chosen for flush),
its heap occupancy will also be on the higher side, no? I am speaking about the
common case. CompactingMemstore is the default now, with basic flattening as
its default mode. So we can expect all the memstores to behave this way, rather
than some being flattened and some never. (I am speaking with respect to the
default configs as we have them now.)
One can tune the region flush size as per the new changes. Maybe we can even
change the default itself now.
But considering only the data size for the per-region flush decision is more in
line with how a normal user thinks: I have configured a 128 MB size, and the
data is flushed when that data size is reached. What the heap overhead is should
not be the user's headache (it differs between DefaultMemstore and a compacting
memstore in basic mode, and if a new algorithm comes tomorrow it may even be
reduced). Still, we have to consider it, as we cannot let our RS hit an OOME or
suffer bad GC impacts. That we are doing anyway at the global level.
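To make the split concrete, here is a minimal sketch of the two checks as I describe them: the per-region decision looks at data size only, and the global check looks at heap size. This is not HBase code; `MemStoreSize`, `region_needs_flush` and `global_pressure` are hypothetical names, and the numbers are only illustrative.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class MemStoreSize:
    """Hypothetical size record for one region's memstore."""
    data_size: int  # bytes of key-value data only
    heap_size: int  # data plus index and metadata overhead


def region_needs_flush(size: MemStoreSize, flush_size: int = 128 * MB) -> bool:
    # Per-region decision: compare only the data size to the
    # user-configured flush size.
    return size.data_size >= flush_size


def global_pressure(sizes, global_limit: int) -> bool:
    # Global decision: sum the heap sizes, since this is what
    # protects the RS from OOME and bad GC behaviour.
    return sum(s.heap_size for s in sizes) >= global_limit
```

With this split, a region holding 100 MB of data is not flushed on its own even if it occupies 140 MB of heap, but enough such regions together can still trigger a flush through the global check.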
In normal cases too, flushes due to global pressure might be happening. Say
we have 100 regions per RS; then, as per the default settings, the ideal heap
size needed for the global memstores is
100 * 128 MB = 12.5 GB
We allow a memstore to grow to 4 times 128 MB before blocking, so the worst
case is
12.5 GB * 4 = 50 GB.
Configuring such a big size might not be the usual case. Agreed that we will
kick-start the flush of a region once its size is 128 MB. But if the write
pressure is high, the size can grow beyond 2x.
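The arithmetic above can be checked with a few lines. This is only a sketch of the calculation; the constant and function names are made up for illustration, and the values are the defaults mentioned above (128 MB flush size, 4x before blocking).

```python
REGIONS_PER_RS = 100   # example region count from the comment
FLUSH_SIZE_MB = 128    # default per-region flush size
BLOCK_MULTIPLIER = 4   # memstore may grow to 4x flush size before blocking


def ideal_global_memstore_gb(regions: int, flush_size_mb: int) -> float:
    # Heap needed if every region sits exactly at its flush size.
    return regions * flush_size_mb / 1024


def worst_case_global_memstore_gb(regions: int, flush_size_mb: int,
                                  multiplier: int) -> float:
    # Heap needed if every region grows to the blocking limit.
    return ideal_global_memstore_gb(regions, flush_size_mb) * multiplier


ideal = ideal_global_memstore_gb(REGIONS_PER_RS, FLUSH_SIZE_MB)
worst = worst_case_global_memstore_gb(REGIONS_PER_RS, FLUSH_SIZE_MB,
                                      BLOCK_MULTIPLIER)
print(ideal, worst)  # 12.5 50.0
```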
> Flush is based on data size instead of heap size
> ------------------------------------------------
>
> Key: HBASE-18294
> URL: https://issues.apache.org/jira/browse/HBASE-18294
> Project: HBase
> Issue Type: Bug
> Reporter: Eshcar Hillel
> Assignee: Eshcar Hillel
>
> A region is flushed if its memory component exceeds a threshold (default size
> is 128MB).
> A flush policy decides whether to flush a store by comparing the size of the
> store to another threshold (that can be configured with
> hbase.hregion.percolumnfamilyflush.size.lower.bound).
> Currently the implementation (in both cases) compares the data size
> (key-value only) to the threshold where it should compare the heap size
> (which includes index size, and metadata).