[
https://issues.apache.org/jira/browse/HBASE-18294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356538#comment-16356538
]
Anoop Sam John commented on HBASE-18294:
----------------------------------------
Forget off-heap usage for a moment and consider only on-heap.
For flush decisions, what we have now is per-region flush decision making
and globally forced flushes. Per region we have a 128 MB size threshold. When
a region reaches this size, it will initiate a flush. By the time the flush
actually happens, the size may be more than this threshold, because we won't
block writes once we decide to flush the region. By default we allow the
region size to grow to 4x this flush size. If flushes are too slow, chances
are that the actual size of the region reaches this 4x mark, and then we will
even reject writes. All these sizes are heap sizes of the memstore: not just
the cell data size (the size of keys and values) but also the heap overhead of
the Cell POJOs and of the entries in the CSLM (ConcurrentSkipListMap).
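
To make the two per-region thresholds concrete, here is a minimal sketch of
the checks described above. The class and method names are hypothetical, not
HBase APIs, and the strict ">" comparison is an assumption. The two constants
are the defaults quoted above (I believe they are configured through
hbase.hregion.memstore.flush.size and hbase.hregion.memstore.block.multiplier).

```java
// Hypothetical sketch; these names are illustrative and not part of HBase.
final class RegionFlushSketch {
    // Defaults quoted in the comment: 128 MB flush size, 4x blocking multiplier.
    static final long FLUSH_SIZE_BYTES = 128L * 1024 * 1024;
    static final long BLOCK_MULTIPLIER = 4;

    /** The region asks for a flush above the flush size; writes keep flowing. */
    static boolean shouldRequestFlush(long memstoreHeapSize) {
        return memstoreHeapSize > FLUSH_SIZE_BYTES;
    }

    /** Writes are rejected only above multiplier x flush size. */
    static boolean shouldRejectWrites(long memstoreHeapSize) {
        return memstoreHeapSize > BLOCK_MULTIPLIER * FLUSH_SIZE_BYTES;
    }
}
```

With these defaults, a region at 200 MB has requested a flush but still
accepts writes, while a region above 512 MB rejects them.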
The other decision making is global, at the RS level. We have a watermark,
which defaults to 40% of the heap size, for all of the memstores together.
This is basically to avoid an OOME. We have a flush decision at the region
level, but there may be many memstores in the RS, depending on the current
number of regions/stores. Also, the region-level flush marker is not a hard
marker. We allow the size to grow much larger, and all that matters is the IO
speed of the cluster. When the RS reaches this watermark, we will block
writes, select some regions for flushing and force-flush them. This makes sure
that the global memstore size stays under control. And this is obviously a
heap size check.
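
A rough sketch of the RS-level check follows, to show why it is needed even
when every region is below its own threshold. The names are hypothetical and
the ">=" comparison is an assumption. The 40% figure is the default mentioned
above (I believe it is set through hbase.regionserver.global.memstore.size).

```java
// Hypothetical sketch; these names are illustrative and not part of HBase.
final class GlobalMemstoreSketch {
    // Default watermark from the comment: 40% of the heap, kept as an integer
    // percentage so the arithmetic stays exact.
    static final long GLOBAL_MEMSTORE_PERCENT = 40;

    /** Upper bound for the summed heap size of all memstores in the RS. */
    static long globalLimit(long maxHeapBytes) {
        return maxHeapBytes * GLOBAL_MEMSTORE_PERCENT / 100;
    }

    /** True when the RS should block writes and force-flush some regions. */
    static boolean aboveWatermark(long[] memstoreHeapSizes, long maxHeapBytes) {
        long total = 0;
        for (long size : memstoreHeapSizes) {
            total += size;
        }
        return total >= globalLimit(maxHeapBytes);
    }
}
```

For example, with a 1 GB heap the limit is about 410 MB, so four regions at
110 MB each breach the watermark although none has reached 128 MB. In a real
process the heap size could be read with Runtime.getRuntime().maxMemory().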
Now when we also have off heap in the picture and allow the cell data to be in
the off-heap area, there has to be more consideration. We have to continue
with the per-region flush decisions. We should also make sure that there won't
be an OOME from either the on-heap side or the off-heap side. Right now we
have a config to specify the off-heap barrier for all memstores, i.e. the
off-heap size occupied by all memstores together. This is needed anyway, and
there seems to be no question about that. We will also have to keep the old
on-heap barrier check (defaults to 40% of Xmx), because memstores will have
on-heap usage for sure, such as the Cell POJO overhead, CSLM entries, etc.
That is why the check is: if either size is breached, we will block writes and
do forced flushes.
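
The "either barrier" rule can be sketched as below. All names are
hypothetical, and reporting the off-heap breach first when both limits are
exceeded is an assumption made only for this illustration.

```java
// Hypothetical sketch; these names are illustrative and not part of HBase.
final class DualBarrierSketch {
    enum Breach { NONE, ON_HEAP, OFF_HEAP }

    /**
     * Compares the summed memstore usage against both barriers. Any result
     * other than NONE means: block writes and force flushes.
     */
    static Breach check(long totalHeapBytes, long heapLimitBytes,
                        long totalOffHeapBytes, long offHeapLimitBytes) {
        if (totalOffHeapBytes >= offHeapLimitBytes) {
            return Breach.OFF_HEAP; // assumption: off-heap is reported first
        }
        if (totalHeapBytes >= heapLimitBytes) {
            return Breach.ON_HEAP;
        }
        return Breach.NONE;
    }
}
```

Returning which barrier was hit, rather than a plain boolean, keeps enough
information for the caller to choose which regions to flush.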
For region-level flushes, right now the 128 MB size check is against the data
size alone. We won't consider the overhead. This raised some questions and
backward-compatibility concerns about behaviour. So the general agreement we
made is that for the on-heap case we must continue to check the 128 MB limit
against the memstore heap size, not just the data size. We have also agreed
that for the off-heap case we will consider the off-heap size + heap overhead,
so the region-level decision making is consistent for both the on-heap and
off-heap cases. Ideally, checking the data size alone would have been the best
way here, I mean for any per-region decision.
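
As an illustration of what the agreed check compares against the 128 MB
limit, here is a small sketch. The names are hypothetical and the strict ">"
comparison is an assumption.

```java
// Hypothetical sketch; these names are illustrative and not part of HBase.
final class RegionCheckSizeSketch {
    static final long FLUSH_SIZE_BYTES = 128L * 1024 * 1024;

    /** On-heap memstore: cell data and its overhead both live on the heap. */
    static long onHeapCase(long dataBytes, long heapOverheadBytes) {
        return dataBytes + heapOverheadBytes;
    }

    /** Off-heap memstore: cell data off heap, Cell/CSLM overhead on heap. */
    static long offHeapCase(long offHeapDataBytes, long heapOverheadBytes) {
        return offHeapDataBytes + heapOverheadBytes;
    }

    /** The check that looks at the data size alone. */
    static boolean dataOnlyCheck(long dataBytes) {
        return dataBytes > FLUSH_SIZE_BYTES;
    }

    /** The agreed check: data size plus heap overhead in both cases. */
    static boolean agreedCheck(long checkedSizeBytes) {
        return checkedSizeBytes > FLUSH_SIZE_BYTES;
    }
}
```

A region holding 110 MB of cell data with 25 MB of overhead does not flush
under the data-only check but does under the agreed one, whether the data
sits on heap or off heap.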
Also there was a concern raised about how we select region(s) for flushing on
a global size breach. When the breach is of the off-heap size, we have to
select the regions with the maximum data size, and when it is of the on-heap
size limit, select the regions with the most heap overhead.
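
The selection rule can be sketched like this. The types and names are
hypothetical, and picking a single region per call is a simplification for
the example.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; these names are illustrative and not part of HBase.
final class FlushVictimSketch {
    /** Per-region memstore accounting, reduced to the two sizes that matter here. */
    static final class RegionSize {
        final String name;
        final long dataSize;
        final long heapOverhead;

        RegionSize(String name, long dataSize, long heapOverhead) {
            this.name = name;
            this.dataSize = dataSize;
            this.heapOverhead = heapOverhead;
        }
    }

    /**
     * Off-heap breach: pick the region with the largest data size.
     * On-heap breach: pick the region with the largest heap overhead.
     */
    static Optional<RegionSize> pick(List<RegionSize> regions, boolean offHeapBreach) {
        Comparator<RegionSize> bySize;
        if (offHeapBreach) {
            bySize = Comparator.comparingLong((RegionSize r) -> r.dataSize);
        } else {
            bySize = Comparator.comparingLong((RegionSize r) -> r.heapOverhead);
        }
        return regions.stream().max(bySize);
    }
}
```

A region with many small cells has little data but a lot of per-cell
overhead, so it is a good victim for an on-heap breach and a poor one for an
off-heap breach.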
> Reduce global heap pressure: flush based on heap occupancy
> ----------------------------------------------------------
>
> Key: HBASE-18294
> URL: https://issues.apache.org/jira/browse/HBASE-18294
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 3.0.0
> Reporter: Eshcar Hillel
> Assignee: Eshcar Hillel
> Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-18294.01.patch, HBASE-18294.01.patch,
> HBASE-18294.01.patch, HBASE-18294.01.patch, HBASE-18294.02.patch,
> HBASE-18294.03.patch, HBASE-18294.04.patch, HBASE-18294.05.patch,
> HBASE-18294.06.patch, HBASE-18294.07.patch, HBASE-18294.07.patch,
> HBASE-18294.08.patch, HBASE-18294.09.patch, HBASE-18294.10.patch,
> HBASE-18294.11.patch, HBASE-18294.11.patch, HBASE-18294.12.patch,
> HBASE-18294.13.patch, HBASE-18294.15.patch, HBASE-18294.16.patch,
> HBASE-18294.master.01.patch
>
>
> A region is flushed if its memory component exceeds a threshold (default size
> is 128MB).
> A flush policy decides whether to flush a store by comparing the size of the
> store to another threshold (that can be configured with
> hbase.hregion.percolumnfamilyflush.size.lower.bound).
> Currently the implementation (in both cases) compares the data size
> (key-value only) to the threshold, whereas it should compare the heap size
> (which includes index size and metadata).
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)