[ 
https://issues.apache.org/jira/browse/HBASE-18294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356538#comment-16356538
 ] 

Anoop Sam John commented on HBASE-18294:
----------------------------------------

Set off-heap usage aside for a moment and consider only on heap.
For flush decisions, what we have now is per-region flush decision making 
and globally forced flushes.  Per region we have a 128 MB size threshold. When 
a region reaches this size, it initiates a flush. By the time the flush 
actually happens, the size may be more than this threshold, because we won't 
block writes once we decide to flush the region. By default we allow the 
region size to grow to 4x this flush size. If flushes are too slow, the actual 
size of the region may reach this 4x mark, and then we even reject writes. All 
these sizes are heap sizes of the memstore: not just the cell data size (the 
size of keys and values) but also the heap overhead of the Cell POJOs and of 
the entries in the CSLM (ConcurrentSkipListMap).
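
A minimal sketch of the per-region decision described above, for illustration 
only. The class, enum and method names are hypothetical and are not HBase's 
actual code; the use of >= at the thresholds is an assumption. Only the 128 MB 
threshold and the 4x multiplier come from the description.

```java
/** Hypothetical sketch of the per-region flush decision; not HBase's actual code. */
final class RegionFlushSketch {
    enum Action { ACCEPT, ACCEPT_AND_REQUEST_FLUSH, REJECT_WRITE }

    // Defaults quoted in the comment above: 128 MB flush size, 4x blocking multiplier.
    static final long FLUSH_SIZE = 128L * 1024 * 1024;
    static final int BLOCK_MULTIPLIER = 4;

    /** Memstore heap size = cell data (keys, values) + Cell object overhead + CSLM entry overhead. */
    static long heapSize(long dataSize, long cellOverhead, long cslmOverhead) {
        return dataSize + cellOverhead + cslmOverhead;
    }

    /** Decides what happens to an incoming write, given the region's memstore heap size. */
    static Action decide(long memstoreHeapSize) {
        if (memstoreHeapSize >= FLUSH_SIZE * BLOCK_MULTIPLIER) {
            return Action.REJECT_WRITE;             // flushes are too slow; push back on writers
        }
        if (memstoreHeapSize >= FLUSH_SIZE) {
            return Action.ACCEPT_AND_REQUEST_FLUSH; // soft threshold: writes continue while flushing
        }
        return Action.ACCEPT;
    }
}
```

The point of the two thresholds is that the 128 MB mark is soft (writes keep 
flowing while the flush is pending) and only the 4x mark pushes back on clients.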

The other decision is made globally, at the RS level. We have a watermark, 
which defaults to 40% of the heap size, for all of the memstores together. 
This is basically to avoid an OOME. We do have a flush decision at the region 
level, but there may be many memstores in an RS, depending on the number of 
regions/stores it hosts. Also, the region-level flush marker is not a hard 
limit: we allow the size to grow much larger, and how far it grows depends on 
the IO speed of the cluster.   When the RS reaches the global watermark, we 
block writes, select some regions and force-flush them. This makes sure that 
the global memstore size stays under control. This is, obviously, a heap size 
check.

Now that off heap is also in the picture and we allow the cell data to live in 
the off-heap area, more has to be considered.  We have to continue with the 
per-region flush decisions. We should also make sure that there won't be an 
OOME from either the on-heap side or the off-heap side. Right now we have a 
config to specify the off-heap barrier for all memstores, that is, the 
off-heap size occupied by all memstores together. This is needed anyway, and 
there seems to be no question about that.  We will also have to keep the old 
on-heap barrier check (defaulting to 40% of Xmx), because memstores will 
certainly still use on-heap memory for things such as the Cell POJO overhead, 
CSLM entries, etc.  That is why the check is: if either size limit is 
breached, we block writes and do forced flushes.
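
The dual barrier can be sketched as follows. This is an illustration under 
stated assumptions, not the actual implementation: the class and method names 
are hypothetical, and checking the off-heap limit first when both limits are 
breached is an arbitrary choice made for the example.

```java
/** Hypothetical sketch of the global (RS-level) memstore barriers; not HBase's actual code. */
final class GlobalMemstoreSketch {
    enum Breach { NONE, ON_HEAP, OFF_HEAP }

    private final long onHeapLimit;   // e.g. 40% of Xmx by default
    private final long offHeapLimit;  // configured off-heap barrier for all memstores

    GlobalMemstoreSketch(long maxHeapBytes, double onHeapFraction, long offHeapLimitBytes) {
        this.onHeapLimit = (long) (maxHeapBytes * onHeapFraction);
        this.offHeapLimit = offHeapLimitBytes;
    }

    /**
     * Reports which barrier, if any, is breached by the totals across all memstores.
     * Assumption: the off-heap limit is tested first when both are breached.
     */
    Breach check(long totalOnHeapSize, long totalOffHeapSize) {
        if (totalOffHeapSize >= offHeapLimit) {
            return Breach.OFF_HEAP;
        }
        if (totalOnHeapSize >= onHeapLimit) {
            return Breach.ON_HEAP;
        }
        return Breach.NONE;
    }

    /** Writes are blocked and forced flushes start as soon as either barrier is breached. */
    boolean mustBlockWrites(long totalOnHeapSize, long totalOffHeapSize) {
        return check(totalOnHeapSize, totalOffHeapSize) != Breach.NONE;
    }
}
```

Returning the kind of breach, rather than a plain boolean, lets the caller pick 
flush candidates differently depending on which side is under pressure.
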
For region-level flushes, right now the 128 MB size check is against data size 
alone; we don't consider the overhead. This raised some questions and concerns 
about consistency with earlier behaviour. So the general agreement we reached 
is that for on-heap cases, we must continue to check the 128 MB limit against 
the memstore heap size, not just the data size.  We have also agreed that for 
off heap, we will consider the off-heap size + heap overhead, so that 
region-level decision making is consistent for both the on-heap and off-heap 
cases. Ideally, checking the data size alone would have been the best way 
here, I mean for any per-region decision.
There was also a concern raised about how we select region(s) for flushing 
after a global size breach. When the breach is caused by the off-heap size, we 
have to select the regions with the maximum data size; when it is caused by 
the on-heap size limit, we select the regions with the most heap overhead.
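
The selection rule could look roughly like this. It is a sketch only: the 
RegionSize holder and the pick method are hypothetical names, and picking a 
single region (rather than several) is a simplification.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.ToLongFunction;

/** Hypothetical sketch of choosing a flush candidate after a global breach; not HBase's actual code. */
final class FlushCandidateSketch {

    /** Per-region memstore accounting, split into off-heap cell data and on-heap overhead. */
    static final class RegionSize {
        final String name;
        final long offHeapDataSize;
        final long heapOverhead;

        RegionSize(String name, long offHeapDataSize, long heapOverhead) {
            this.name = name;
            this.offHeapDataSize = offHeapDataSize;
            this.heapOverhead = heapOverhead;
        }
    }

    /**
     * Off-heap breach: pick the region holding the most data.
     * On-heap breach: pick the region with the most heap overhead.
     */
    static Optional<RegionSize> pick(List<RegionSize> regions, boolean offHeapBreach) {
        ToLongFunction<RegionSize> key = offHeapBreach
                ? r -> r.offHeapDataSize
                : r -> r.heapOverhead;
        return regions.stream().max(Comparator.comparingLong(key));
    }
}
```

A region with many small cells has high heap overhead relative to its data 
size, so the two orderings can pick different regions.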

> Reduce global heap pressure: flush based on heap occupancy
> ----------------------------------------------------------
>
>                 Key: HBASE-18294
>                 URL: https://issues.apache.org/jira/browse/HBASE-18294
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>            Priority: Major
>             Fix For: 2.0.0-beta-2
>
>         Attachments: HBASE-18294.01.patch, HBASE-18294.01.patch, 
> HBASE-18294.01.patch, HBASE-18294.01.patch, HBASE-18294.02.patch, 
> HBASE-18294.03.patch, HBASE-18294.04.patch, HBASE-18294.05.patch, 
> HBASE-18294.06.patch, HBASE-18294.07.patch, HBASE-18294.07.patch, 
> HBASE-18294.08.patch, HBASE-18294.09.patch, HBASE-18294.10.patch, 
> HBASE-18294.11.patch, HBASE-18294.11.patch, HBASE-18294.12.patch, 
> HBASE-18294.13.patch, HBASE-18294.15.patch, HBASE-18294.16.patch, 
> HBASE-18294.master.01.patch
>
>
> A region is flushed if its memory component exceeds a threshold (default size 
> is 128MB).
> A flush policy decides whether to flush a store by comparing the size of the 
> store to another threshold (that can be configured with 
> hbase.hregion.percolumnfamilyflush.size.lower.bound).
> Currently the implementation (in both cases) compares the data size 
> (key-value only) to the threshold where it should compare the heap size 
> (which includes index size, and metadata).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
