[ 
https://issues.apache.org/jira/browse/HBASE-18294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071594#comment-16071594
 ] 

Eshcar Hillel commented on HBASE-18294:
---------------------------------------

Considering heap size for triggering global heap pressure reduction is fine. 
However, we expect the system to get to this point rarely.
As you mention in your document, disregarding heap size can cause blocking, 
excessive GC, and even OOME.
Following HBASE-16747, there are 3 cases where data size is considered and not 
heap size:
(1) when choosing the biggest region to flush following global heap pressure
(2) when choosing large stores to flush within the region
(3) when deciding to trigger a region-level flush (exceeding the 128MB 
threshold).

Let's consider the on-heap scenario first:
For (1)+(2), it has already been decided that there is a flush, and now we are 
looking to choose the biggest memory consumers (biggest region/largest store). 
Therefore we should compare heap size. Otherwise, we may choose a region/store 
simply because its data is bigger, even though it may actually take less heap 
space.
For (3), the configurable threshold gives the admin a way to compute how many 
regions the RS can handle with a given amount of memory. That was documented in 
blogs etc. Now, with the change, we actually have no way to predict the maximum 
memory size of a region. This may come as a surprise to some admins.
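
To illustrate cases (1)+(2), here is a minimal sketch of how ranking by data 
size and ranking by heap size can pick different candidates. The class and 
method names are hypothetical (they are not HBase classes), and the sizes are 
made-up numbers chosen only to show the effect.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; these are not HBase classes.
final class FlushSelectionSketch {

    /** Sizes of one memstore: raw key-value bytes vs. total heap footprint. */
    static final class MemStoreSize {
        final String name;
        final long dataSize;  // key-value bytes only
        final long heapSize;  // data plus index and metadata overhead

        MemStoreSize(String name, long dataSize, long heapSize) {
            this.name = name;
            this.dataSize = dataSize;
            this.heapSize = heapSize;
        }
    }

    /** Picks the flush candidate with the largest data size. */
    static MemStoreSize biggestByData(List<MemStoreSize> stores) {
        return stores.stream()
            .max(Comparator.comparingLong((MemStoreSize s) -> s.dataSize))
            .get();
    }

    /** Picks the flush candidate with the largest heap footprint. */
    static MemStoreSize biggestByHeap(List<MemStoreSize> stores) {
        return stores.stream()
            .max(Comparator.comparingLong((MemStoreSize s) -> s.heapSize))
            .get();
    }

    /** Made-up sizes: many small cells carry more per-cell overhead. */
    static List<MemStoreSize> example() {
        long mb = 1024L * 1024L;
        return Arrays.asList(
            new MemStoreSize("few-large-cells", 100 * mb, 105 * mb),
            new MemStoreSize("many-small-cells", 80 * mb, 140 * mb));
    }

    public static void main(String[] args) {
        List<MemStoreSize> stores = example();
        System.out.println("by data size: " + biggestByData(stores).name);
        System.out.println("by heap size: " + biggestByHeap(stores).name);
    }
}
```

With these assumed numbers, the data-size ranking flushes the store that frees 
less heap, which is the concern raised for (1)+(2).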

bq. For the off heap based memstores also the data size alone based region 
flush is better.
bq. Offheap memstores need this change so that we are able to decide the region 
flush based on data size alone as the whole data is offheap.
Then let's make a different policy for off-heap cases. Enforcing the same policy 
for on-heap and off-heap does not seem to be optimal for both.

I suggest making different policies: one for on-heap data that considers heap 
size in all the cases above, and one for off-heap data that considers heap/data 
size in whatever way you think is best for off-heap.
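
A rough sketch of what such a split could look like for the threshold check in 
(3). All names here are hypothetical and not part of HBase, and the off-heap 
choice (data size only) is just one possible option taken from the quoted 
argument above.

```java
// Hypothetical sketch; names are illustrative and not part of HBase.
final class FlushPolicySketch {

    /** Chooses which size measure is compared against a flush threshold. */
    interface SizeMeasure {
        long measure(long dataSize, long heapSize);
    }

    // On-heap memstore: everything lives on the Java heap, so compare heap size.
    static final SizeMeasure ON_HEAP = (dataSize, heapSize) -> heapSize;

    // Off-heap memstore: cell data lives off heap, so compare data size
    // (an assumption for illustration; other choices are possible).
    static final SizeMeasure OFF_HEAP = (dataSize, heapSize) -> dataSize;

    /** Returns true when the chosen measure exceeds the threshold. */
    static boolean shouldFlush(SizeMeasure m, long dataSize, long heapSize,
                               long threshold) {
        return m.measure(dataSize, heapSize) > threshold;
    }
}
```

For example, with a 128MB threshold, a memstore holding 100MB of data in 140MB 
of heap would flush under the on-heap measure but not under the off-heap one.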

> Flush is based on data size instead of heap size
> ------------------------------------------------
>
>                 Key: HBASE-18294
>                 URL: https://issues.apache.org/jira/browse/HBASE-18294
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>
> A region is flushed if its memory component exceeds a threshold (default size 
> is 128MB).
> A flush policy decides whether to flush a store by comparing the size of the 
> store to another threshold (that can be configured with 
> hbase.hregion.percolumnfamilyflush.size.lower.bound).
> Currently the implementation (in both cases) compares the data size 
> (key-value only) to the threshold where it should compare the heap size 
> (which includes index size, and metadata).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
