Eshcar Hillel commented on HBASE-18294:

{quote} the general agreement we made is for on heap cases, we must continue to 
check for 128 MB limit against the memstore heap size. Not just data size. Also 
we have agreed that for off heap also, we will consider the off heap size + 
heap overhead. {quote}
From the beginning I aimed for behavior that is as symmetric as possible between 
the on-heap and off-heap cases, so I don't believe I agreed to having two 
different computations. One way to make it symmetric is to compare the two 
counters against two thresholds. Another way to unify it is to always consider 
the sum of off-heap and on-heap sizes at the region level. We still need to 
manage two separate counters since the global bounds are different.
bq. Ideally checking the data size alone here would have been the best way. I 
mean for any decision per region level.
You keep saying that, but it seems to be based more on intuition than on 
experiments, while considering both data and heap overhead for region-level 
flushes has been shown to improve performance significantly.
bq. When the size breach is because of off heap size, we have to select regions 
having maximum data size and when breach because of on heap size limit, select 
the regions with more heap overhead.
Again, why? You say we should have different decision making, but you don't 
explain why and don't have numbers to support your claims.
I argue that unless a great performance benefit from different rules is shown, 
on-heap and off-heap should follow the same set of rules, each applied with its 
respective bounds.

So I will make a new patch that leaves only one flush size configuration 
property (removing the off-heap flush size). The flush size check at the region 
level will always consider the on-heap + off-heap size. The rest will be similar 
to the current patch.
The patch will be ready in a few days.
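
To make the proposal concrete, here is a minimal sketch of the intended checks. It is not taken from any patch: the class, method, and parameter names are hypothetical, the strict ">" comparison is an assumption, and the 128 MB value is the default mentioned in the issue description.

```java
// Hypothetical sketch of the proposed rules; none of these names are HBase APIs.
final class RegionFlushSketch {
    // Single region-level flush size (128 MB default, per the issue description).
    static final long FLUSH_SIZE_BYTES = 128L * 1024 * 1024;

    // Region level: always compare on-heap + off-heap against the one threshold.
    static boolean shouldFlushRegion(long onHeapBytes, long offHeapBytes) {
        return onHeapBytes + offHeapBytes > FLUSH_SIZE_BYTES;
    }

    // Global level: the two counters stay separate because their bounds differ.
    static boolean aboveGlobalBound(long totalOnHeapBytes, long globalOnHeapLimit,
                                    long totalOffHeapBytes, long globalOffHeapLimit) {
        return totalOnHeapBytes > globalOnHeapLimit
            || totalOffHeapBytes > globalOffHeapLimit;
    }
}
```

With these assumed rules, a region holding 100 MB on-heap and 40 MB off-heap would be flushed (140 MB > 128 MB), regardless of which of the two counters is larger.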

> Reduce global heap pressure: flush based on heap occupancy
> ----------------------------------------------------------
>                 Key: HBASE-18294
>                 URL: https://issues.apache.org/jira/browse/HBASE-18294
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>            Priority: Major
>             Fix For: 2.0.0-beta-2
>         Attachments: HBASE-18294.01.patch, HBASE-18294.01.patch, 
> HBASE-18294.01.patch, HBASE-18294.01.patch, HBASE-18294.02.patch, 
> HBASE-18294.03.patch, HBASE-18294.04.patch, HBASE-18294.05.patch, 
> HBASE-18294.06.patch, HBASE-18294.07.patch, HBASE-18294.07.patch, 
> HBASE-18294.08.patch, HBASE-18294.09.patch, HBASE-18294.10.patch, 
> HBASE-18294.11.patch, HBASE-18294.11.patch, HBASE-18294.12.patch, 
> HBASE-18294.13.patch, HBASE-18294.15.patch, HBASE-18294.16.patch, 
> HBASE-18294.master.01.patch
> A region is flushed if its memory component exceeds a threshold (default size 
> is 128MB).
> A flush policy decides whether to flush a store by comparing the size of the 
> store to another threshold (that can be configured with 
> hbase.hregion.percolumnfamilyflush.size.lower.bound).
> Currently the implementation (in both cases) compares the data size 
> (key-value only) to the threshold, whereas it should compare the heap size 
> (which includes index size and metadata).
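
The difference between the two comparisons in the description can be sketched as follows. This is an illustration only: the names are hypothetical, and the index and metadata cost is collapsed into a single overhead figure that a real memstore would track itself.

```java
// Hypothetical illustration of data-size vs heap-size checks; not HBase code.
final class HeapSizeCheckSketch {
    // Flush threshold (128 MB default, per the description above).
    static final long THRESHOLD_BYTES = 128L * 1024 * 1024;

    // Current behavior as described: only key-value bytes are compared.
    static boolean exceedsByDataSize(long dataBytes) {
        return dataBytes > THRESHOLD_BYTES;
    }

    // Requested behavior: data plus index and metadata overhead is compared.
    static boolean exceedsByHeapSize(long dataBytes, long overheadBytes) {
        return dataBytes + overheadBytes > THRESHOLD_BYTES;
    }
}
```

For example, with made-up numbers of 110 MB of key-value data and 30 MB of overhead, the data-size check does not trigger while the heap-size check does, which is the heap pressure the issue title refers to.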
