Anoop Sam John commented on HBASE-18294:

bq.You keep saying that but it seems to be based more on intuition rather than 
on experiments. While considering both data and heap overhead for region level 
flush have shown to improve the performance significantly.
I said this from a normal user's point of view. If I set a 100 MB flush size, I 
would expect the flush to happen when 100 MB of data has accumulated. That was 
not about perf. I never said that perf is better when data size alone is 
considered. As for why perf is lower when the decision is based on data size 
alone: it is because there are more global data size breaches. That is a 
blocking situation, with forced flushes while all writes are blocked. This can 
be tuned well: either increase the global size available for all memstores or 
reduce the per-region flush size. We discussed all these points earlier in the 
old jiras. But I agree that on existing clusters this can be trouble. That is 
why we agreed to go back to the old way. So the above statement is not about perf.
bq.Again, Why? you say we should have different decision making but you don't 
explain why, and don't have numbers to support your claims.
You yourself said this in one of the older comments; I forget exactly where. 
Right now, when there is a breach because of data size or heap size, we tend to 
select the region with the max data size only. Ideally, when there is a heap 
size (overhead) based breach, it is better to select the region with the max 
heap occupancy (overhead). I agreed with your point at that time itself. Now it 
seems you no longer agree to
bq.So, I will make a new patch, leave only one flush size configuration 
property (remove off-heap flush size), flush size at the region level will 
always consider on-heap+off-heap size
Yes, we agreed to this at a general level. But in the old patches the off-heap 
based region flush config came in again. That is why I asked immediately then.
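
To make the region selection point above concrete, here is a minimal sketch of 
choosing the flush candidate according to the kind of breach. All names in it 
(FlushCandidateSketch, RegionSizes, BreachType, pickRegionToFlush) are 
hypothetical and for illustration only; this is not HBase's actual flush code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; these are not HBase classes.
public class FlushCandidateSketch {
    enum BreachType { DATA_SIZE, HEAP_OVERHEAD }

    // Per-region memstore accounting, split into data and heap overhead.
    static final class RegionSizes {
        final String name;
        final long dataSize;      // key-value bytes only
        final long heapOverhead;  // index and metadata bytes

        RegionSizes(String name, long dataSize, long heapOverhead) {
            this.name = name;
            this.dataSize = dataSize;
            this.heapOverhead = heapOverhead;
        }
    }

    // The size that matters for the given kind of breach.
    static long sizeFor(RegionSizes r, BreachType breach) {
        return breach == BreachType.HEAP_OVERHEAD ? r.heapOverhead : r.dataSize;
    }

    // Returns the region with the largest relevant size, or null if the list is empty.
    static RegionSizes pickRegionToFlush(List<RegionSizes> regions, BreachType breach) {
        RegionSizes best = null;
        for (RegionSizes r : regions) {
            if (best == null || sizeFor(r, breach) > sizeFor(best, breach)) {
                best = r;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<RegionSizes> regions = Arrays.asList(
            new RegionSizes("A", 100, 10),   // most data, little overhead
            new RegionSizes("B", 60, 50));   // less data, most overhead
        System.out.println(pickRegionToFlush(regions, BreachType.DATA_SIZE).name);
        System.out.println(pickRegionToFlush(regions, BreachType.HEAP_OVERHEAD).name);
    }
}
```

With the assumed numbers, a data size breach selects region A and a heap 
overhead breach selects region B.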

> Reduce global heap pressure: flush based on heap occupancy
> ----------------------------------------------------------
>                 Key: HBASE-18294
>                 URL: https://issues.apache.org/jira/browse/HBASE-18294
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>            Priority: Major
>             Fix For: 2.0.0-beta-2
>         Attachments: HBASE-18294.01.patch, HBASE-18294.01.patch, 
> HBASE-18294.01.patch, HBASE-18294.01.patch, HBASE-18294.01.patch, 
> HBASE-18294.01.patch, HBASE-18294.01.patch, HBASE-18294.02.patch, 
> HBASE-18294.03.patch, HBASE-18294.04.patch, HBASE-18294.05.patch, 
> HBASE-18294.06.patch, HBASE-18294.07.patch, HBASE-18294.07.patch, 
> HBASE-18294.08.patch, HBASE-18294.09.patch, HBASE-18294.10.patch, 
> HBASE-18294.11.patch, HBASE-18294.11.patch, HBASE-18294.12.patch, 
> HBASE-18294.13.patch, HBASE-18294.15.patch, HBASE-18294.16.patch, 
> HBASE-18294.master.01.patch, HBASE-18294.master.01.patch
> A region is flushed if its memory component exceeds a threshold (default size 
> is 128MB).
> A flush policy decides whether to flush a store by comparing the size of the 
> store to another threshold (that can be configured with 
> hbase.hregion.percolumnfamilyflush.size.lower.bound).
> Currently the implementation (in both cases) compares the data size 
> (key-value only) to the threshold where it should compare the heap size 
> (which includes index size, and metadata).
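
The difference described in the issue summary can be sketched roughly as 
follows. This is only an illustration under assumed numbers (100 MB of data, 
40 MB of index and metadata overhead); the class and method names are 
hypothetical and are not HBase's actual API.

```java
// Hypothetical illustration only; these names are not HBase's actual API.
public class FlushThresholdSketch {
    // Default region flush threshold mentioned in the issue description: 128 MB.
    static final long FLUSH_SIZE_BYTES = 128L * 1024 * 1024;

    // Decision that compares the key-value data size only to the threshold.
    static boolean shouldFlushByDataSize(long dataSize, long flushSize) {
        return dataSize > flushSize;
    }

    // Decision that compares heap occupancy (data plus index and metadata
    // overhead) to the threshold.
    static boolean shouldFlushByHeapSize(long dataSize, long heapOverhead, long flushSize) {
        return dataSize + heapOverhead > flushSize;
    }

    public static void main(String[] args) {
        long data = 100L * 1024 * 1024;     // assumed: 100 MB of key-value data
        long overhead = 40L * 1024 * 1024;  // assumed: 40 MB of index and metadata
        // Data alone is under 128 MB, so no flush is triggered.
        System.out.println(shouldFlushByDataSize(data, FLUSH_SIZE_BYTES));
        // Data plus overhead is 140 MB, which exceeds 128 MB, so a flush is triggered.
        System.out.println(shouldFlushByHeapSize(data, overhead, FLUSH_SIZE_BYTES));
    }
}
```

In this example the region already occupies 140 MB of heap, yet a data-size-only 
check would not flush it.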

This message was sent by Atlassian JIRA
