Anoop Sam John created HBASE-16747:
--------------------------------------

             Summary: Track memstore data size and heap overhead separately 
                 Key: HBASE-16747
                 URL: https://issues.apache.org/jira/browse/HBASE-16747
             Project: HBase
          Issue Type: Sub-task
            Reporter: Anoop Sam John
            Assignee: Anoop Sam John
             Fix For: 2.0.0


We track the memstore size in 3 places.
1. Globally, at RS level, in RegionServerAccounting. This tracks the total size 
of all memstores and is used to decide whether forced flushes are needed 
because of global heap pressure.
2. At region level, in HRegion. This is the sum of the sizes of all memstores 
within the region. It is used to decide whether the region has reached the 
flush size (128 MB).
3. At segment level. This drives the in-memory flush/compaction decisions.

All of these use the Cell's heap size, which includes the data bytes as well 
as the Cell object's heap overhead.  We also include the overhead incurred by 
adding Cells to the Segment's data structures (such as CSLM).
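
To make the combined accounting concrete, here is a minimal Java sketch of how 
one cell's accounted size could split into data bytes and heap overhead. The 
class name and both overhead constants are hypothetical placeholders chosen for 
illustration; they are not HBase's actual values or API, and the data size is 
simplified to key parts plus value.

```java
// Hypothetical sketch: names and constants are assumptions, not HBase code.
final class CellSizeSketch {
    // Assumed per-cell object overhead (object header, references, fields).
    static final long CELL_OBJECT_OVERHEAD = 48;
    // Assumed per-entry overhead of the segment's map (e.g. a CSLM node).
    static final long MAP_ENTRY_OVERHEAD = 40;

    // Bytes of cell data, simplified to row + family + qualifier + value.
    static long dataSize(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return (long) rowLen + familyLen + qualifierLen + valueLen;
    }

    // Heap cost that remains on heap wherever the data bytes live.
    static long heapOverhead() {
        return CELL_OBJECT_OVERHEAD + MAP_ENTRY_OVERHEAD;
    }

    // The single number that the combined accounting tracks.
    static long combinedSize(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return dataSize(rowLen, familyLen, qualifierLen, valueLen) + heapOverhead();
    }
}
```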

Once we have an off-heap memstore, the cell data bytes will be kept in an 
off-heap area. So we can no longer track data size and heap overhead as one 
entity. We need to separate them and track each on its own.

The proposal here is to track cell data size and heap overhead separately at 
the global accounting layer.  As of now we have only the on-heap memstore, so 
the global memstore boundary checks will consider both (add them up and check 
against the global max memstore size).
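
A minimal sketch of this global accounting, assuming two independent counters 
whose sum is compared with the global limit. The class and method names below 
are hypothetical and are not the actual RegionServerAccounting API.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: names are assumptions, not the RegionServerAccounting API.
final class GlobalMemstoreAccountingSketch {
    private final AtomicLong dataSize = new AtomicLong();
    private final AtomicLong heapOverhead = new AtomicLong();
    private final long globalLimitBytes;

    GlobalMemstoreAccountingSketch(long globalLimitBytes) {
        this.globalLimitBytes = globalLimitBytes;
    }

    // Apply a change to both counters; negative deltas model a flush.
    void add(long dataDelta, long overheadDelta) {
        dataSize.addAndGet(dataDelta);
        heapOverhead.addAndGet(overheadDelta);
    }

    long getDataSize() {
        return dataSize.get();
    }

    long getHeapOverhead() {
        return heapOverhead.get();
    }

    // With only an on-heap memstore, both counters count against the limit.
    boolean isAboveGlobalLimit() {
        return dataSize.get() + heapOverhead.get() >= globalLimitBytes;
    }
}
```

The two counters are updated one after the other, so a concurrent reader may 
briefly see one updated and not the other; the sketch accepts that for 
simplicity.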
At region level, track cell data size alone (this can be on heap or off heap).  
Region flushes use only the cell data size for the flush decision. A user who 
configures 128 MB as the flush size will normally expect a flush to write 
about 128 MB of data. But because we were also including the heap overhead, 
the actual data size flushed is well below 128 MB.  With this change we will 
behave more like what a user expects.
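
The difference between the two flush checks can be sketched as follows. The 
class and method names are hypothetical, and the example assumes, purely for 
illustration, that heap overhead happens to equal the data size.

```java
// Hypothetical sketch: names are assumptions, not HRegion code.
final class RegionFlushSketch {
    // Configured flush size of 128 MB.
    static final long FLUSH_SIZE = 128L * 1024 * 1024;

    // Combined check: data size and heap overhead both count toward the flush size.
    static boolean shouldFlushCombined(long dataSize, long heapOverhead) {
        return dataSize + heapOverhead >= FLUSH_SIZE;
    }

    // Proposed check: only cell data size counts toward the flush size.
    static boolean shouldFlushDataOnly(long dataSize) {
        return dataSize >= FLUSH_SIZE;
    }
}
```

Under that assumption, a region holding 64 MB of cell data plus 64 MB of 
overhead already triggers the combined check, so only 64 MB of data is 
flushed. The data-only check waits until the cell data itself reaches 128 MB.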
Segment-level in-memory flush/compaction also considers cell data size alone.  
But we will still need to track the heap overhead there. (Once an in-memory 
flush or a normal flush happens, we have to adjust both the cell data size and 
the heap overhead.)
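
A minimal sketch of the segment-level bookkeeping, assuming each segment keeps 
both numbers and reports them on flush so that the region and global counters 
can be reduced accordingly. All names are hypothetical, and the class is not 
thread-safe.

```java
// Hypothetical sketch: names are assumptions, not the Segment API.
final class SegmentSizeSketch {
    private long dataSize;
    private long heapOverhead;

    // Record one added cell: its data bytes and its heap overhead.
    void addCell(long cellDataSize, long cellHeapOverhead) {
        dataSize += cellDataSize;
        heapOverhead += cellHeapOverhead;
    }

    // The in-memory flush/compaction decision looks at data size only.
    boolean shouldFlushInMemory(long thresholdBytes) {
        return dataSize >= thresholdBytes;
    }

    // Return {dataSize, heapOverhead} released by the flush and reset the segment.
    long[] flush() {
        long[] released = {dataSize, heapOverhead};
        dataSize = 0;
        heapOverhead = 0;
        return released;
    }

    long getDataSize() {
        return dataSize;
    }

    long getHeapOverhead() {
        return heapOverhead;
    }
}
```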



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
