Re: [DISCUSS] Should flush decisions be made based on data size (key-value only) or based on heap size (including metadata overhead)?

Anoop John Mon, 07 Aug 2017 00:23:33 -0700

Sorry for the late reply.

So you mean we should track both sizes even at the Region level? This was
considered at that time, but we did not do it because it would add more overhead:
we would have to deal with 2 AtomicLongs in every Region. Right now we
handle this double check at the RS level only, so it adds just one more
variable to deal with.


-Anoop-
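A minimal Java sketch of the accounting split described in this thread, assuming simplified hypothetical classes (RegionAccounting and ServerAccounting are illustrative names, not HBase's real classes). Each region keeps a single AtomicLong for data size and makes its own flush decision on that. The RegionServer keeps two global counters, data size and heap size, and, as an assumption about how "the decision there accounts both" is combined, blocks updates when either counter reaches its limit. Tracking heap size per region as well would add a second AtomicLong update to every write in every region.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-region accounting: a single counter for key-value data size.
class RegionAccounting {
    private final AtomicLong dataSize = new AtomicLong();
    private final long flushSize; // per-region flush threshold, in bytes

    RegionAccounting(long flushSize) {
        this.flushSize = flushSize;
    }

    // Called on every write to this region with the cell's key-value bytes.
    void add(long cellDataSize) {
        dataSize.addAndGet(cellDataSize);
    }

    // Region-level flush decision looks at data size only.
    boolean shouldFlush() {
        return dataSize.get() >= flushSize;
    }
}

// Hypothetical RegionServer-wide accounting: two global counters.
class ServerAccounting {
    private final AtomicLong globalDataSize = new AtomicLong();
    private final AtomicLong globalHeapSize = new AtomicLong();
    private final long dataLimit;
    private final long heapLimit;

    ServerAccounting(long dataLimit, long heapLimit) {
        this.dataLimit = dataLimit;
        this.heapLimit = heapLimit;
    }

    // Called on every write. cellHeapSize is the heap cost of the write
    // (metadata overhead, plus the data itself for on-heap memstores).
    void add(long cellDataSize, long cellHeapSize) {
        globalDataSize.addAndGet(cellDataSize);
        globalHeapSize.addAndGet(cellHeapSize);
    }

    // Assumption: updates are blocked when either global counter reaches its limit.
    boolean shouldBlockUpdates() {
        return globalDataSize.get() >= dataLimit || globalHeapSize.get() >= heapLimit;
    }
}
```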

On Mon, Jul 10, 2017 at 7:34 PM, Eshcar Hillel
<[email protected]> wrote:
> Here is a suggestion: we can track both heap and off-heap sizes and have 2
> thresholds, one for limiting heap size and one for limiting off-heap size. And
> at all decision-making junctions we check whether one of the thresholds is
> exceeded, and if it is, we trigger a flush. We can choose which entity to flush
> based on the cause. For example, if we decided to flush because the heap size
> exceeds the heap threshold, then we flush the region/store with the greatest heap
> size, and likewise for an off-heap flush.
>
> I can prepare a patch.
>
> This is not rolling back HBASE-18294, simply refining it to have different
> decision making for the on-heap and off-heap cases.
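A minimal Java sketch of the two-threshold suggestion above, assuming hypothetical types (RegionSizes, FlushCause and DualThresholdFlushPolicy are illustrative names, not HBase classes). The policy sums per-region sizes, determines which threshold, if any, caused the flush, and picks the region that is largest by that same measure. Checking the heap threshold first when both are exceeded is an ordering choice made here, not something stated in the proposal.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical snapshot of one region's memstore sizes, in bytes.
final class RegionSizes {
    final String name;
    final long heapSize;
    final long offHeapSize;

    RegionSizes(String name, long heapSize, long offHeapSize) {
        this.name = name;
        this.heapSize = heapSize;
        this.offHeapSize = offHeapSize;
    }
}

enum FlushCause { NONE, HEAP, OFF_HEAP }

final class DualThresholdFlushPolicy {
    private final long heapThreshold;
    private final long offHeapThreshold;

    DualThresholdFlushPolicy(long heapThreshold, long offHeapThreshold) {
        this.heapThreshold = heapThreshold;
        this.offHeapThreshold = offHeapThreshold;
    }

    // Assumption: heap is checked first, since heap exhaustion risks OOM/full GC.
    FlushCause cause(long totalHeap, long totalOffHeap) {
        if (totalHeap >= heapThreshold) return FlushCause.HEAP;
        if (totalOffHeap >= offHeapThreshold) return FlushCause.OFF_HEAP;
        return FlushCause.NONE;
    }

    // Picks the region to flush using the measure whose threshold was exceeded.
    Optional<RegionSizes> pickRegion(List<RegionSizes> regions) {
        long totalHeap = 0;
        long totalOffHeap = 0;
        for (RegionSizes r : regions) {
            totalHeap += r.heapSize;
            totalOffHeap += r.offHeapSize;
        }
        switch (cause(totalHeap, totalOffHeap)) {
            case HEAP:
                return regions.stream().max(Comparator.comparingLong(r -> r.heapSize));
            case OFF_HEAP:
                return regions.stream().max(Comparator.comparingLong(r -> r.offHeapSize));
            default:
                return Optional.empty();
        }
    }
}
```

Choosing the flush target by the same measure that tripped the threshold is what lets a heap-triggered flush relieve heap pressure, rather than flushing a region that is large only off-heap.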
>
> On Monday, July 10, 2017, 8:25:12 AM GMT+3, Anoop John 
> <[email protected]> wrote:
>
> Stack and others..
> We won't have any OOM or full GC issues, because globally at the RS level we
> track both the data size (of all the memstores) and the heap
> size. The decision there accounts for both. In fact, in the case of normal
> on-heap memstores, the accounting is the same as the old heap-size-based way.
>
> At the region level (and at the Segment level) we track data size only. The
> decisions are based on data size.
>
> So in the past, a region flush size of 128 MB meant we would flush when
> the heap size of that region crossed 128 MB. But now it is data size
> alone. I feel that is closer to how a normal user thinks: if he sets a
> flush size of 128 MB, he will think of it as 128 MB of data.
>
> The background of this change is the off-heap memstores, where we need
> separate tracking of both data and heap overhead sizes. But at the
> region level, this behavior change was made thinking that it is more user
> oriented.
>
> I agree with Yu that it is a surprising behavior change. Yes, if not tuned
> accordingly, one might see more blocked writes, because the per-region
> flushes are more delayed now and so the chances of reaching the global
> memstore upper barrier are higher. And then we will block
> writes and force flushes. (But off-heap memstores will do a better job
> here.) But this would NOT cause any OOME or full GC.
>
> I guess we should have reduced the 128 MB default flush size then? I
> asked this question in that Jira and then we did not discuss further.
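A back-of-the-envelope Java sketch of that question, assuming a hypothetical uniform workload in which every cell has the same data size and a fixed per-cell metadata overhead on the heap (the 100-byte and 28-byte figures used below are illustrative, not measured HBase numbers). It estimates how large a region's heap footprint gets before a data-size threshold triggers a flush, and which data-size threshold would roughly match the old heap-based behavior.

```java
// Hypothetical model: every cell carries the same data size and the same
// fixed metadata overhead on the heap. Real overhead varies by cell and memstore type.
final class FlushSizeEstimate {
    private FlushSizeEstimate() {}

    // Data-size threshold whose estimated heap footprint fits within heapBudget.
    static long equivalentDataFlushSize(long heapBudget, long avgCellDataBytes, long perCellOverheadBytes) {
        long cells = heapBudget / (avgCellDataBytes + perCellOverheadBytes);
        return cells * avgCellDataBytes;
    }

    // Estimated heap footprint of a memstore at the moment a data-size threshold is reached.
    static long estimatedHeapAtFlush(long dataFlushSize, long avgCellDataBytes, long perCellOverheadBytes) {
        long cells = dataFlushSize / avgCellDataBytes;
        return cells * (avgCellDataBytes + perCellOverheadBytes);
    }
}
```

With those illustrative figures, a 128 MB (134,217,728-byte) data-size threshold lets a region's memstore reach roughly 164 MB of heap (171,798,656 bytes) before flushing, while a data-size threshold of 100 MB (104,857,600 bytes) keeps the heap footprint at about 128 MB. That is the kind of adjustment to hbase.hregion.memstore.flush.size that Yu describes as a workaround; the real ratio depends on the actual cell sizes.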
>
> I hope I explained the background and the change and the impacts.  Thanks.
>
> -Anoop-
>
> On Thu, Jul 6, 2017 at 11:43 AM, 宾莉金（binlijin） <[email protected]> wrote:
>> I would like to use the former, heap occupancy, so we do not need to worry about
>> OOM and full GC, or change configuration to adapt to the new policy.
>>
>> 2017-07-06 14:03 GMT+08:00 Stack <[email protected]>:
>>
>>> On Wed, Jul 5, 2017 at 9:59 PM, ramkrishna vasudevan <[email protected]> wrote:
>>>
>>> >
>>> > >>Sounds like we should be doing the former, heap occupancy
>>> > Stack, so do you mean we need to roll back this new change in trunk? The
>>> > background is https://issues.apache.org/jira/browse/HBASE-16747.
>>> >
>>> >
>>> I remember that issue. It still seems good to me (as it did then): we have
>>> global tracking in the RS of all data and overhead, so we shouldn't OOME,
>>> and we keep the accounting of overhead and data distinct because data can
>>> now be on-heap or off-heap.
>>>
>>> We shouldn't be doing blocking updates -- not when there is probably loads
>>> of memory still available -- but that is a different (critical) issue.
>>> Sounds like current configs can 'surprise' -- see Yu Li's note -- given the
>>> new accounting.
>>>
>>> Looks like I need to read HBASE-18294
>>> <https://issues.apache.org/jira/browse/HBASE-18294> to figure out what the
>>> pivot/problem w/ the new policy is...
>>>
>>> Thanks,
>>> St.Ack
>>>
>>> > Regards
>>> > Ram
>>> >
>>> >
>>> > On Thu, Jul 6, 2017 at 8:40 AM, Yu Li <[email protected]> wrote:
>>> >
>>> > > We've also observed more blocking updates happening with the new policy
>>> > > (flush decision made on data size), but could work around it by reducing
>>> > > the hbase.hregion.memstore.flush.size setting. The advantage of the current
>>> > > policy is that we can control the flushed file size more accurately, but
>>> > > meanwhile we lose some "compatibility" (it requires configuration updates
>>> > > during a rolling upgrade).
>>> > >
>>> > > I'm not sure whether we should roll back, but if we stick with the current
>>> > > policy there should be more documentation, metrics (monitoring heap/data
>>> > > occupancy separately), log message refinements, etc. Attaching some of the
>>> > > logs we observed, which are pretty confusing w/o knowing the details of the
>>> > > implementation:
>>> > >
>>> > > 2017-07-03 16:11:54,724 INFO
>>> > >  [B.defaultRpcServer.handler=182,queue=11,port=16020]
>>> > > regionserver.MemStoreFlusher: Blocking updates on
>>> > > hadoop0528.et2.tbsite.net,16020,1497336978160:
>>> > > global memstore heapsize 7.2 G is >= than blocking 7.2 G size
>>> > > 2017-07-03 16:11:54,754 INFO
>>> > >  [B.defaultRpcServer.handler=186,queue=15,port=16020]
>>> > > regionserver.MemStoreFlusher: Blocking updates on
>>> > > hadoop0528.et2.tbsite.net,16020,1497336978160:
>>> > > global memstore heapsize 7.2 G is >= than blocking 7.2 G size
>>> > > 2017-07-03 16:11:57,571 INFO  [MemStoreFlusher.0]
>>> > > regionserver.MemStoreFlusher: Flush of region
>>> > > mainv7_main_result_c,1496,1499062935573.02adfa7cbdc606dce5b79a516e16492a.
>>> > > due to global heap pressure. Total Memstore size=3.2 G, Region memstore
>>> > > size=331.4 M
>>> > > 2017-07-03 16:11:57,571 WARN
>>> > >  [B.defaultRpcServer.handler=49,queue=11,port=16020]
>>> > > regionserver.MemStoreFlusher: Memstore is above high water mark and block
>>> > > 2892ms
>>> > >
>>> > > Best Regards,
>>> > > Yu
>>> > >
>>> > > On 6 July 2017 at 00:56, Stack <[email protected]> wrote:
>>> > >
>>> > > > On Wed, Jul 5, 2017 at 6:30 AM, Eshcar Hillel <[email protected]> wrote:
>>> > > >
>>> > > > > Hi All,
>>> > > > > I opened a new Jira https://issues.apache.org/jira/browse/HBASE-18294 to
>>> > > > > discuss this question.
>>> > > > > Flush decisions are taken at the region level and also at the region
>>> > > > > server level: there is the question of when to trigger a flush and then
>>> > > > > which region/store to flush. Regions track both their data size (key-value
>>> > > > > size only) and their total heap occupancy (including index and additional
>>> > > > > metadata). One option (which was the past policy) is to trigger flushes and
>>> > > > > choose flush subjects based on regions' heap size; this gives the sysadmin
>>> > > > > a better estimate of how many regions a RS can carry. Another option
>>> > > > > (which is the current policy) is to look at the data size; this gives a
>>> > > > > better estimate of the size of the files that are created by the flush.
>>> > > > >
>>> > > >
>>> > > >
>>> > > > Sounds like we should be doing the former, heap occupancy. An
>>> > > > OutOfMemoryError puts a nail in the coffin of any benefit other
>>> > > > accountings might have.
>>> > > >
>>> > > > St.Ack
>>> > > >
>>> > > >
>>> > > >
>>> > > > > I see this as critical to HBase performance and usability, namely
>>> > > > > meeting the user expectation from the system, hence I would like to
>>> > > > > hear as many voices as possible. Please join the discussion in the Jira
>>> > > > > and let us know what you think.
>>> > > > > Thanks,
>>> > > > > Eshcar
>>> > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>>
>> --
>> Best Regards,
>>  lijin bin
