Stack and others,
We won't hit any OOM or full GC issues, because globally, at the RS
level, we track both the data size (of all the memstores) and the
heap size. The flush decision there accounts for both. In fact, for
normal on-heap memstores, the accounting works out the same as the
old heap-size-based way.

At the region level (and at the Segment level) we track data size
only, and the flush decisions are based on data size.
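
To make that concrete, here is a minimal sketch of the two levels of
accounting (the names are hypothetical, not our actual classes):

    import java.util.concurrent.atomic.AtomicLong;

    // Hypothetical sketch of the two-level accounting described above.
    class RegionAccounting {
        private final AtomicLong dataSize = new AtomicLong(); // key-value bytes only

        void onWrite(long cellDataSize) {
            dataSize.addAndGet(cellDataSize);
        }

        // Region-level flush decision: data size only.
        boolean shouldFlush(long flushSizeBytes) {
            return dataSize.get() > flushSizeBytes;
        }
    }

    class GlobalAccounting {
        private final AtomicLong globalDataSize = new AtomicLong();
        private final AtomicLong globalHeapSize = new AtomicLong(); // data + overhead

        void onWrite(long cellDataSize, long heapOverhead) {
            globalDataSize.addAndGet(cellDataSize);
            globalHeapSize.addAndGet(cellDataSize + heapOverhead);
        }

        // RS-level decision accounts for both, so on-heap memstores still
        // respect a heap limit and we avoid OOME / full GC.
        boolean aboveBarrier(long dataBarrier, long heapBarrier) {
            return globalDataSize.get() >= dataBarrier
                || globalHeapSize.get() >= heapBarrier;
        }
    }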

So in the past, a region flush size of 128 MB meant we would flush
when the heap size of that region crossed 128 MB. Now it is data
size alone. I feel that is closer to how a normal user thinks: they
say flush size of 128 MB, and the natural expectation is 128 MB of
data.
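
As an illustration (just a sketch; hbase.hregion.memstore.flush.size
is the setting Yu mentions below, the rest is boilerplate):

    import org.apache.hadoop.conf.Configuration;

    public class FlushSizeExample {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // With the new accounting, 128 MB bounds the *data* bytes
            // (key-value size), not the region's heap occupancy.
            conf.setLong("hbase.hregion.memstore.flush.size", 128L * 1024 * 1024);
        }
    }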

The background of this change is off-heap memstores, where we need
separate tracking of both the data size and the heap overhead size.
But at the region level this behavior change was made on the thinking
that it is more user oriented.

I agree with Yu that it is a surprising behavior change. Yes, if not
tuned accordingly, one might see more blocked writes: the per-region
flushes are more delayed now, so the chances of reaching the global
memstore upper barrier are higher, and then we will block writes and
force flushes. (But off-heap memstores will do a better job here.)
Still, this would NOT cause any OOME or full GC.
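
A rough sketch of that upper-barrier check (hypothetical names, not
our actual code):

    // Illustrative only: writes are blocked and flushes forced once
    // either global barrier is hit; delayed per-region flushes make
    // reaching it more likely, but the heap barrier is what keeps
    // OOME / full GC away.
    class GlobalBarrierCheck {
        static boolean shouldBlockWrites(long globalDataSize, long globalHeapSize,
                                         long dataUpperBarrier, long heapUpperBarrier) {
            return globalDataSize >= dataUpperBarrier
                || globalHeapSize >= heapUpperBarrier;
        }
    }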

I guess we should have reduced the 128 MB default flush size then? I
asked this question in that jira, but we did not discuss it further.

I hope this explains the background, the change, and the impacts.  Thanks.

-Anoop-

On Thu, Jul 6, 2017 at 11:43 AM, 宾莉金(binlijin) <binli...@gmail.com> wrote:
> I would prefer the former, heap occupancy, so we need not worry about
> OOM and full GC, nor change configuration to adapt to the new policy.
>
> 2017-07-06 14:03 GMT+08:00 Stack <st...@duboce.net>:
>
>> On Wed, Jul 5, 2017 at 9:59 PM, ramkrishna vasudevan <
>> ramkrishna.s.vasude...@gmail.com> wrote:
>>
>> >
>> > >>Sounds like we should be doing the former, heap occupancy
>> > Stack, so do you mean we need to roll back this new change in trunk? The
>> > background is https://issues.apache.org/jira/browse/HBASE-16747.
>> >
>> >
>> I remember that issue. It seems good to me (as it did then): we have
>> global tracking in the RS of all data and overhead, so we shouldn't
>> OOME, and we keep the accounting of overhead and data distinct because
>> now data can be onheap or offheap.
>>
>> We shouldn't be doing blocking updates -- not when there is probably loads
>> of memory still available -- but that is a different (critical) issue.
>> Sounds like current configs can 'surprise' -- see Yu Li note -- given the
>> new accounting.
>>
>> Looks like I need to read HBASE-18294
>> <https://issues.apache.org/jira/browse/HBASE-18294> to figure what the
>> pivot/problem w/ the new policy is.....
>>
>> Thanks,
>> St.Ack
>>
>>
>>
>>
>>
>> > Regards
>> > Ram
>> >
>> >
>> > On Thu, Jul 6, 2017 at 8:40 AM, Yu Li <car...@gmail.com> wrote:
>> >
>> > > We've also observed more blocking updates happening with the new policy
>> > > (flush decision made on data size), but could work around it by reducing
>> > > the hbase.hregion.memstore.flush.size setting. The advantage of the
>> > > current policy is that we can control the flushed file size more
>> > > accurately, but meanwhile we lose some "compatibility" (it requires
>> > > updating the configuration during a rolling upgrade).
>> > >
>> > > I'm not sure whether we should roll back, but if we stick with the
>> > > current policy there should be more documentation, metrics (monitoring
>> > > heap/data occupancy separately), log message refinements, etc. Attaching
>> > > some of the logs we observed, which are pretty confusing w/o knowing the
>> > > details of the implementation:
>> > >
>> > > 2017-07-03 16:11:54,724 INFO  [B.defaultRpcServer.handler=182,queue=11,port=16020] regionserver.MemStoreFlusher: Blocking updates on hadoop0528.et2.tbsite.net,16020,1497336978160: global memstore heapsize 7.2 G is >= than blocking 7.2 G size
>> > > 2017-07-03 16:11:54,754 INFO  [B.defaultRpcServer.handler=186,queue=15,port=16020] regionserver.MemStoreFlusher: Blocking updates on hadoop0528.et2.tbsite.net,16020,1497336978160: global memstore heapsize 7.2 G is >= than blocking 7.2 G size
>> > > 2017-07-03 16:11:57,571 INFO  [MemStoreFlusher.0] regionserver.MemStoreFlusher: Flush of region mainv7_main_result_c,1496,1499062935573.02adfa7cbdc606dce5b79a516e16492a. due to global heap pressure. Total Memstore size=3.2 G, Region memstore size=331.4 M
>> > > 2017-07-03 16:11:57,571 WARN  [B.defaultRpcServer.handler=49,queue=11,port=16020] regionserver.MemStoreFlusher: Memstore is above high water mark and block 2892ms
>> > >
>> > > Best Regards,
>> > > Yu
>> > >
>> > > On 6 July 2017 at 00:56, Stack <st...@duboce.net> wrote:
>> > >
>> > > > On Wed, Jul 5, 2017 at 6:30 AM, Eshcar Hillel
>> > > > <esh...@yahoo-inc.com.invalid> wrote:
>> > > >
>> > > > > Hi All,
>> > > > > I opened a new Jira https://issues.apache.org/jira/browse/HBASE-18294
>> > > > > to discuss this question.
>> > > > > Flush decisions are taken at the region level and also at the region
>> > > > > server level - there is the question of when to trigger a flush and
>> > > > > then which region/store to flush. Regions track both their data size
>> > > > > (key-value size only) and their total heap occupancy (including index
>> > > > > and additional metadata). One option (which was the past policy) is
>> > > > > to trigger flushes and choose flush subjects based on region heap
>> > > > > size - this gives the sysadmin a better estimate of how many regions
>> > > > > an RS can carry. Another option (which is the current policy) is to
>> > > > > look at the data size - this gives a better estimate of the size of
>> > > > > the files that are created by the flush.
>> > > > >
>> > > >
>> > > >
>> > > > Sounds like we should be doing the former, heap occupancy. An
>> > > > OutOfMemoryException puts a nail in any benefit other accountings
>> might
>> > > > have.
>> > > >
>> > > > St.Ack
>> > > >
>> > > >
>> > > >
>> > > > > I see this as critical to HBase performance and usability, namely
>> > > > > meeting user expectations of the system, hence I would like to hear
>> > > > > as many voices as possible. Please join the discussion in the Jira
>> > > > > and let us know what you think.
>> > > > > Thanks,
>> > > > > Eshcar
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
>
> --
> *Best Regards,*
>  lijin bin
