I prefer the former, heap occupancy, so that we need not worry about OOM and full GC, or change configuration to adapt to the new policy.
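To make the difference between the two triggers concrete, here is a rough sketch with made-up numbers. It is not the actual HBase code: the RegionMemstore class, the should_flush function and the region sizes are all hypothetical, and it assumes everything lives on heap.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class RegionMemstore:
    """Hypothetical per-region accounting (not an HBase class)."""
    data_size: int      # key-value bytes only
    heap_overhead: int  # index and additional metadata

    @property
    def heap_size(self) -> int:
        # Total heap occupancy = data plus overhead (assumes on-heap data).
        return self.data_size + self.heap_overhead


def should_flush(region: RegionMemstore, flush_size: int, policy: str) -> bool:
    """Return True when the region reaches flush_size under the given policy."""
    if policy == "heap":
        return region.heap_size >= flush_size
    if policy == "data":
        return region.data_size >= flush_size
    raise ValueError(f"unknown policy: {policy}")


# Stands in for hbase.hregion.memstore.flush.size.
flush_size = 128 * MB
# Made-up region: 100 MB of key-values plus 60 MB of overhead.
region = RegionMemstore(data_size=100 * MB, heap_overhead=60 * MB)

print(should_flush(region, flush_size, "heap"))  # True: 160 MB on heap >= 128 MB
print(should_flush(region, flush_size, "data"))  # False: 100 MB of data < 128 MB
```

With the same flush size, the data-size trigger lets a region hold more heap before it flushes, which would be consistent with the extra global heap pressure reported below.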

2017-07-06 14:03 GMT+08:00 Stack <[email protected]>:

> On Wed, Jul 5, 2017 at 9:59 PM, ramkrishna vasudevan <[email protected]>
> wrote:
>
> > >>Sounds like we should be doing the former, heap occupancy
> > Stack, so do you mean we need to roll back this new change in trunk? The
> > background is https://issues.apache.org/jira/browse/HBASE-16747.
>
> I remember that issue. It seems good to me (as it did then) where we have
> the global tracking in RS of all data and overhead so we shouldn't OOME and
> we keep accounting of overhead and data distinct because now data can be
> onheap or offheap.
>
> We shouldn't be doing blocking updates -- not when there is probably loads
> of memory still available -- but that is a different (critical) issue.
> Sounds like current configs can 'surprise' -- see Yu Li note -- given the
> new accounting.
>
> Looks like I need to read HBASE-18294
> <https://issues.apache.org/jira/browse/HBASE-18294> to figure what the
> pivot/problem w/ the new policy is.....
>
> Thanks,
> St.Ack
>
> > Regards
> > Ram
> >
> > On Thu, Jul 6, 2017 at 8:40 AM, Yu Li <[email protected]> wrote:
> >
> > > We've also observed more blocking updates happening with the new policy
> > > (flush decision made on data size), but could work-around it by reducing
> > > the hbase.hregion.memstore.flush.size setting. The advantage of current
> > > policy is we could control the flushed file size more accurately, but
> > > meanwhile losing some "compatibility" (requires configuration updating
> > > during rolling upgrade).
> > >
> > > I'm not sure whether we should rollback, but if stick on current policy
> > > there should be more documents, metrics (monitoring heap/data occupancy
> > > separately) and log message refinements, etc. Attaching some of the logs
> > > we observed, which is pretty confusing w/o knowing the details of
> > > implementation:
> > >
> > > 2017-07-03 16:11:54,724 INFO [B.defaultRpcServer.handler=182,queue=11,port=16020] regionserver.MemStoreFlusher: Blocking updates on hadoop0528.et2.tbsite.net,16020,1497336978160: global memstore heapsize 7.2 G is >= than blocking 7.2 G size
> > > 2017-07-03 16:11:54,754 INFO [B.defaultRpcServer.handler=186,queue=15,port=16020] regionserver.MemStoreFlusher: Blocking updates on hadoop0528.et2.tbsite.net,16020,1497336978160: global memstore heapsize 7.2 G is >= than blocking 7.2 G size
> > > 2017-07-03 16:11:57,571 INFO [MemStoreFlusher.0] regionserver.MemStoreFlusher: Flush of region mainv7_main_result_c,1496,1499062935573.02adfa7cbdc606dce5b79a516e16492a. due to global heap pressure. Total Memstore size=3.2 G, Region memstore size=331.4 M
> > > 2017-07-03 16:11:57,571 WARN [B.defaultRpcServer.handler=49,queue=11,port=16020] regionserver.MemStoreFlusher: Memstore is above high water mark and block 2892ms
> > >
> > > Best Regards,
> > > Yu
> > >
> > > On 6 July 2017 at 00:56, Stack <[email protected]> wrote:
> > >
> > > > On Wed, Jul 5, 2017 at 6:30 AM, Eshcar Hillel <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > > I opened a new Jira
> > > > > https://issues.apache.org/jira/browse/HBASE-18294 to discuss this
> > > > > question.
> > > > > Flush decisions are taken at the region level and also at the
> > > > > region server level - there is the question of when to trigger a
> > > > > flush and then which region/store to flush.
> > > > > Regions track both their data size (key-value size only) and their
> > > > > total heap occupancy (including index and additional metadata).
> > > > > One option (which was the past policy) is to trigger flushes and
> > > > > choose flush subjects based on regions heap size - this gives a
> > > > > better estimation for sysadmin of how many regions can a RS carry.
> > > > > Another option (which is the current policy) is to look at the data
> > > > > size - this gives a better estimation of the size of the files that
> > > > > are created by the flush.
> > > >
> > > > Sounds like we should be doing the former, heap occupancy. An
> > > > OutOfMemoryException puts a nail in any benefit other accountings
> > > > might have.
> > > >
> > > > St.Ack
> > > >
> > > > > I see this is as critical to HBase performance and usability,
> > > > > namely meeting the user expectation from the system, hence I would
> > > > > like to hear as many voices as possible.
> > > > > Please join the discussion in the Jira and let us know what you
> > > > > think.
> > > > > Thanks,
> > > > > Eshcar

--
*Best Regards,*
lijin bin
