Josh Elser created HBASE-15837:
----------------------------------
Summary: More gracefully handle a negative memstoreSize
Key: HBASE-15837
URL: https://issues.apache.org/jira/browse/HBASE-15837
Project: HBase
Issue Type: Improvement
Components: regionserver
Reporter: Josh Elser
Assignee: Josh Elser
Fix For: 2.0.0
Over in PHOENIX-2883, I've been trying to track down the root cause of an issue
we were seeing where a negative memstoreSize ultimately caused a RegionServer
(RS) to abort. The tl;dr version is:
* Something causes memstoreSize to be negative (not sure what is doing this yet)
* All subsequent flushes short-circuit and don't run because they think there
is no data to flush
* The region is eventually closed (commonly, for a move).
* A final flush is attempted on each store before closing (which also
short-circuits for the same reason), leaving unflushed data in each store.
* The sanity check that each store's size is zero fails and the RS aborts.
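A heavily simplified Java model of the chain above may make the mechanics easier to follow. It is a hypothetical sketch, not HBase code: RegionModel, StoreModel, flush() and close() are illustrative names. It assumes the flush short-circuit fires whenever the region-level counter is non-positive, and it uses a thrown IllegalStateException as a stand-in for the RS abort.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified model of the failure chain; not HBase code.
// StoreModel and RegionModel are illustrative names only.
class StoreModel {
    long size; // unflushed bytes held by this store
}

class RegionModel {
    final List<StoreModel> stores = new ArrayList<>();
    long memstoreSize; // region-level counter, normally the sum of the store sizes

    void write(StoreModel store, long bytes) {
        store.size += bytes;
        memstoreSize += bytes;
    }

    // Assumed short-circuit: a non-positive region counter is read as "nothing to flush",
    // so the stores are left untouched even if they still hold data.
    boolean flush() {
        if (memstoreSize <= 0) {
            return false;
        }
        for (StoreModel s : stores) {
            s.size = 0;
        }
        memstoreSize = 0;
        return true;
    }

    // Close: one final flush, then the sanity check that every store is empty.
    // Throwing here stands in for the RegionServer abort.
    void close() {
        flush();
        for (StoreModel s : stores) {
            if (s.size != 0) {
                throw new IllegalStateException("store still holds " + s.size + " bytes at close");
            }
        }
    }
}

class NegativeMemstoreDemo {
    public static void main(String[] args) {
        RegionModel region = new RegionModel();
        StoreModel store = new StoreModel();
        region.stores.add(store);
        region.write(store, 100);
        region.memstoreSize -= 150; // stand-in for the still-unknown accounting bug
        System.out.println("flushed: " + region.flush()); // prints "flushed: false"
        try {
            region.close();
        } catch (IllegalStateException e) {
            System.out.println("close failed: " + e.getMessage());
        }
    }
}
```

Once the counter goes negative, every flush is skipped, so the 100 bytes written to the store survive until close() and trip the sanity check.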
I have a small patch that I think should improve how we handle this failure:
it safely prevents the RS abort by forcing a flush when memstoreSize is
negative, and it logs a call trace whenever an update to memstoreSize makes it
negative (to help find the culprits in the future).
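The two mitigations can be sketched roughly as follows. This is a hypothetical illustration, not the actual HBASE-15837 patch: MemstoreAccounting, addAndGetMemstoreSize() and shouldFlush() are illustrative names, and java.util.logging stands in for whatever logger HBase actually uses. The call trace is captured by attaching a freshly created Exception to the log record.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch of the two mitigations; class and method names are
// illustrative and are not taken from the actual patch.
class MemstoreAccounting {
    private static final Logger LOG = Logger.getLogger(MemstoreAccounting.class.getName());
    private final AtomicLong memstoreSize = new AtomicLong();

    // Apply a size delta. If the result is negative, log it together with a new
    // Exception so the stack trace shows which caller made the bad update.
    long addAndGetMemstoreSize(long delta) {
        long newSize = memstoreSize.addAndGet(delta);
        if (newSize < 0) {
            LOG.log(Level.WARNING,
                "memstoreSize went negative: " + newSize + " (delta " + delta + ")",
                new Exception("call trace for negative memstoreSize"));
        }
        return newSize;
    }

    long getMemstoreSize() {
        return memstoreSize.get();
    }

    // Flush when there is data (positive size) and also when the size is negative:
    // a negative value means the accounting is wrong, not that the memstore is empty.
    boolean shouldFlush() {
        long size = memstoreSize.get();
        return size != 0;
    }
}
```

Logging a Throwable instead of only the numeric value is the point of the second mitigation: the stack trace identifies the code path that pushed the counter below zero, which is exactly what is unknown today.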