[
https://issues.apache.org/jira/browse/HBASE-15837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15285967#comment-15285967
]
Enis Soztutar commented on HBASE-15837:
---------------------------------------
[~elserj] I was also looking at this to understand what may have happened. The
memstore size discrepancy happens when the index update fails for Phoenix. I
also have a patch that should fix the root cause. The root cause is something
like this:
- HRegion.memstoreSize gets updated only after a batch is finished, while
HStore.getMemstoreSize() gets updated every time we add a new cell. If the
transaction is rolled back from the memstore, we also decrement the size in
HStore.
- HRegion.batchMutate() has the code:
{code}
long addedSize = doMiniBatchMutation(batchOp); // may throw after the WAL sync
long newSize = this.addAndGetGlobalMemstoreSize(addedSize); // skipped on a throw
{code}
This means that HRegion.memstoreSize will be updated ONLY when
doMiniBatchMutation() does NOT throw an exception.
- When doMiniBatchMutation() throws an exception, we usually undo the memstore
operations by rolling back. This ensures that the updates are removed and the
HStore size is decremented back.
- However, in the case that postBatchMutate() throws an exception, the WAL has
already been sync()'ed, so we cannot roll back the transaction. We actually
have the edits in the memstore and the WAL, but fail to update the region's
memstore size. The HStore size is correctly updated, thus leaving the
discrepancy (see the sketch after this list).
- Phoenix secondary indexing does its scans in the preBatchMutate() call and
does the secondary index writes in the postBatchMutate() call. Thus, when the
secondary index writes fail, we end up with this accounting error and the
subsequent abort of the regionserver during region close.
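
To make the failure path concrete, here is a minimal, self-contained sketch of
the flow above. The class and field names (MemstoreAccountingSketch,
regionMemstoreSize, storeMemstoreSize) are simplified stand-ins for
illustration, not the actual HBase identifiers: the region-level counter is
bumped only after doMiniBatchMutation() returns, while the store-level counter
is bumped per cell.
{code}
import java.util.concurrent.atomic.AtomicLong;

public class MemstoreAccountingSketch {
  // Stand-in for HRegion.memstoreSize: updated only after a whole batch.
  private final AtomicLong regionMemstoreSize = new AtomicLong();
  // Stand-in for the per-HStore size: updated per cell, decremented on rollback.
  private final AtomicLong storeMemstoreSize = new AtomicLong();

  // Stand-in for HRegion.batchMutate().
  public void batchMutate(long[] cellSizes, boolean postHookFails) {
    long addedSize = doMiniBatchMutation(cellSizes, postHookFails); // may throw
    regionMemstoreSize.addAndGet(addedSize); // skipped if the line above throws
  }

  // Stand-in for doMiniBatchMutation() plus the coprocessor post hook.
  private long doMiniBatchMutation(long[] cellSizes, boolean postHookFails) {
    long added = 0;
    for (long size : cellSizes) {
      storeMemstoreSize.addAndGet(size); // per-cell update, as in HStore
      added += size;
    }
    // The WAL is sync()'ed here; from this point the edits cannot be rolled back.
    if (postHookFails) {
      // A postBatchMutate() failure leaves the edits in the memstore and WAL,
      // but the caller never adds `added` to regionMemstoreSize.
      throw new IllegalStateException("postBatchMutate failed");
    }
    return added;
  }

  public static void main(String[] args) {
    MemstoreAccountingSketch region = new MemstoreAccountingSketch();
    try {
      region.batchMutate(new long[] {100, 200}, true);
    } catch (IllegalStateException e) {
      // Prints store=300 region=0: the discrepancy described above.
      System.out.println("store=" + region.storeMemstoreSize
          + " region=" + region.regionMemstoreSize);
    }
  }
}
{code}
Once the two counters disagree, a later flush can decrement the region counter
by the full store size and drive it negative, which is the condition the issue
below is about.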
> More gracefully handle a negative memstoreSize
> ----------------------------------------------
>
> Key: HBASE-15837
> URL: https://issues.apache.org/jira/browse/HBASE-15837
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Reporter: Josh Elser
> Assignee: Josh Elser
> Fix For: 2.0.0
>
> Attachments: HBASE-15837.001.patch
>
>
> Over in PHOENIX-2883, I've been trying to figure out how to track down the
> root cause of an issue we were seeing where a negative memstoreSize was
> ultimately causing an RS to abort. The tl;dr version is
> * Something causes memstoreSize to be negative (not sure what is doing this
> yet)
> * All subsequent flushes short-circuit and don't run because they think there
> is no data to flush
> * The region is eventually closed (commonly, for a move).
> * A final flush is attempted on each store before closing (which also
> short-circuits for the same reason), leaving unflushed data in each store.
> * The sanity check that each store's size is zero fails and the RS aborts.
> I have a little patch which I think should improve our failure case around
> this, safely preventing the RS abort (by forcing a flush when memstoreSize is
> negative) and logging a call trace when an update to memstoreSize makes it
> negative (to find culprits in the future).
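
For reference, a minimal sketch of the defensive handling the description
proposes. The names here (NegativeMemstoreGuardSketch, shouldFlush()) are
hypothetical stand-ins, not the actual HBASE-15837 patch:
{code}
import java.util.concurrent.atomic.AtomicLong;

public class NegativeMemstoreGuardSketch {
  private final AtomicLong memstoreSize = new AtomicLong();

  // Stand-in for the region-level size update: log a call trace the moment a
  // delta drives the size negative, so future culprits can be identified.
  public long addAndGetGlobalMemstoreSize(long delta) {
    long newSize = memstoreSize.addAndGet(delta);
    if (newSize < 0) {
      new Exception("memstoreSize went negative (delta=" + delta
          + ", newSize=" + newSize + ")").printStackTrace();
    }
    return newSize;
  }

  // Stand-in for the flush decision on region close: a negative size must not
  // short-circuit the flush, or unflushed data is left behind and the RS aborts.
  public boolean shouldFlush() {
    return memstoreSize.get() != 0; // flush on negative too, not only on > 0
  }
}
{code}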
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)