John Doe created HBASE-30213:
--------------------------------
Summary: Spurious RegionTooBusyException due to non-atomic update
of correlated fields heapSize/offHeapSize in ThreadSafeMemStoreSizing
Key: HBASE-30213
URL: https://issues.apache.org/jira/browse/HBASE-30213
Project: HBase
Issue Type: Bug
Components: regionserver
Reporter: John Doe
A multi-variable concurrency bug in ThreadSafeMemStoreSizing can cause write
threads to observe a transiently inflated MemStore size and throw a spurious
RegionTooBusyException immediately after a flush completes.
ThreadSafeMemStoreSizing maintains two semantically correlated
AtomicLong fields, heapSize and offHeapSize, whose sum is used by
HRegion.checkResources() to determine whether incoming writes should be
rejected. During decrMemStoreSize(), these two fields are decremented by two
separate addAndGet() calls with no common lock: offHeapSize is decremented
first (line 59), heapSize second (line 60).
A concurrent write RPC thread calling getMemStoreSize() reads heapSize first
and offHeapSize second (line 53). If the heapSize read precedes its decrement
and the offHeapSize read follows its decrement, the thread observes the stale
pre-flush heapSize combined with the already-decremented
offHeapSize, producing a sum that overestimates the true MemStore size by the
full heapSizeDelta of the flush.
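The interleaving can be replayed deterministically on a single thread. The
following is a minimal sketch, not the actual HBase source: the class
MemStoreSizingRaceSketch, its method names, and the sizes (100/50 before the
flush, deltas of 90/40) are hypothetical and chosen only for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified stand-in for ThreadSafeMemStoreSizing (hypothetical, for illustration only).
public class MemStoreSizingRaceSketch {
    final AtomicLong heapSize = new AtomicLong();
    final AtomicLong offHeapSize = new AtomicLong();

    // Replays the problematic ordering step by step and returns the sum the reader sees.
    static long observedSumDuringFlush(long heap, long offHeap,
                                       long heapDelta, long offHeapDelta) {
        MemStoreSizingRaceSketch s = new MemStoreSizingRaceSketch();
        s.heapSize.set(heap);
        s.offHeapSize.set(offHeap);

        s.offHeapSize.addAndGet(-offHeapDelta); // flusher: first decrement
        long h = s.heapSize.get();              // writer: reads stale heapSize
        long o = s.offHeapSize.get();           // writer: reads decremented offHeapSize
        s.heapSize.addAndGet(-heapDelta);       // flusher: second decrement

        return h + o;
    }

    // The sum a reader would see once both decrements have been applied.
    static long trueSumAfterFlush(long heap, long offHeap,
                                  long heapDelta, long offHeapDelta) {
        return (heap - heapDelta) + (offHeap - offHeapDelta);
    }

    public static void main(String[] args) {
        long observed = observedSumDuringFlush(100, 50, 90, 40);
        long actual = trueSumAfterFlush(100, 50, 90, 40);
        System.out.println("observed=" + observed
            + " actual=" + actual
            + " overestimate=" + (observed - actual));
    }
}
```

With these illustrative numbers the reader sees 110 while the post-flush size
is 20, so the overestimate equals the heap delta of 90.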
If this inflated sum exceeds blockingMemStoreSize, checkResources() incorrectly
throws RegionTooBusyException (HRegion.java:5029), even though the MemStore is
already safely below the threshold.
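The effect on the threshold check can be sketched as below. This is an
assumption-laden illustration rather than the code in HRegion.checkResources():
the class CheckResourcesSketch, the method wouldReject, and all numeric values
are hypothetical, and the comparison is assumed to be a strict "greater than".

```java
// Hypothetical model of the size check described above; not the HBase implementation.
public class CheckResourcesSketch {

    // Returns true when a write would be rejected, assuming the check is
    // "heapSize + offHeapSize > blockingMemStoreSize".
    static boolean wouldReject(long heapSize, long offHeapSize, long blockingMemStoreSize) {
        return heapSize + offHeapSize > blockingMemStoreSize;
    }

    public static void main(String[] args) {
        long blockingMemStoreSize = 100; // illustrative threshold

        // Torn read: stale heapSize (100) with already-decremented offHeapSize (10).
        boolean tornRead = wouldReject(100, 10, blockingMemStoreSize);

        // Consistent read after both decrements: heapSize 10, offHeapSize 10.
        boolean consistentRead = wouldReject(10, 10, blockingMemStoreSize);

        System.out.println("tornRead rejected=" + tornRead
            + ", consistentRead rejected=" + consistentRead);
    }
}
```

Under these assumed values the torn read is rejected even though the
consistent post-flush size is well under the threshold.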
--
This message was sent by Atlassian Jira
(v8.20.10#820010)