[ https://issues.apache.org/jira/browse/HBASE-26026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
chenglei updated HBASE-26026:
-----------------------------
    Assignee: chenglei
      Status: Patch Available  (was: Open)

> HBase Write may be stuck forever when using CompactingMemStore
> --------------------------------------------------------------
>
>                 Key: HBASE-26026
>                 URL: https://issues.apache.org/jira/browse/HBASE-26026
>             Project: HBase
>          Issue Type: Bug
>          Components: in-memory-compaction
>    Affects Versions: 2.4.0, 2.3.0
>            Reporter: chenglei
>            Assignee: chenglei
>            Priority: Major
>
> I have occasionally observed HBase writes getting stuck in my cluster when
> {{CompactingMemStore}} is enabled. I have reproduced the problem with a unit
> test in my PR.
> The problem is caused by {{CompactingMemStore.checkAndAddToActiveSize}}:
> {code:java}
> 425   private boolean checkAndAddToActiveSize(MutableSegment currActive, Cell cellToAdd,
> 426       MemStoreSizing memstoreSizing) {
> 427     if (shouldFlushInMemory(currActive, cellToAdd, memstoreSizing)) {
> 428       if (currActive.setInMemoryFlushed()) {
> 429         flushInMemory(currActive);
> 430         if (setInMemoryCompactionFlag()) {
> 431           // The thread is dispatched to do in-memory compaction in the background
>               ......
>  }
> {code}
> In line 427, if {{currActive.getDataSize}} plus the size of {{cellToAdd}}
> exceeds {{CompactingMemStore.inmemoryFlushSize}}, then {{currActive}} should
> be flushed, and {{MutableSegment.setInMemoryFlushed()}} is invoked in line 428:
> {code:java}
>   public boolean setInMemoryFlushed() {
>     return flushed.compareAndSet(false, true);
>   }
> {code}
> After {{currActive.flushed}} is set to true, {{flushInMemory(currActive)}} in
> line 429 invokes {{CompactingMemStore.pushActiveToPipeline}}:
> {code:java}
>   protected void pushActiveToPipeline(MutableSegment currActive) {
>     if (!currActive.isEmpty()) {
>       pipeline.pushHead(currActive);
>       resetActive();
>     }
>   }
> {code}
> In the {{CompactingMemStore.pushActiveToPipeline}} method above, if
> {{currActive.cellSet}} is empty, nothing is done.
> Because of concurrent writes, and because we first add the cell size to
> {{currActive.getDataSize}} and only then actually add the cell to
> {{currActive.cellSet}}, it is possible that {{currActive.getDataSize}} cannot
> accommodate {{cellToAdd}} while {{currActive.cellSet}} is still empty, because
> pending writes have not yet added their cells to {{currActive.cellSet}}.
> So if {{currActive.cellSet}} is empty at that moment, no new {{ActiveSegment}}
> is created and new writes continue to target {{currActive}}. But since
> {{currActive.flushed}} is already true, {{currActive}} can never enter
> {{flushInMemory(currActive)}} again, and a new {{ActiveSegment}} can never be
> created. In the end all writes are stuck.
> In my opinion, once {{currActive.flushed}} is set to true, the segment can no
> longer be used as the {{ActiveSegment}}. And because of concurrent pending
> writes, only after {{currActive.updatesLock.writeLock()}} is acquired (i.e.
> {{currActive.waitForUpdates}} is called) in
> {{CompactingMemStore.inMemoryCompaction}} can we safely say whether
> {{currActive}} is empty or not.
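
The interleaving described above can be replayed deterministically on a single thread. The following is only a minimal sketch under stated assumptions, not the HBase code: the class `StuckActiveSegmentSketch`, its nested `Segment`, the flush threshold of 10, the cell size of 8, and the boolean return convention of `checkAndAddToActiveSize` are all hypothetical simplifications. Only the `setInMemoryFlushed` and `pushActiveToPipeline` logic follows the snippets quoted in the description.

```java
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical, simplified stand-in for the memstore; not the real HBase classes. */
class StuckActiveSegmentSketch {

  /** Stand-in for a mutable segment: size is reserved first, the cell is added later. */
  static final class Segment {
    final AtomicLong dataSize = new AtomicLong();
    final ConcurrentSkipListSet<String> cellSet = new ConcurrentSkipListSet<>();
    final AtomicBoolean flushed = new AtomicBoolean(false);

    // Same compare-and-set as quoted in the description: succeeds only once.
    boolean setInMemoryFlushed() {
      return flushed.compareAndSet(false, true);
    }

    boolean isEmpty() {
      return cellSet.isEmpty();
    }
  }

  static final long IN_MEMORY_FLUSH_SIZE = 10; // assumed threshold for the sketch

  Segment active = new Segment();
  int pipelineSize = 0;

  /** Assumed convention: returns true if the cell size was reserved in currActive. */
  boolean checkAndAddToActiveSize(Segment currActive, long cellSize) {
    if (currActive.dataSize.get() + cellSize > IN_MEMORY_FLUSH_SIZE) {
      if (currActive.setInMemoryFlushed()) {
        pushActiveToPipeline(currActive);
      }
      return false; // the writer has to retry against whatever segment is active
    }
    currActive.dataSize.addAndGet(cellSize);
    return true;
  }

  /** Same shape as the quoted method: an empty segment is neither pushed nor replaced. */
  void pushActiveToPipeline(Segment currActive) {
    if (!currActive.isEmpty()) {
      pipelineSize++;
      active = new Segment();
    }
  }

  public static void main(String[] args) {
    StuckActiveSegmentSketch store = new StuckActiveSegmentSketch();
    Segment first = store.active;

    // Writer A reserves its size but has not yet added its cell to cellSet.
    boolean aReserved = store.checkAndAddToActiveSize(first, 8);
    // Writer B does not fit, sets the flushed flag, but sees an empty cellSet.
    boolean bReserved = store.checkAndAddToActiveSize(first, 8);
    // Writer A now adds its cell, too late for the push.
    first.cellSet.add("cell-A");

    // Writer B retries: the flag is already set, so no new active segment appears.
    int rejected = 0;
    for (int i = 0; i < 5; i++) {
      if (!store.checkAndAddToActiveSize(store.active, 8)) {
        rejected++;
      }
    }
    System.out.println("A reserved: " + aReserved + ", B reserved: " + bReserved);
    System.out.println("rejected retries: " + rejected
        + ", active replaced: " + (store.active != first)
        + ", pipeline size: " + store.pipelineSize);
  }
}
```

In this replay every retry by writer B is rejected and the active segment is never replaced, which matches the stuck state described in the report. The sketch says nothing about how the actual patch resolves it.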