[ https://issues.apache.org/jira/browse/HBASE-26026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
chenglei updated HBASE-26026:
-----------------------------
Description: 
Sometimes I have observed that HBase writes get stuck in my HBase cluster when {{CompactingMemStore}} is enabled. I have reproduced the problem with a unit test in my PR.

The problem is caused by {{CompactingMemStore.checkAndAddToActiveSize}}:
{code:java}
425  private boolean checkAndAddToActiveSize(MutableSegment currActive, Cell cellToAdd,
426      MemStoreSizing memstoreSizing) {
427    if (shouldFlushInMemory(currActive, cellToAdd, memstoreSizing)) {
428      if (currActive.setInMemoryFlushed()) {
429        flushInMemory(currActive);
430        if (setInMemoryCompactionFlag()) {
431          // The thread is dispatched to do in-memory compaction in the background
             ......
}
{code}
In line 427, if the sum of {{currActive.getDataSize}} and the size of {{cellToAdd}} exceeds {{CompactingMemStore.inmemoryFlushSize}}, then {{currActive}} should be flushed, and {{MutableSegment.setInMemoryFlushed()}} is invoked in line 428:
{code:java}
public boolean setInMemoryFlushed() {
  return flushed.compareAndSet(false, true);
}
{code}
After {{currActive.flushed}} is set to true, {{flushInMemory(currActive)}} in line 429 in turn invokes {{CompactingMemStore.pushActiveToPipeline}}:
{code:java}
protected void pushActiveToPipeline(MutableSegment currActive) {
  if (!currActive.isEmpty()) {
    pipeline.pushHead(currActive);
    resetActive();
  }
}
{code}
In the above {{CompactingMemStore.pushActiveToPipeline}}, if {{currActive.cellSet}} is empty, nothing is done. But because writes are concurrent, and because we first add the cell's size to {{currActive.getDataSize}} and only then actually add the cell to {{currActive.cellSet}}, it is possible that {{currActive.getDataSize}} can no longer accommodate another cell while {{currActive.cellSet}} is still empty: pending writes have accounted their sizes but have not yet added their cells to {{currActive.cellSet}}.
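The interleaving can be sketched as a deterministic, single-threaded simulation (a minimal model for illustration only, not HBase code; the names {{dataSize}}, {{cellSet}}, {{flushed}} and the constant are stand-ins for the real members):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Minimal model of the race: a writer first accounts a cell's size, and only
// later inserts the cell itself. If the in-memory flush check runs in
// between, the segment looks "full but empty".
public class RaceSketch {
  static final long IN_MEMORY_FLUSH_SIZE = 64;

  static final AtomicLong dataSize = new AtomicLong();      // models Segment.getDataSize
  static final List<String> cellSet = new ArrayList<>();    // models Segment.cellSet
  static final AtomicBoolean flushed = new AtomicBoolean(); // models MutableSegment.flushed

  // Models CompactingMemStore.pushActiveToPipeline: a no-op on an empty segment.
  static boolean pushActiveToPipeline() {
    if (!cellSet.isEmpty()) {
      return true; // would push to the pipeline and create a fresh active segment
    }
    return false;  // nothing done, yet 'flushed' stays true
  }

  public static void main(String[] args) {
    // Writer A, step 1: account the size of a large cell.
    dataSize.addAndGet(128); // now exceeds IN_MEMORY_FLUSH_SIZE
    // ...before A's step 2 (actually adding the cell), writer B runs the
    // checkAndAddToActiveSize path and observes an oversized-but-empty segment:
    boolean shouldFlush = dataSize.get() > IN_MEMORY_FLUSH_SIZE;
    if (shouldFlush && flushed.compareAndSet(false, true)) {
      boolean pushed = pushActiveToPipeline(); // false: cellSet is still empty
      System.out.println("pushed=" + pushed + " flushed=" + flushed.get());
    }
    // Writer A, step 2: finally insert the cell -- too late.
    cellSet.add("cell");
    // From here on, flushed == true but no fresh active segment was installed:
    // the CAS in setInMemoryFlushed can never succeed again, so writes stall.
    System.out.println("stuck=" + (flushed.get() && !cellSet.isEmpty()));
  }
}
```

Running the sketch shows the segment is marked flushed while the push was skipped, leaving the memstore wedged.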
So now {{currActive.flushed}} is true, and new writes still target {{currActive}}, but {{currActive}} can never enter {{flushInMemory}} again, no new active segment can be created, and in the end all writes are stuck. In my opinion, once {{currActive.flushed}} is set to true, it should never be used as the active segment again; and because of concurrent pending writes, only after {{currActive.updatesLock.writeLock()}} is acquired in {{CompactingMemStore.inMemoryCompaction}} can we safely check whether {{currActive}} is empty or not.

> HBase Write may be stuck forever when using CompactingMemStore
> --------------------------------------------------------------
>
>                 Key: HBASE-26026
>                 URL: https://issues.apache.org/jira/browse/HBASE-26026
>             Project: HBase
>          Issue Type: Bug
>          Components: in-memory-compaction
>    Affects Versions: 2.3.0, 2.4.0
>            Reporter: chenglei
>            Priority: Major
>

-- 
This message was sent by Atlassian Jira
(v8.3.4#803005)
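The suggested direction in the description (never reuse a flushed segment as the active segment, and defer the emptiness check until the compaction step holds the segment's {{updatesLock}} write lock) can be sketched as follows. This is a minimal illustrative model, not the actual HBase patch; {{Segment}}, {{pushHead}}, and {{resetActive}} here are simplified stand-ins for the real classes and methods:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Minimal model: always push the flushed segment to the pipeline and install
// a fresh active segment, then decide under the write lock whether the
// pipelined segment actually holds data. Pending writers (which would hold
// the read lock while inserting) are guaranteed to have finished by then.
public class FixSketch {
  static class Segment {
    final ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();
    final List<String> cellSet = new ArrayList<>();
    boolean isEmpty() { return cellSet.isEmpty(); }
  }

  static final List<Segment> pipeline = new ArrayList<>();
  static Segment active = new Segment();

  // A flushed segment is never reused as the active segment again.
  static void pushActiveToPipeline(Segment currActive) {
    pipeline.add(0, currActive); // pushHead, unconditionally
    active = new Segment();      // resetActive: new writes go to a fresh segment
  }

  // Later, in-memory compaction checks emptiness only under the write lock.
  static boolean hasDataUnderLock(Segment segment) {
    segment.updatesLock.writeLock().lock();
    try {
      return !segment.isEmpty(); // safe: no concurrent inserts now
    } finally {
      segment.updatesLock.writeLock().unlock();
    }
  }

  public static void main(String[] args) {
    Segment flushedSegment = active;
    pushActiveToPipeline(flushedSegment);
    System.out.println("newActive=" + (active != flushedSegment));
    System.out.println("hasData=" + hasDataUnderLock(flushedSegment));
  }
}
```

With this ordering, even an empty flushed segment is retired from the active role, so the "flushed but still active" stuck state modeled earlier cannot arise.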