[ https://issues.apache.org/jira/browse/HBASE-25709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317952#comment-17317952 ]
Xiaolin Ha commented on HBASE-25709:
------------------------------------

Hi [~stack], thanks for reviewing this issue.

The StoreScanner is shared by compaction and user scanners, so I set the option off by default to stay compatible with the original logic. I thought carefully about your suggestion to turn it on by default, but there may be some correctness issues.

For user scanners, when the matcher returns SKIP, the heap keeps polling cells in a loop until the heap is empty or the top cell matches the scanner rules. If we turn this on by default, the method will return once it has read the per-heartbeat number of cells, even though the top cell of the heap may be invalid. Outer scanners would then peek at incorrect data (maybe not, because there are still filters before the result is returned). For example, KeyValueHeap#next(List<Cell> result, ScannerContext scannerContext) just adds the top cell after StoreScanner#next returns. But in the user scanner context, the scanner is expected to return only once it reaches the limit it sets. As a result, returning prematurely for user scanners may be unexpected.

> Close region may stuck when region is compacting and skipped most cells read
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-25709
>                 URL: https://issues.apache.org/jira/browse/HBASE-25709
>             Project: HBase
>          Issue Type: Improvement
>          Components: Compaction
>    Affects Versions: 1.4.13
>            Reporter: Xiaolin Ha
>            Assignee: Xiaolin Ha
>            Priority: Major
>         Attachments: Master-UI-RIT.png, RS-region-state.png
>
>
> We found a region stuck while closing in our cluster. The region was
> compacting, and its store files had many TTL-expired cells. The close-region
> state marker (HRegion#writestate.writesEnabled) was not checked during
> compaction, because most cells were skipped.
> !RS-region-state.png|width=698,height=310!
>
> !Master-UI-RIT.png|width=693,height=157!
>
> HBASE-23968 encountered a similar problem, but its solution sits outside the
> method
> InternalScanner#next(List<Cell> result, ScannerContext scannerContext), which,
> in the current compaction scanner context, will not return if there are many
> skipped cells. As a result, we need to return in time from the next method
> and then check the stop marker.
>
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)