[ 
https://issues.apache.org/jira/browse/HBASE-26938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520728#comment-17520728
 ] 

Andrew Kyle Purtell edited comment on HBASE-26938 at 4/11/22 9:28 PM:
----------------------------------------------------------------------

PR is ready:

One {{Compactor}} instance is reused for the lifetime of a store, and it has a 
{{writer}} field that is at issue here. 

More than one compaction must not be selected and executed concurrently against 
a given store, or else readers and writers of the {{writer}} field will 
encounter multithreaded correctness problems. Yet I am seeing concurrent 
selection and execution of compaction activity against the store in the test 
scenario.
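
To illustrate the hazard described above, here is a minimal sketch. It is not the HBase code and not the change in the PR: every type and method name in it (CompactorSketch, SketchWriter, SharedFieldCompactor, LocalWriterCompactor) is hypothetical. It assumes a toy writer that, like an HFile writer, rejects out-of-order keys, and it replays one possible interleaving of two compactions on a single thread so the outcome is deterministic. Keeping the writer local to each call is just one way to avoid the shared state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch; none of these types are HBase classes. */
class CompactorSketch {

  /** Toy writer that insists on strictly ascending keys. */
  static final class SketchWriter {
    private final List<Integer> keys = new ArrayList<>();

    void append(int key) {
      if (!keys.isEmpty() && key <= keys.get(keys.size() - 1)) {
        throw new IllegalStateException("Added a key not larger than previous: " + key);
      }
      keys.add(key);
    }

    List<Integer> keys() {
      return keys;
    }
  }

  /** Unsafe shape: one instance reused per store, writer held in a field. */
  static final class SharedFieldCompactor {
    private SketchWriter writer; // shared by every compaction on this store

    /** Returns true if another compaction's writer was still installed. */
    boolean begin() {
      boolean writerAlreadyExists = writer != null;
      writer = new SketchWriter(); // replaces the in-flight compaction's writer
      return writerAlreadyExists;
    }

    void append(int key) {
      writer.append(key);
    }
  }

  /** Safer shape: the writer is a local variable, so calls share no state. */
  static final class LocalWriterCompactor {
    List<Integer> compact(List<Integer> sortedKeys) {
      SketchWriter writer = new SketchWriter(); // private to this call
      for (int key : sortedKeys) {
        writer.append(key);
      }
      return writer.keys();
    }
  }

  /** Replays, on one thread, two compactions interleaving on the shared field. */
  static String interleaveOnSharedField() {
    SharedFieldCompactor shared = new SharedFieldCompactor();
    shared.begin();                 // compaction A starts
    shared.append(10);              // A appends its first key
    boolean clash = shared.begin(); // compaction B starts before A finishes
    shared.append(20);              // A's next key lands in B's writer
    try {
      shared.append(5);             // B's first key now breaks the ordering check
      return "no error";
    } catch (IllegalStateException e) {
      return "clash=" + clash + "; " + e.getMessage();
    }
  }

  /** Runs several compactions at once against a single LocalWriterCompactor. */
  static List<List<Integer>> compactConcurrently(List<List<Integer>> inputs) throws Exception {
    LocalWriterCompactor compactor = new LocalWriterCompactor();
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, inputs.size()));
    try {
      List<Future<List<Integer>>> futures = new ArrayList<>();
      for (List<Integer> input : inputs) {
        Callable<List<Integer>> task = () -> compactor.compact(input);
        futures.add(pool.submit(task));
      }
      List<List<Integer>> results = new ArrayList<>();
      for (Future<List<Integer>> future : futures) {
        results.add(future.get()); // rethrows any failure from the task
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(interleaveOnSharedField());
    System.out.println(compactConcurrently(List.of(List.of(1, 2, 3), List.of(4, 5, 6))));
  }
}
```

In the sketch, the shared-field variant reports both that a writer already existed when the second compaction began and an ordering failure on a later append, which loosely resembles the mix of symptoms in the issue description. The local-writer variant completes each compaction independently even when they run in parallel.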

In the test scenario I have increased the sizes of the small and large 
compaction thread pools to 10 and 5 threads, respectively, and raised the 
blocking files threshold from its default to 24, and in the scenario the store 
is flushing furiously. Operation under these conditions used to be reliable, 
but perhaps only because compaction activity was accidentally serialized prior 
to the SFT changes. 

With this change in place, reliability and performance under the test 
scenario return to the previous baseline for DEFAULT SFT. No ERRORs. 


was (Author: apurtell):
Deploying a test to check another suspicion about the concurrency issue here. 

One Compactor instance is reused for the lifetime of a store, and it has this 
'writer' field which cannot be overwritten once set, so more than one 
compaction cannot be concurrently selected and executed against a given store. 
Yet I am seeing concurrent selection and execution of compaction activity 
against the store in my test scenario. I have increased the size of the small 
and large compaction thread pools and increased the default for blocking files, 
and in the scenario the store is flushing furiously. This used to be safe, and 
operation under these conditions was reliable, but perhaps only by an 
accidental serialization of compaction activity prior to the SFT changes. 

> Compaction failures after StoreFileTracker integration (branch-2, branch-2.5)
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-26938
>                 URL: https://issues.apache.org/jira/browse/HBASE-26938
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.5.0, 2.6.0
>            Reporter: Andrew Kyle Purtell
>            Priority: Blocker
>             Fix For: 2.5.0
>
>
> [ Currently this has only been tested with branch-2.5 and branch-2. Testing 
> with master next, will update afterward. ]
> The test cluster has 10 regionservers, each configured with 5 flush threads, 
> 5 large compaction threads, and 10 small compaction threads. 
> Hadoop is 3.3.2. Java is 11. HFiles are on HDFS. 
> All the StoreFileTracker implementations, DEFAULT or FILE, exhibit 
> compaction-time store writer errors in an ingest-heavy use case. Unit tests 
> don't seem to cover whatever this is. Most compactions succeed, but some do 
> not. Those that do not are failing with state or sanity check assertions. The 
> errors below are all from DEFAULT. They seem related: store writer instance 
> usage/close/locking issues during compactions.
> Warnings like "writer exists when it should not":
> {noformat}
> 2022-04-07T23:13:11,351 WARN  
> [regionserver/ip-172-31-63-83:8120-shortCompactions-8]
> compactions.Compactor: Writer exists when it should not: {
>   
> hdfs://ip-172-31-58-47.us-west-2.compute.internal:8020/hbase/data/default/IntegrationTestLoadCommonCrawl/b518f72941d4427e7e1923407643df67/.tmp/c/29d7b88c4c214ddcbba4f747514a2cf5
>  }
> {noformat}
> Errors like:
> IllegalStateException thrown from 
> HFileBlockIndex$BlockIndexWriter.shouldWriteBlock:
> {noformat}
> 2022-04-07T23:13:11,508 ERROR 
> [regionserver/ip-172-31-63-83:8120-shortCompactions-6] 
> regionserver.CompactSplit: Compaction failed 
> region=IntegrationTestLoadCommonCrawl,,1649373172576.b518f72941d4427e7e1923407643df67.,
>  storeName=b518f72941d4427e7e1923407643df67/c, priority=10, 
> startTime=1649373185476
> java.lang.IllegalStateException: curInlineChunk is null; has shouldWriteBlock 
> been called with closing=true and then called again?
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.shouldWriteBlock(HFileBlockIndex.java:1258)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.writeInlineBlocks(HFileWriterImpl.java:523)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.close(HFileWriterImpl.java:608)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.StoreFileWriter.close(StoreFileWriter.java:377)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.abortWriter(DefaultCompactor.java:84)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.abortWriter(DefaultCompactor.java:76)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:384)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:62)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:125)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> {noformat}
> and IllegalStateException thrown from HFileBlock$Writer.expectState:
> {noformat}
> 2022-04-07T23:13:11,559 ERROR 
> [regionserver/ip-172-31-63-83:8120-shortCompactions-8] 
> regionserver.CompactSplit: Compaction failed 
> region=IntegrationTestLoadCommonCrawl,,1649373172576.b518f72941d4427e7e1923407643df67.,
>  storeName=b518f72941d4427e7e1923407643df67/c, priority=0, 
> startTime=1649373191325
> java.lang.IllegalStateException: Expected state: BLOCK_READY, actual state: 
> WRITING
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.expectState(HFileBlock.java:1190)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.getOnDiskSizeWithHeader(HFileBlock.java:1106)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.finishBlock(HFileWriterImpl.java:346)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.checkBlockBoundary(HFileWriterImpl.java:327)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.append(HFileWriterImpl.java:739)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.StoreFileWriter.append(StoreFileWriter.java:301)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:456)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:364)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:62)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:125)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1138) 
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2392) 
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:656)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:702)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> {noformat}
> and the dreaded "added a key not lexically larger than previous":
> {noformat}
> 2022-04-07T23:13:14,715 ERROR 
> [regionserver/ip-172-31-63-83:8120-shortCompactions-8] 
> regionserver.CompactSplit: Compaction failed 
> region=IntegrationTestLoadCommonCrawl,de.bao,1649373172576.083eb1ede8bf8c82174f614b06d4741e.,
>  storeName=083eb1ede8bf8c82174f614b06d4741e/c, priority=-1, 
> startTime=1649373194643
> java.io.IOException: Added a key not lexically larger than previous. 
> ...
>         at 
> org.apache.hadoop.hbase.util.BloomContext.sanityCheck(BloomContext.java:63) 
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.util.BloomContext.writeBloom(BloomContext.java:54) 
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.StoreFileWriter.appendGeneralBloomfilter(StoreFileWriter.java:280)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.StoreFileWriter.append(StoreFileWriter.java:299)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:456)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:364)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:62)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:125)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1138) 
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2392) 
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:656)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:702)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> {noformat}
> and another bloomfilter addition sanity check failure:
> {noformat}
> 2022-04-07T23:23:44,001 ERROR 
> [regionserver/ip-172-31-49-8:8120-longCompactions-4] 
> regionserver.CompactSplit: Compaction failed 
> region=IntegrationTestLoadCommonCrawl,com.fc2.blog.idp,1649373775847.9d960338d85e2bb9c68669fad6a89f73.,
>  storeName=9d960338d85e2bb9c68669fad6a89f73/c, priority=-1, 
> startTime=1649373823967
> java.lang.IllegalStateException: First key in chunk already set: 
> com.filmestipo|/film/1335-algures-hoje-a-noite|1649373441720
>         at 
> org.apache.hadoop.hbase.io.hfile.CompoundBloomFilterWriter.append(CompoundBloomFilterWriter.java:174)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.util.BloomContext.writeBloom(BloomContext.java:55) 
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.StoreFileWriter.appendGeneralBloomfilter(StoreFileWriter.java:280)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.StoreFileWriter.append(StoreFileWriter.java:299)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:456)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:364)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:62)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:125)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1138) 
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2392) 
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:656)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:702)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> {noformat}
> and an NPE in HFileBlock$Writer.getEncodingState: 
> {noformat}
> 2022-04-07T23:17:26,373 ERROR 
> [regionserver/ip-172-31-63-83:8120-shortCompactions-6] 
> regionserver.CompactSplit: Compaction failed 
> region=IntegrationTestLoadCommonCrawl,com.fc2.blog.idp,1649373315303.2bd16fac0c088fa135a855f0165f1dba.,
>  storeName=2bd16fac0c088fa135a855f0165f1dba/c, priority=10, 
> startTime=1649373438988
> java.lang.NullPointerException: null
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.getEncodingState(HFileBlock.java:837)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.beforeShipped(HFileBlock.java:831)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.beforeShipped(HFileWriterImpl.java:770)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.StoreFileWriter.beforeShipped(StoreFileWriter.java:309)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:515)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:364)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:62)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:125)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1138) 
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2392) 
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:656)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:702)
>  ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
