[ https://issues.apache.org/jira/browse/HBASE-26938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522580#comment-17522580 ]
Josh Elser commented on HBASE-26938: ------------------------------------ I hope I didn't push you to abandon your PR with my comments, Andrew, I just intended them as a part of normal review. I think your approach was perfectly fine. That said, agree wholly with your comments that we need to choose one approach and go with it. [~zhangduo] I went through your commit and left some [comments dangling on there|[https://github.com/Apache9/hbase/commit/79fa3a9d72aade0bfe490b301f909b3d5722de06]|https://github.com/Apache9/hbase/commit/79fa3a9d72aade0bfe490b301f909b3d5722de06].]. I'd suggest converting that into a PR and getting some tests run against it. I'll see if I can steal some time to throw it up on a cluster and test it as well. I think your approach is a little cleaner (though a little less explicit – more layers to unwrap to find the "magic" that actually uses that StoreFileWriterTracker :)) > Compaction failures after StoreFileTracker integration > ------------------------------------------------------ > > Key: HBASE-26938 > URL: https://issues.apache.org/jira/browse/HBASE-26938 > Project: HBase > Issue Type: Bug > Components: Compaction > Affects Versions: 2.5.0, 3.0.0-alpha-2, 2.6.0 > Reporter: Andrew Kyle Purtell > Assignee: Duo Zhang > Priority: Blocker > Fix For: 2.5.0, 3.0.0-alpha-3 > > > [ Currently this has only been tested with branch-2.5 and branch-2. Testing > with master next, will update afterward. ] > Test cluster of 10 regionservers is configured each RS with 5 flush threads, > 5 large compaction threads, and 10 small compaction threads. > Hadoop is 3.3.2. Java is 11. HFiles are on HDFS. > All the StoreFileTracker implementations, DEFAULT or FILE, exhibit compaction > time store writer errors in an ingest heavy use case. Unit tests don't seem > to cover whatever this is. Most compactions succeed, but some do not. Those > that do not are failing with state or sanity check assertions. Below errors > are all from DEFAULT. They seem related... store writer instance > usage/close/locking issues during compactions. > Warnings like "writer exists when it should not": > {noformat} > 2022-04-07T23:13:11,351 WARN > [regionserver/ip-172-31-63-83:8120-shortCompactions-8] > compactions.Compactor: Writer exists when it should not: { > > hdfs://ip-172-31-58-47.us-west-2.compute.internal:8020/hbase/data/default/IntegrationTestLoadCommonCrawl/b518f72941d4427e7e1923407643df67/.tmp/c/29d7b88c4c214ddcbba4f747514a2cf5 > } > {noformat} > Errors like: > IllegalStateException thrown from > HFileBlockIndex$BlockIndexWriter.shouldWriteBlock: > {noformat} > 2022-04-07T23:13:11,508 ERROR > [regionserver/ip-172-31-63-83:8120-shortCompactions-6] > regionserver.CompactSplit: Compaction failed > region=IntegrationTestLoadCommonCrawl,,1649373172576.b518f72941d4427e7e1923407643df67., > storeName=b518f72941d4427e7e1923407643df67/c, priority=10, > startTime=1649373185476 > java.lang.IllegalStateException: curInlineChunk is null; has shouldWriteBlock > been called with closing=true and then called again? > at > org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.shouldWriteBlock(HFileBlockIndex.java:1258) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.writeInlineBlocks(HFileWriterImpl.java:523) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.close(HFileWriterImpl.java:608) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.StoreFileWriter.close(StoreFileWriter.java:377) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.abortWriter(DefaultCompactor.java:84) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.abortWriter(DefaultCompactor.java:76) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:384) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:62) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:125) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > {noformat} > and IllegalStateException thrown from HFileBlock$Writer.expectState: > {noformat} > 2022-04-07T23:13:11,559 ERROR > [regionserver/ip-172-31-63-83:8120-shortCompactions-8] > regionserver.CompactSplit: Compaction failed > region=IntegrationTestLoadCommonCrawl,,1649373172576.b518f72941d4427e7e1923407643df67., > storeName=b518f72941d4427e7e1923407643df67/c, priority=0, > startTime=1649373191325 > java.lang.IllegalStateException: Expected state: BLOCK_READY, actual state: > WRITING > at > org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.expectState(HFileBlock.java:1190) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.getOnDiskSizeWithHeader(HFileBlock.java:1106) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.finishBlock(HFileWriterImpl.java:346) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.checkBlockBoundary(HFileWriterImpl.java:327) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.append(HFileWriterImpl.java:739) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.StoreFileWriter.append(StoreFileWriter.java:301) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:456) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:364) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:62) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:125) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1138) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2392) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:656) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:702) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > {noformat} > and the dreaded "added a key not lexically larger than previous": > {noformat} > 2022-04-07T23:13:14,715 ERROR > [regionserver/ip-172-31-63-83:8120-shortCompactions-8] > regionserver.CompactSplit: Compaction failed > region=IntegrationTestLoadCommonCrawl,de.bao,1649373172576.083eb1ede8bf8c82174f614b06d4741e., > storeName=083eb1ede8bf8c82174f614b06d4741e/c, priority=-1, > startTime=1649373194643 > java.io.IOException: Added a key not lexically larger than previous. > ... > at > org.apache.hadoop.hbase.util.BloomContext.sanityCheck(BloomContext.java:63) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.util.BloomContext.writeBloom(BloomContext.java:54) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.StoreFileWriter.appendGeneralBloomfilter(StoreFileWriter.java:280) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.StoreFileWriter.append(StoreFileWriter.java:299) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:456) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:364) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:62) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:125) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1138) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2392) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:656) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:702) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > {noformat} > and another bloomfilter addition sanity check failure: > {noformat} > 2022-04-07T23:23:44,001 ERROR > [regionserver/ip-172-31-49-8:8120-longCompactions-4] > regionserver.CompactSplit: Compaction failed > region=IntegrationTestLoadCommonCrawl,com.fc2.blog.idp,1649373775847.9d960338d85e2bb9c68669fad6a89f73., > storeName=9d960338d85e2bb9c68669fad6a89f73/c, priority=-1, > startTime=1649373823967 > java.lang.IllegalStateException: First key in chunk already set: > com.filmestipo|/film/1335-algures-hoje-a-noite|1649373441720 > at > org.apache.hadoop.hbase.io.hfile.CompoundBloomFilterWriter.append(CompoundBloomFilterWriter.java:174) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.util.BloomContext.writeBloom(BloomContext.java:55) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.StoreFileWriter.appendGeneralBloomfilter(StoreFileWriter.java:280) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.StoreFileWriter.append(StoreFileWriter.java:299) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:456) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:364) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:62) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:125) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1138) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2392) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:656) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:702) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > {noformat} > and a NPE in HFileBlock$Writer.getEncodingState: > {noformat} > 2022-04-07T23:17:26,373 ERROR > [regionserver/ip-172-31-63-83:8120-shortCompactions-6] > regionserver.CompactSplit: Compaction failed > region=IntegrationTestLoadCommonCrawl,com.fc2.blog.idp,1649373315303.2bd16fac0c088fa135a855f0165f1dba., > storeName=2bd16fac0c088fa135a855f0165f1dba/c, priority=10, > startTime=1649373438988 > java.lang.NullPointerException: null > at > org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.getEncodingState(HFileBlock.java:837) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.beforeShipped(HFileBlock.java:831) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.beforeShipped(HFileWriterImpl.java:770) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.StoreFileWriter.beforeShipped(StoreFileWriter.java:309) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:515) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:364) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:62) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:125) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1138) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2392) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:656) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > at > org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:702) > ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT] > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)