[
https://issues.apache.org/jira/browse/HBASE-26938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17519856#comment-17519856
]
Andrew Kyle Purtell edited comment on HBASE-26938 at 4/9/22 2:09 AM:
---------------------------------------------------------------------
HBASE-26675 is suspect as not quite sufficient. Evidence: the "writer exists
when it should not" message, which comes from here:
https://github.com/apache/hbase/blob/eb22319fef64941f257f9bac6114292e74553aba/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java#L359
{noformat}
2022-04-07T23:13:11,351 WARN
[regionserver/ip-172-31-63-83:8120-shortCompactions-8]
compactions.Compactor: Writer exists when it should not: {
hdfs://ip-172-31-58-47.us-west-2.compute.internal:8020/hbase/data/default/IntegrationTestLoadCommonCrawl/b518f72941d4427e7e1923407643df67/.tmp/c/29d7b88c4c214ddcbba4f747514a2cf5
}
{noformat}
Duo says on HBASE-26675:
{quote}
And since it is not a big deal to the upper layer an old snapshot, we could
just make the writer instance volatile and use it directly in the method, just
make sure we always use the same instance in the method, i.e, assign it to a
local var first.
{quote}
... this could be a false lead but I'll use an AtomicReference instead of a
volatile field to hold the StoreFileWriter instance in Compactor just so the
getter will force all users to copy the reference into a local variable first.
Let me see if this makes any difference.
was (Author: apurtell):
HBASE-26675 is suspect as not quite sufficient. Evidence: the "writer exists
when it should not" message, which comes from here:
https://github.com/apache/hbase/blob/eb22319fef64941f257f9bac6114292e74553aba/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java#L359
{noformat}
2022-04-07T23:13:11,351 WARN
[regionserver/ip-172-31-63-83:8120-shortCompactions-8]
compactions.Compactor: Writer exists when it should not: {
hdfs://ip-172-31-58-47.us-west-2.compute.internal:8020/hbase/data/default/IntegrationTestLoadCommonCrawl/b518f72941d4427e7e1923407643df67/.tmp/c/29d7b88c4c214ddcbba4f747514a2cf5
}
{noformat}
Duo says on HBASE-26675:
{quote}
And since it is not a big deal to the upper layer an old snapshot, we could
just make the writer instance volatile and use it directly in the method, just
make sure we always use the same instance in the method, i.e, assign it to a
local var first.
{quote}
... this could be a false lead but I'll use an AtomicReference instead of a
volatile just so the getter will force all users to copy the reference into a
local variable first. Let me see if this makes any difference.
> Compaction failures after StoreFileTracker integration (branch-2, branch-2.5)
> -----------------------------------------------------------------------------
>
> Key: HBASE-26938
> URL: https://issues.apache.org/jira/browse/HBASE-26938
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.5.0, 2.6.0
> Reporter: Andrew Kyle Purtell
> Priority: Blocker
> Fix For: 2.5.0
>
>
> [ Currently this has only been tested with branch-2.5 and branch-2. Testing
> with master next, will update afterward. ]
> Test cluster of 10 regionservers is configured each RS with 5 flush threads,
> 5 large compaction threads, and 10 small compaction threads.
> Hadoop is 3.3.2. Java is 11. HFiles are on HDFS.
> All the StoreFileTracker implementations, DEFAULT or FILE, exhibit compaction
> time store writer errors in an ingest heavy use case. Unit tests don't seem
> to cover whatever this is. Most compactions succeed, but some do not. Those
> that do not are failing with state or sanity check assertions. Below errors
> are all from DEFAULT. They seem related... store writer instance
> usage/close/locking issues during compactions.
> Warnings like "writer exists when it should not":
> {noformat}
> 2022-04-07T23:13:11,351 WARN
> [regionserver/ip-172-31-63-83:8120-shortCompactions-8]
> compactions.Compactor: Writer exists when it should not: {
>
> hdfs://ip-172-31-58-47.us-west-2.compute.internal:8020/hbase/data/default/IntegrationTestLoadCommonCrawl/b518f72941d4427e7e1923407643df67/.tmp/c/29d7b88c4c214ddcbba4f747514a2cf5
> }
> {noformat}
> Errors like:
> IllegalStateException thrown from
> HFileBlockIndex$BlockIndexWriter.shouldWriteBlock:
> {noformat}
> 2022-04-07T23:13:11,508 ERROR
> [regionserver/ip-172-31-63-83:8120-shortCompactions-6]
> regionserver.CompactSplit: Compaction failed
> region=IntegrationTestLoadCommonCrawl,,1649373172576.b518f72941d4427e7e1923407643df67.,
> storeName=b518f72941d4427e7e1923407643df67/c, priority=10,
> startTime=1649373185476
> java.lang.IllegalStateException: curInlineChunk is null; has shouldWriteBlock
> been called with closing=true and then called again?
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.shouldWriteBlock(HFileBlockIndex.java:1258)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.writeInlineBlocks(HFileWriterImpl.java:523)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.close(HFileWriterImpl.java:608)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.StoreFileWriter.close(StoreFileWriter.java:377)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.abortWriter(DefaultCompactor.java:84)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.abortWriter(DefaultCompactor.java:76)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:384)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:62)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:125)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> {noformat}
> and IllegalStateException thrown from HFileBlock$Writer.expectState:
> {noformat}
> 2022-04-07T23:13:11,559 ERROR
> [regionserver/ip-172-31-63-83:8120-shortCompactions-8]
> regionserver.CompactSplit: Compaction failed
> region=IntegrationTestLoadCommonCrawl,,1649373172576.b518f72941d4427e7e1923407643df67.,
> storeName=b518f72941d4427e7e1923407643df67/c, priority=0,
> startTime=1649373191325
> java.lang.IllegalStateException: Expected state: BLOCK_READY, actual state:
> WRITING
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.expectState(HFileBlock.java:1190)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.getOnDiskSizeWithHeader(HFileBlock.java:1106)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.finishBlock(HFileWriterImpl.java:346)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.checkBlockBoundary(HFileWriterImpl.java:327)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.append(HFileWriterImpl.java:739)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.StoreFileWriter.append(StoreFileWriter.java:301)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:456)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:364)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:62)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:125)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1138)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2392)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:656)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:702)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> {noformat}
> and the dreaded "added a key not lexically larger than previous":
> {noformat}
> 2022-04-07T23:13:14,715 ERROR
> [regionserver/ip-172-31-63-83:8120-shortCompactions-8]
> regionserver.CompactSplit: Compaction failed
> region=IntegrationTestLoadCommonCrawl,de.bao,1649373172576.083eb1ede8bf8c82174f614b06d4741e.,
> storeName=083eb1ede8bf8c82174f614b06d4741e/c, priority=-1,
> startTime=1649373194643
> java.io.IOException: Added a key not lexically larger than previous.
> ...
> at
> org.apache.hadoop.hbase.util.BloomContext.sanityCheck(BloomContext.java:63)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.util.BloomContext.writeBloom(BloomContext.java:54)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.StoreFileWriter.appendGeneralBloomfilter(StoreFileWriter.java:280)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.StoreFileWriter.append(StoreFileWriter.java:299)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:456)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:364)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:62)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:125)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1138)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2392)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:656)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:702)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> {noformat}
> and another bloomfilter addition sanity check failure:
> {noformat}
> 2022-04-07T23:23:44,001 ERROR
> [regionserver/ip-172-31-49-8:8120-longCompactions-4]
> regionserver.CompactSplit: Compaction failed
> region=IntegrationTestLoadCommonCrawl,com.fc2.blog.idp,1649373775847.9d960338d85e2bb9c68669fad6a89f73.,
> storeName=9d960338d85e2bb9c68669fad6a89f73/c, priority=-1,
> startTime=1649373823967
> java.lang.IllegalStateException: First key in chunk already set:
> com.filmestipo|/film/1335-algures-hoje-a-noite|1649373441720
> at
> org.apache.hadoop.hbase.io.hfile.CompoundBloomFilterWriter.append(CompoundBloomFilterWriter.java:174)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.util.BloomContext.writeBloom(BloomContext.java:55)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.StoreFileWriter.appendGeneralBloomfilter(StoreFileWriter.java:280)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.StoreFileWriter.append(StoreFileWriter.java:299)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:456)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:364)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:62)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:125)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1138)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2392)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:656)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:702)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> {noformat}
> and a NPE in HFileBlock$Writer.getEncodingState:
> {noformat}
> 2022-04-07T23:17:26,373 ERROR
> [regionserver/ip-172-31-63-83:8120-shortCompactions-6]
> regionserver.CompactSplit: Compaction failed
> region=IntegrationTestLoadCommonCrawl,com.fc2.blog.idp,1649373315303.2bd16fac0c088fa135a855f0165f1dba.,
> storeName=2bd16fac0c088fa135a855f0165f1dba/c, priority=10,
> startTime=1649373438988
> java.lang.NullPointerException: null
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.getEncodingState(HFileBlock.java:837)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.beforeShipped(HFileBlock.java:831)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.beforeShipped(HFileWriterImpl.java:770)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.StoreFileWriter.beforeShipped(StoreFileWriter.java:309)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:515)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:364)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:62)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:125)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1138)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2392)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:656)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> at
> org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:702)
> ~[hbase-server-2.5.0-SNAPSHOT.jar:2.5.0-SNAPSHOT]
> {noformat}
--
This message was sent by Atlassian Jira
(v8.20.1#820001)