ranpanfeng created HBASE-23084:
----------------------------------
Summary: Assertion "Added a key not lexically larger than previous" or "prevKey followed by a smaller key" causes RegionServer abort
Key: HBASE-23084
URL: https://issues.apache.org/jira/browse/HBASE-23084
Project: HBase
Issue Type: Bug
Components: regionserver
Environment: We store many Phoenix tables in HBase; only one table triggers this issue.
When two adjacent keys are in the wrong order, the observations are as follows:
1. the prevKey was written into HBase via a Put RPC
2. the current key was written via a Phoenix UPSERT statement (a Multi RPC in the underlying HBase)
3. the current Cell exceeds 1 MB
4. the error is easy to reproduce
Reporter: ranpanfeng
When a flush is triggered (whether automatic or manually invoked), the cells in an immutable memstore segment are scanned one by one to build the data blocks and bloom filter blocks. During this scan, each pair of adjacent keys is compared to assert that they are in lexicographical order. Regretfully, the assertion fails in some scenarios, and whenever a RegionServer hits such an assertion failure it calls abort and terminates abnormally.
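For context, the check that fails here boils down to comparing each cell's key with the previous one and complaining when the new key is not larger. The sketch below only illustrates that idea over plain byte[] keys; the real checks live in StoreScanner.checkScanOrder and BloomContext.sanityCheck and compare Cell objects with a CellComparator, and the class and field names here are hypothetical.
{code:java}
import org.apache.hadoop.hbase.util.Bytes;

// Illustrative sketch only, not the HBase source: tracks the previously seen
// key and fails when the next key is lexicographically smaller.
public class ScanOrderCheckSketch {
  private byte[] prevKey;

  public void checkScanOrder(byte[] key) {
    // Mirrors the reported message "Key ... followed by a smaller key ..."
    assert prevKey == null || Bytes.compareTo(prevKey, key) <= 0
        : "Key " + Bytes.toStringBinary(prevKey)
          + " followed by a smaller key " + Bytes.toStringBinary(key);
    prevKey = key;
  }
}
{code}
Note that a Java assert like this only fires when the JVM runs with assertions enabled (-ea); the bloom-path check in case #3 below throws a plain IOException and fires regardless.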
The version is commit 0ba7cc01a13dbb7fec8e6c5d599fc8b4936bae61 (tag: rel/2.1.5).
At least three failure cases have been found.
*error case #1: getQualifierArray throws java.lang.ArrayIndexOutOfBoundsException*
{code:java}
17:57:50.328 [MemStoreFlusher.0] ERROR org.apache.hadoop.hbase.regionserver.HRegionServer - ***** ABORTING region server ${regionserver}: Replay of WAL required. Forcing server shutdown *****
org.apache.hadoop.hbase.DroppedSnapshotException: region: ${region}.
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2788) [hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2467) [hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2439) [hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2329) [hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:612) [hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:581) [hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:68) [hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:361) [hbase-server-2.1.5.jar:2.1.5]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
Caused by: java.lang.ArrayIndexOutOfBoundsException
    at org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray(ByteBufferUtils.java:1155) ~[hbase-common-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.CellUtil.copyQualifierTo(CellUtil.java:309) ~[hbase-common-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.CellUtil.cloneQualifier(CellUtil.java:121) ~[hbase-common-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.ByteBufferKeyValue.getQualifierArray(ByteBufferKeyValue.java:112) ~[hbase-common-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.CellUtil.getCellKeyAsString(CellUtil.java:1306) ~[hbase-common-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.CellUtil.toString(CellUtil.java:1483) ~[hbase-common-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.ByteBufferKeyValue.toString(ByteBufferKeyValue.java:296) ~[hbase-common-2.1.5.jar:2.1.5]
    at java.lang.String.valueOf(String.java:2994) ~[?:1.8.0_144]
    at java.lang.StringBuilder.append(StringBuilder.java:131) ~[?:1.8.0_144]
    at org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:1007) ~[hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:582) ~[hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:127) ~[hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:69) ~[hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1021) ~[hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2259) ~[hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2733) ~[hbase-server-2.1.5.jar:2.1.5]
    ... 8 more
17:57:50.332 [MemStoreFlusher.0] ERROR org.apache.hadoop.hbase.regionserver.HRegionServer - RegionServer abort: loaded coprocessors are: [org.apache.phoenix.coprocessor.ScanRegionObserver, org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver, org.apache.phoenix.hbase.index.Indexer, org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver, org.apache.phoenix.coprocessor.ServerCachingEndpointImpl]
{code}
*error case #2: StoreScanner.checkScanOrder triggers "prevKey followed by a smaller key"*
{code:java}
Exception in thread "RS_IN_MEMORY_COMPACTION-regionserver/${regionserver}" java.lang.AssertionError: Key ${prevKey} followed by a smaller key ${key}
    at org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:1007)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:582)
    at org.apache.hadoop.hbase.regionserver.MemStoreCompactorSegmentsIterator.refillKVS(MemStoreCompactorSegmentsIterator.java:150)
    at org.apache.hadoop.hbase.regionserver.MemStoreCompactorSegmentsIterator.hasNext(MemStoreCompactorSegmentsIterator.java:72)
    at org.apache.hadoop.hbase.regionserver.CellChunkImmutableSegment.initializeCellSet(CellChunkImmutableSegment.java:146)
    at org.apache.hadoop.hbase.regionserver.CellChunkImmutableSegment.<init>(CellChunkImmutableSegment.java:66)
    at org.apache.hadoop.hbase.regionserver.SegmentFactory.createImmutableSegment(SegmentFactory.java:133)
    at org.apache.hadoop.hbase.regionserver.SegmentFactory.createImmutableSegmentByCompaction(SegmentFactory.java:59)
    at org.apache.hadoop.hbase.regionserver.MemStoreCompactor.createSubstitution(MemStoreCompactor.java:217)
    at org.apache.hadoop.hbase.regionserver.MemStoreCompactor.doCompaction(MemStoreCompactor.java:171)
    at org.apache.hadoop.hbase.regionserver.MemStoreCompactor.start(MemStoreCompactor.java:109)
    at org.apache.hadoop.hbase.regionserver.CompactingMemStore.flushInMemory(CompactingMemStore.java:422)
    at org.apache.hadoop.hbase.regionserver.CompactingMemStore$InMemoryFlushRunnable.run(CompactingMemStore.java:534)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
12:34:07.695 [MemStoreFlusher.0] INFO org.apache.hadoop.hbase.regionserver.HRegion - Flushing 1/1 column families, dataSize=239.26 MB heapSize=256.00 MB
12:34:08.476 [MemStoreFlusher.0] ERROR org.apache.hadoop.hbase.regionserver.HRegionServer - ***** ABORTING region server ${regionserver}: Replay of WAL required. Forcing server shutdown *****
org.apache.hadoop.hbase.DroppedSnapshotException: region: ${region}.
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2788) [hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2467) [hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2439) [hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2329) [hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:612) [hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:581) [hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:68) [hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:361) [hbase-server-2.1.5.jar:2.1.5]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
Caused by: java.lang.AssertionError: Key ${prevKey} followed by a smaller key ${key}
    at org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:1007) ~[hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:582) ~[hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:127) ~[hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:69) ~[hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1021) ~[hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2259) ~[hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2733) ~[hbase-server-2.1.5.jar:2.1.5]
    ... 8 more
{code}
*error case #3: BloomContext.sanityCheck triggers "Added a key not lexically larger than previous."*
To collect more information for case #1, I patched the copy code to catch the failure and log its arguments instead of letting the exception propagate:
{code:java}
--- a/hbase-common/src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java
+++ b/hbase-common/src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java
@@ -33,6 +33,8 @@ import org.apache.hadoop.hbase.io.util.StreamUtils;
 import org.apache.hadoop.io.IOUtils;
 import org.apache.hadoop.io.WritableUtils;
 import org.apache.yetus.audience.InterfaceAudience;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 import sun.nio.ch.DirectBuffer;
 
 import org.apache.hbase.thirdparty.com.google.common.annotations.VisibleForTesting;
@@ -51,6 +53,7 @@ public final class ByteBufferUtils {
   @VisibleForTesting
   final static boolean UNSAFE_AVAIL = UnsafeAvailChecker.isAvailable();
   public final static boolean UNSAFE_UNALIGNED = UnsafeAvailChecker.unaligned();
+  private static Logger LOG = LoggerFactory.getLogger(ByteBufferUtils.class);
 
   private ByteBufferUtils() {
   }
@@ -1152,7 +1155,18 @@ public final class ByteBufferUtils {
   public static void copyFromBufferToArray(byte[] out, ByteBuffer in, int sourceOffset,
       int destinationOffset, int length) {
     if (in.hasArray()) {
-      System.arraycopy(in.array(), sourceOffset + in.arrayOffset(), out, destinationOffset, length);
+      try {
+        System.arraycopy(in.array(), sourceOffset + in.arrayOffset(), out, destinationOffset, length);
+      } catch (Throwable e) {
+        if (out == null) {
+          LOG.warn("[QINGGUI] out=null");
+        } else if (in == null || in.array() == null) {
+          LOG.warn("[QINGGUI] in=null");
+        } else {
+          LOG.warn("[QINGGUI] out.length={}, in.length={}, sourceOffset={} destinationOffset={} length={}",
+            out.length, in.array().length, sourceOffset, destinationOffset, length);
+        }
+      }
     } else if (UNSAFE_AVAIL) {
       UnsafeAccess.copy(in, sourceOffset, out, destinationOffset, length);
     } else {
{code}
After re-deploying the patched binary, the failure shifted to the bloom filter building code.
{code:java}
12:54:39.966 [rs(${regionserver})-flush-proc-pool3-thread-2] WARN org.apache.hadoop.hbase.regionserver.HStore - Failed flushing store file, retrying num=9
java.io.IOException: Added a key not lexically larger than previous. Current cell ...
    at org.apache.hadoop.hbase.util.BloomContext.sanityCheck(BloomContext.java:63) ~[hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.util.BloomContext.writeBloom(BloomContext.java:54) ~[hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.StoreFileWriter.appendGeneralBloomfilter(StoreFileWriter.java:213) ~[hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.StoreFileWriter.append(StoreFileWriter.java:232) ~[hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:133) ~[hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:69) ~[hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1021) [hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2259) [hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2733) [hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2467) [hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2439) [hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2329) [hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2232) [hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure$RegionFlushTask.call(FlushTableSubprocedure.java:68) [hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure$RegionFlushTask.call(FlushTableSubprocedure.java:56) [hbase-server-2.1.5.jar:2.1.5]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_144]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_144]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_144]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
12:54:39.979 [rs(${regionserver})-flush-proc-pool3-thread-2] ERROR org.apache.hadoop.hbase.regionserver.HRegionServer - ***** ABORTING region server ${regionserver}: Replay of WAL required. Forcing server shutdown *****
org.apache.hadoop.hbase.DroppedSnapshotException: region: ${region}.
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2788) [hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2467) [hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2439) [hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2329) [hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2232) [hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure$RegionFlushTask.call(FlushTableSubprocedure.java:68) [hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure$RegionFlushTask.call(FlushTableSubprocedure.java:56) [hbase-server-2.1.5.jar:2.1.5]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_144]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_144]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_144]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
Caused by: java.io.IOException: Added a key not lexically larger than previous. Current cell = ...
    at org.apache.hadoop.hbase.util.BloomContext.sanityCheck(BloomContext.java:63) ~[hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.util.BloomContext.writeBloom(BloomContext.java:54) ~[hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.StoreFileWriter.appendGeneralBloomfilter(StoreFileWriter.java:213) ~[hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.StoreFileWriter.append(StoreFileWriter.java:232) ~[hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:133) ~[hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:69) ~[hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1021) ~[hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2259) ~[hbase-server-2.1.5.jar:2.1.5]
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2733) ~[hbase-server-2.1.5.jar:2.1.5]
    ... 12 more
{code}
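For comparison with cases #1 and #2, the bloom-path check is not a Java assert but a hard IOException, which appears to be why the flush is retried ("retrying num=9") before the RegionServer finally aborts. The following is only a minimal, hypothetical sketch of that kind of check over plain byte[] keys; the real logic is in BloomContext.sanityCheck and compares Cell objects with a CellComparator.
{code:java}
import java.io.IOException;
import org.apache.hadoop.hbase.util.Bytes;

// Illustrative sketch only: remembers the last key appended to the bloom
// filter and rejects any key that is not strictly larger.
public class BloomSanityCheckSketch {
  private byte[] lastBloomKey;

  public void sanityCheck(byte[] key) throws IOException {
    if (lastBloomKey != null && Bytes.compareTo(key, lastBloomKey) <= 0) {
      // Matches the reported message "Added a key not lexically larger than previous."
      throw new IOException("Added a key not lexically larger than previous. Current cell = "
          + Bytes.toStringBinary(key) + ", prevCell = " + Bytes.toStringBinary(lastBloomKey));
    }
    lastBloomKey = key;
  }
}
{code}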