[ https://issues.apache.org/jira/browse/PHOENIX-7611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tanuj Khurana updated PHOENIX-7611:
-----------------------------------
    Summary: Memory corruption issue in Phoenix coprocessors in HBase 2  (was: Memory corruption issue in Phoenix coprocessors af HBase 2)

> Memory corruption issue in Phoenix coprocessors in HBase 2
> ----------------------------------------------------------
>
>                 Key: PHOENIX-7611
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7611
>             Project: Phoenix
>          Issue Type: Bug
>   Affects Versions: 5.0.0, 5.1.0, 5.1.1, 5.2.0, 5.1.2, 5.1.3, 5.2.1
>            Reporter: Tanuj Khurana
>            Priority: Major
>
> The memory corruption has surfaced in the form of segmentation faults that
> crash the RegionServer. We have observed this in production in our
> environment as well as in ITs. We already have PHOENIX-7419 open for it. I
> was also hitting this issue when working on PHOENIX-7591. There, the test
> would sometimes fail with a FATAL error message of SIGSEGV, but more often it
> would fail with a silent corruption. After adding more logging, I found that
> some of the Cell references we were storing in IndexRegionObserver were
> getting corrupted.
> I started looking around in HBase for similar corruptions and found that from
> HBase 2 onwards the contract with the coprocessor for the preBatchMutate hook
> says:
> *Do not retain references to any Cells in Mutations* beyond the life of this
> invocation. If need a Cell reference for later use, copy the cell and use
> that
> To handle concurrent updates, IndexRegionObserver maintains the row state in
> memory as a Put mutation that references the Cells in the Mutation, and the
> lifetime of these references exceeds the invocation of the hook. It seems
> that in some cases these cells can be backed by off-heap memory, which can be
> reclaimed or reused, causing corruption.
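The copy-before-retaining rule quoted above can be illustrated with a minimal, self-contained sketch. It deliberately does not use HBase or Phoenix classes: `RetainedCellSketch`, `BufferView`, `fill`, and `demo` are hypothetical names invented for this illustration, and a direct `ByteBuffer` stands in for pooled off-heap memory under the assumption that the server reuses it for a later request. It only shows the silent-corruption case; in the real system, reclaimed memory can also lead to a crash.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; none of these names exist in HBase or Phoenix.
final class RetainedCellSketch {

    /** Stand-in for a buffer-backed cell: a window (offset, length) over shared memory. */
    static final class BufferView {
        private final ByteBuffer buffer;
        private final int offset;
        private final int length;

        BufferView(ByteBuffer buffer, int offset, int length) {
            this.buffer = buffer;
            this.offset = offset;
            this.length = length;
        }

        /** Copies whatever bytes are currently in the backing buffer onto the heap. */
        String read() {
            byte[] out = new byte[length];
            ByteBuffer dup = buffer.duplicate(); // independent position, same memory
            dup.position(offset);
            dup.get(out, 0, length);
            return new String(out, StandardCharsets.UTF_8);
        }
    }

    /** Simulates the server writing a request payload into a pooled buffer. */
    static void fill(ByteBuffer pooled, String payload) {
        byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < bytes.length; i++) {
            pooled.put(i, bytes[i]);
        }
    }

    /** Returns { value seen through the retained view, value copied during the "hook" }. */
    static String[] demo() {
        ByteBuffer pooled = ByteBuffer.allocateDirect(16); // assumed to be reused later
        fill(pooled, "row-1");

        BufferView retained = new BufferView(pooled, 0, 5); // reference kept past the hook
        String copied = retained.read();                    // heap copy taken inside the hook

        fill(pooled, "XXXXX"); // buffer reused for an unrelated request

        return new String[] { retained.read(), copied };
    }

    public static void main(String[] args) {
        String[] result = demo();
        System.out.println("retained view: " + result[0]); // prints "retained view: XXXXX"
        System.out.println("copied value:  " + result[1]); // prints "copied value:  row-1"
    }
}
```

The retained view reflects whatever the buffer holds at read time, while the copy taken inside the simulated hook keeps the original bytes, which is the behavior the contract asks coprocessors to rely on.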
> This also lines up with the stack trace attached to PHOENIX-7419
> ([^hs_err_pid783375.log])
> {code:java}
> v  ~StubRoutines::jbyte_disjoint_arraycopy
> J 23481 C2 org.apache.hadoop.hbase.unsafe.HBasePlatformDependent.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V (22 bytes) @ 0x00007fb765360c32 [0x00007fb765360be0+0x52]
> j  org.apache.hadoop.hbase.util.UnsafeAccess.unsafeCopy(Ljava/lang/Object;JLjava/lang/Object;JJ)V+56
> j  org.apache.hadoop.hbase.util.UnsafeAccess.copy(Ljava/nio/ByteBuffer;I[BII)V+105
> j  org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray([BLjava/nio/ByteBuffer;III)V+65
> j  org.apache.hadoop.hbase.CellUtil.copyQualifierTo(Lorg/apache/hadoop/hbase/Cell;[BI)I+56
> J 24630 C2 org.apache.phoenix.coprocessor.GlobalIndexRegionScanner.apply(Lorg/apache/hadoop/hbase/client/Put;Lorg/apache/hadoop/hbase/client/Put;)V (167 bytes) @ 0x00007fb7656262e0 [0x00007fb765625ca0+0x640]
> J 24258 C1 org.apache.phoenix.hbase.index.IndexRegionObserver.applyPendingPutMutations(Lorg/apache/hadoop/hbase/regionserver/MiniBatchOperationInProgress;Lorg/apache/phoenix/hbase/index/IndexRegionObserver$BatchMutateContext;J)V (430 bytes) @ 0x00007fb7654ac234 [0x00007fb7654aa880+0x19b4]
> j  org.apache.phoenix.hbase.index.IndexRegionObserver.prepareDataRowStates(Lorg/apache/hadoop/hbase/coprocessor/ObserverContext;Lorg/apache/hadoop/hbase/regionserver/MiniBatchOperationInProgress;Lorg/apache/phoenix/hbase/index/IndexRegionObserver$BatchMutateContext;J)V+30
> J 25543 C1 org.apache.phoenix.hbase.index.IndexRegionObserver.preBatchMutateWithExceptions(Lorg/apache/hadoop/hbase/coprocessor/ObserverContext;Lorg/apache/hadoop/hbase/regionserver/MiniBatchOperationInProgress;)V (1004 bytes) @ 0x00007fb764ef1ffc [0x00007fb764eef7c0+0x283c]
> J 25542 C1 org.apache.phoenix.hbase.index.IndexRegionObserver.preBatchMutate(Lorg/apache/hadoop/hbase/coprocessor/ObserverContext;Lorg/apache/hadoop/hbase/regionserver/MiniBatchOperationInProgress;)V (76 bytes) @ 0x00007fb762d1dc24 [0x00007fb762d1db00+0x124]
> J 22752 C1 org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$28.call(Ljava/lang/Object;)V (17 bytes) @ 0x00007fb7629b21d4 [0x00007fb7629b1f00+0x2d4]
> J 14450 C2 org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver()V (70 bytes) @ 0x00007fb762483240 [0x00007fb7624830c0+0x180]
> J 18110 C2 org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(Lorg/apache/hadoop/hbase/coprocessor/CoprocessorHost$ObserverOperation;)Z (274 bytes) @ 0x00007fb76463c74c [0x00007fb76463c320+0x42c]
> J 23033 C1 org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preBatchMutate(Lorg/apache/hadoop/hbase/regionserver/MiniBatchOperationInProgress;)V (42 bytes) @ 0x00007fb762b39dcc [0x00007fb762b39640+0x78c]
> J 14181 C1 org.apache.hadoop.hbase.regionserver.HRegion$MutationBatchOperation.prepareMiniBatchOperations(Lorg/apache/hadoop/hbase/regionserver/MiniBatchOperationInProgress;JLjava/util/List;)V (105 bytes) @ 0x00007fb763a21b3c [0x00007fb763a21380+0x7bc]
> J 14199 C1 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(Lorg/apache/hadoop/hbase/regionserver/HRegion$BatchOperation;)V (970 bytes) @ 0x00007fb763a37a94 [0x00007fb763a36f20+0xb74]
> J 13124 C1 org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(Lorg/apache/hadoop/hbase/regionserver/HRegion$BatchOperation;)[Lorg/apache/hadoop/hbase/regionserver/OperationStatus; (354 bytes) @ 0x00007fb7636a5a64 [0x00007fb7636a5320+0x744]
> {code}
> This contract actually applies to all the methods in the RegionObserver
> interface and was updated by HBASE-15735, which was introduced in HBase 2.
> Phoenix has several coprocessors which implement the RegionObserver
> interface. We need to investigate all such implementations and fix them if
> they are holding on to cell references after the invocation of the hook API.
> Two patterns I have seen are:
> 1. We directly store the reference to the Cell, or store it in a collection
> like List<Cell>.
> 2. We store it indirectly, for example inside a Mutation object.
> It seems this is only a problem if we store references to Cells that extend
> ByteBufferKeyValue, which in turn extends ByteBufferExtendedCell, since those
> can be backed by off-heap memory.
> KeyValue instances seem fine (the ones returned by
> [GenericKeyValueBuilder.java|https://github.com/apache/phoenix/blob/master/phoenix-core-client/src/main/java/org/apache/phoenix/hbase/index/util/GenericKeyValueBuilder.java]).

--
This message was sent by Atlassian Jira
(v8.20.10#820010)