[ 
https://issues.apache.org/jira/browse/PHOENIX-7611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanuj Khurana updated PHOENIX-7611:
-----------------------------------
    Summary: Memory corruption issue in Phoenix coprocessors in HBase 2  (was: 
Memory corruption issue in Phoenix coprocessors af HBase 2)

> Memory corruption issue in Phoenix coprocessors in HBase 2
> ----------------------------------------------------------
>
>                 Key: PHOENIX-7611
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7611
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 5.0.0, 5.1.0, 5.1.1, 5.2.0, 5.1.2, 5.1.3, 5.2.1
>            Reporter: Tanuj Khurana
>            Priority: Major
>
> The memory corruption has surfaced as segmentation faults that crash the 
> RegionServer. We have observed this in production in our environment as well 
> as in ITs. We already have PHOENIX-7419 open for it. I was also hitting this 
> issue while working on PHOENIX-7591. There, the test would sometimes fail 
> with a FATAL error message of SIGSEGV, but more often it would fail with 
> silent corruption. After adding more logging, I found that some of the Cell 
> references we were storing in IndexRegionObserver were getting corrupted.
> I started looking around in HBase for similar corruptions and found that, 
> from HBase 2 onwards, the coprocessor contract for the preBatchMutate hook 
> says:
> *Do not retain references to any Cells in Mutations* beyond the life of this 
> invocation. If need a Cell reference for later use, copy the cell and use 
> that.
> IndexRegionObserver maintains the row state in memory as a Put mutation that 
> references the Cells in the Mutation in order to handle concurrent updates, 
> and the lifetime of these references exceeds the invocation of the hook. It 
> seems that in some cases these cells can be backed by off-heap memory, which 
> can be reclaimed or reused, causing corruption.
> This also lines up with the stack trace attached to PHOENIX-7419 
> ([^hs_err_pid783375.log]):
> {code:java}
> v  ~StubRoutines::jbyte_disjoint_arraycopy
> J 23481 C2 
> org.apache.hadoop.hbase.unsafe.HBasePlatformDependent.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V
>  (22 bytes) @ 0x00007fb765360c32 [0x00007fb765360be0+0x52]
> j  
> org.apache.hadoop.hbase.util.UnsafeAccess.unsafeCopy(Ljava/lang/Object;JLjava/lang/Object;JJ)V+56
> j  
> org.apache.hadoop.hbase.util.UnsafeAccess.copy(Ljava/nio/ByteBuffer;I[BII)V+105
> j  
> org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray([BLjava/nio/ByteBuffer;III)V+65
> j  
> org.apache.hadoop.hbase.CellUtil.copyQualifierTo(Lorg/apache/hadoop/hbase/Cell;[BI)I+56
> J 24630 C2 
> org.apache.phoenix.coprocessor.GlobalIndexRegionScanner.apply(Lorg/apache/hadoop/hbase/client/Put;Lorg/apache/hadoop/hbase/client/Put;)V
>  (167 bytes) @ 0x00007fb7656262e0 [0x00007fb765625ca0+0x640]
> J 24258 C1 
> org.apache.phoenix.hbase.index.IndexRegionObserver.applyPendingPutMutations(Lorg/apache/hadoop/hbase/regionserver/MiniBatchOperationInProgress;Lorg/apache/phoenix/hbase/index/IndexRegionObserver$BatchMutateContext;J)V
>  (430 bytes) @ 0x00007fb7654ac234 [0x00007fb7654aa880+0x19b4]
> j  
> org.apache.phoenix.hbase.index.IndexRegionObserver.prepareDataRowStates(Lorg/apache/hadoop/hbase/coprocessor/ObserverContext;Lorg/apache/hadoop/hbase/regionserver/MiniBatchOperationInProgress;Lorg/apache/phoenix/hbase/index/IndexRegionObserver$BatchMutateContext;J)V+30
> J 25543 C1 
> org.apache.phoenix.hbase.index.IndexRegionObserver.preBatchMutateWithExceptions(Lorg/apache/hadoop/hbase/coprocessor/ObserverContext;Lorg/apache/hadoop/hbase/regionserver/MiniBatchOperationInProgress;)V
>  (1004 bytes) @ 0x00007fb764ef1ffc [0x00007fb764eef7c0+0x283c]
> J 25542 C1 
> org.apache.phoenix.hbase.index.IndexRegionObserver.preBatchMutate(Lorg/apache/hadoop/hbase/coprocessor/ObserverContext;Lorg/apache/hadoop/hbase/regionserver/MiniBatchOperationInProgress;)V
>  (76 bytes) @ 0x00007fb762d1dc24 [0x00007fb762d1db00+0x124]
> J 22752 C1 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$28.call(Ljava/lang/Object;)V
>  (17 bytes) @ 0x00007fb7629b21d4 [0x00007fb7629b1f00+0x2d4]
> J 14450 C2 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver()V
>  (70 bytes) @ 0x00007fb762483240 [0x00007fb7624830c0+0x180]
> J 18110 C2 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(Lorg/apache/hadoop/hbase/coprocessor/CoprocessorHost$ObserverOperation;)Z
>  (274 bytes) @ 0x00007fb76463c74c [0x00007fb76463c320+0x42c]
> J 23033 C1 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preBatchMutate(Lorg/apache/hadoop/hbase/regionserver/MiniBatchOperationInProgress;)V
>  (42 bytes) @ 0x00007fb762b39dcc [0x00007fb762b39640+0x78c]
> J 14181 C1 
> org.apache.hadoop.hbase.regionserver.HRegion$MutationBatchOperation.prepareMiniBatchOperations(Lorg/apache/hadoop/hbase/regionserver/MiniBatchOperationInProgress;JLjava/util/List;)V
>  (105 bytes) @ 0x00007fb763a21b3c [0x00007fb763a21380+0x7bc]
> J 14199 C1 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(Lorg/apache/hadoop/hbase/regionserver/HRegion$BatchOperation;)V
>  (970 bytes) @ 0x00007fb763a37a94 [0x00007fb763a36f20+0xb74]
> J 13124 C1 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(Lorg/apache/hadoop/hbase/regionserver/HRegion$BatchOperation;)[Lorg/apache/hadoop/hbase/regionserver/OperationStatus;
>  (354 bytes) @ 0x00007fb7636a5a64 [0x00007fb7636a5320+0x744] {code}
> This contract actually applies to all the methods in the RegionObserver 
> interface; it was updated by HBASE-15735, which was introduced in HBase 2. 
> Phoenix has several coprocessors that implement the RegionObserver 
> interface. We need to investigate all such implementations and fix them if 
> they hold on to Cell references after the invocation of the hook API.
> Two patterns I have seen are:
> 1. We store the reference to the Cell directly, or in a collection like 
> List<Cell>.
> 2. We store it indirectly, for example inside a Mutation object.
> It seems this is only a problem if we store references to Cells that extend 
> ByteBufferKeyValue, which in turn extends ByteBufferExtendedCell, since these 
> can be backed by off-heap memory.
> KeyValue instances seem fine (the ones returned by 
> [GenericKeyValueBuilder.java|https://github.com/apache/phoenix/blob/master/phoenix-core-client/src/main/java/org/apache/phoenix/hbase/index/util/GenericKeyValueBuilder.java])
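
The hazard described above can be illustrated with a minimal, self-contained sketch. It deliberately uses only java.nio and does not call any HBase or Phoenix API: BufferBackedValue is a hypothetical stand-in for a ByteBuffer-backed cell, and the "pooled" direct buffer is an assumed stand-in for off-heap memory that the server may reuse once the hook returns. The point it tries to show is that a retained view silently changes when the buffer is reused, while a heap copy taken during the hook does not. In real coprocessor code the copy would presumably go through an HBase helper such as KeyValueUtil.copyToNewKeyValue, but whether that is the right fix for Phoenix is an assumption, not something this issue confirms.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class CellRetentionSketch {

    /** Hypothetical stand-in for a buffer-backed cell: a view, not a copy. */
    static final class BufferBackedValue {
        private final ByteBuffer buf;
        private final int offset;
        private final int length;

        BufferBackedValue(ByteBuffer buf, int offset, int length) {
            this.buf = buf;
            this.offset = offset;
            this.length = length;
        }

        /** Reads whatever bytes are in the shared buffer right now. */
        byte[] read() {
            byte[] out = new byte[length];
            ByteBuffer dup = buf.duplicate(); // independent position/limit
            dup.clear();
            dup.position(offset);
            dup.get(out, 0, length);
            return out;
        }

        /** Copies the bytes onto the heap; safe to keep after buffer reuse. */
        byte[] copyToHeap() {
            return read();
        }
    }

    /** Overwrites the start of the buffer, simulating reuse of pooled memory. */
    static void write(ByteBuffer buf, String s) {
        ByteBuffer dup = buf.duplicate();
        dup.clear();
        dup.put(s.getBytes(StandardCharsets.US_ASCII));
    }

    /** Returns { value seen through the retained view, value of the heap copy }. */
    static String[] demo() {
        ByteBuffer pooled = ByteBuffer.allocateDirect(16); // assumed off-heap pool
        write(pooled, "value-AAAAA");

        // Inside the hook: one caller keeps the view, another copies it.
        BufferBackedValue retainedView = new BufferBackedValue(pooled, 0, 11);
        byte[] heapCopy = retainedView.copyToHeap();

        // After the hook returns: the server reuses the same memory.
        write(pooled, "value-BBBBB");

        String viaView = new String(retainedView.read(), StandardCharsets.US_ASCII);
        String viaCopy = new String(heapCopy, StandardCharsets.US_ASCII);
        return new String[] { viaView, viaCopy };
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.println("retained view: " + r[0]);
        System.out.println("heap copy:     " + r[1]);
    }
}
```

The same reasoning covers both patterns listed above: a List of views and a Mutation that wraps views are equally exposed, because neither owns the bytes it points at. Here the reuse only changes the value; with memory that has actually been freed, a read through the stale view could instead fault, which would be consistent with the SIGSEGV in the trace.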



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
