[ 
https://issues.apache.org/jira/browse/HBASE-26036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17374617#comment-17374617
 ] 

Anoop Sam John commented on HBASE-26036:
----------------------------------------

In case of get calls from Client layer, the RSRpcServices make use of scan APIs 
in Region and serialize that result cells (to CellBlock or Result PB) and then 
only close the scanner.  So the result cells are copied here.  But in case of 
get() API in HRegion, this is not happening and that is the issue.  I am of the 
opinion that we should copy the result and then close the scanner.  This API is 
exposed to CPs.  Any CP can face this kind of issue.  +1 to Duo's idea of 
fixing this.

For checkAndXXX methods, we can start using the getScanner way and use the data 
and close scanner then only. (As in patch) That can avoid heap copy and heap 
pressure. These are our internal impl method and we can change it. (Pls keep 
some fat notes why we dont use HRegion#get() API when such a straight forward 
one is available with us)

> DBB released too early and dirty data for some operations
> ---------------------------------------------------------
>
>                 Key: HBASE-26036
>                 URL: https://issues.apache.org/jira/browse/HBASE-26036
>             Project: HBase
>          Issue Type: Bug
>          Components: rpc
>    Affects Versions: 3.0.0-alpha-1, 2.0.0
>            Reporter: Xiaolin Ha
>            Assignee: Xiaolin Ha
>            Priority: Critical
>
> Before HBASE-25187, we found there are regionserver JVM crashing problems on 
> our production clusters, the coredump infos are as follows,
> {code:java}
> Stack: [0x00007f621ba8d000,0x00007f621bb8e000],  sp=0x00007f621bb8c0e0,  free 
> space=1020k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
> code)
> J 10829 C2 org.apache.hadoop.hbase.ByteBufferKeyValue.getTimestamp()J (9 
> bytes) @ 0x00007f6a5ee11b2d [0x00007f6a5ee11ae0+0x4d]
> J 22844 C2 
> org.apache.hadoop.hbase.regionserver.HRegion.doCheckAndRowMutate([B[B[BLorg/apache/hadoop/hbase/filter/CompareFilter$CompareOp;Lorg/apache/hadoop/hbase/filter/ByteArrayComparable;Lorg/apache/hadoop/hbase/client/RowMutations;Lorg/apache/hadoop/hbase/client/Mutation;Z)Z
>  (540 bytes) @ 0x00007f6a60bed144 [0x00007f6a60beb320+0x1e24]
> J 17972 C2 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.checkAndRowMutate(Lorg/apache/hadoop/hbase/regionserver/Region;Ljava/util/List;Lorg/apache/hadoop/hbase/CellScanner;[B[B[BLorg/apache/hadoop/hbase/filter/CompareFilter$CompareOp;Lorg/apache/hadoop/hbase/filter/ByteArrayComparable;Lorg/apache/hadoop/hbase/shaded/protobuf/generated/ClientProtos$RegionActionResult$Builder;)Z
>  (312 bytes) @ 0x00007f6a5f4a7ed0 [0x00007f6a5f4a6f40+0xf90]
> J 26197 C2 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(Lorg/apache/hbase/thirdparty/com/google/protobuf/RpcController;Lorg/apache/hadoop/hbase/shaded/protobuf/generated/ClientProtos$MultiRequest;)Lorg/apache/hadoop/hbase/shaded/protobuf/generated/ClientProtos$MultiResponse;
>  (644 bytes) @ 0x00007f6a61538b0c [0x00007f6a61537940+0x11cc]
> J 26332 C2 
> org.apache.hadoop.hbase.ipc.RpcServer.call(Lorg/apache/hadoop/hbase/ipc/RpcCall;Lorg/apache/hadoop/hbase/monitoring/MonitoredRPCHandler;)Lorg/apache/hadoop/hbase/util/Pair;
>  (566 bytes) @ 0x00007f6a615e8228 [0x00007f6a615e79c0+0x868]
> J 20563 C2 org.apache.hadoop.hbase.ipc.CallRunner.run()V (1196 bytes) @ 
> 0x00007f6a60711a4c [0x00007f6a60711000+0xa4c]
> J 19656% C2 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(Ljava/util/concurrent/BlockingQueue;Ljava/util/concurrent/atomic/AtomicInteger;)V
>  (338 bytes) @ 0x00007f6a6039a414 [0x00007f6a6039a320+0xf4]
> j  org.apache.hadoop.hbase.ipc.RpcExecutor$1.run()V+24
> j  java.lang.Thread.run()V+11
> v  ~StubRoutines::call_stub
> {code}
> I have made a UT to reproduce this error, it can occur 100%。
> After HBASE-25187,the check result of the checkAndMutate will be false, 
> because it read wrong/dirty data from the released ByteBuff.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to