[ 
https://issues.apache.org/jira/browse/HBASE-26155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaolin Ha updated HBASE-26155:
-------------------------------
    Description: 
Closing a scanner causes RegionServer JVM core dumps on our production 
clusters.

{code:java}
Stack: [0x00007fca4b0cc000,0x00007fca4b1cd000],  sp=0x00007fca4b1cb0d8,  free space=1020k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x7fd314]
J 2810  sun.misc.Unsafe.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V (0 bytes) @ 0x00007fdae55a9e61 [0x00007fdae55a9d80+0xe1]
j  org.apache.hadoop.hbase.util.UnsafeAccess.unsafeCopy(Ljava/lang/Object;JLjava/lang/Object;JJ)V+36
j  org.apache.hadoop.hbase.util.UnsafeAccess.copy(Ljava/nio/ByteBuffer;I[BII)V+69
j  org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray([BLjava/nio/ByteBuffer;III)V+39
j  org.apache.hadoop.hbase.CellUtil.copyQualifierTo(Lorg/apache/hadoop/hbase/Cell;[BI)I+31
j  org.apache.hadoop.hbase.KeyValueUtil.appendKeyTo(Lorg/apache/hadoop/hbase/Cell;[BI)I+43
J 14724 C2 org.apache.hadoop.hbase.regionserver.StoreScanner.shipped()V (51 bytes) @ 0x00007fdae6a298d0 [0x00007fdae6a29780+0x150]
J 21387 C2 org.apache.hadoop.hbase.regionserver.RSRpcServices$RegionScannerShippedCallBack.run()V (53 bytes) @ 0x00007fdae622bab8 [0x00007fdae622acc0+0xdf8]
J 26353 C2 org.apache.hadoop.hbase.ipc.ServerCall.setResponse(Lorg/apache/hbase/thirdparty/com/google/protobuf/Message;Lorg/apache/hadoop/hbase/CellScanner;Ljava/lang/Throwable;Ljava/lang/String;)V (384 bytes) @ 0x00007fdae7f139d8 [0x00007fdae7f12980+0x1058]
J 26226 C2 org.apache.hadoop.hbase.ipc.CallRunner.run()V (1554 bytes) @ 0x00007fdae959f68c [0x00007fdae959e400+0x128c]
J 19598% C2 org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(Ljava/util/concurrent/BlockingQueue;Ljava/util/concurrent/atomic/AtomicInteger;)V (338 bytes) @ 0x00007fdae81c54d4 [0x00007fdae81c53e0+0xf4]
{code}

  was:
Closing a scanner causes RegionServer JVM core dumps on our production 
clusters.

{code:java}
Stack: [0x00007fca4b0cc000,0x00007fca4b1cd000],  sp=0x00007fca4b1cb0d8,  free space=1020k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x7fd314]
J 2810  sun.misc.Unsafe.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V (0 bytes) @ 0x00007fdae55a9e61 [0x00007fdae55a9d80+0xe1]
j  org.apache.hadoop.hbase.util.UnsafeAccess.unsafeCopy(Ljava/lang/Object;JLjava/lang/Object;JJ)V+36
j  org.apache.hadoop.hbase.util.UnsafeAccess.copy(Ljava/nio/ByteBuffer;I[BII)V+69
j  org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray([BLjava/nio/ByteBuffer;III)V+39
j  org.apache.hadoop.hbase.CellUtil.copyQualifierTo(Lorg/apache/hadoop/hbase/Cell;[BI)I+31
j  org.apache.hadoop.hbase.KeyValueUtil.appendKeyTo(Lorg/apache/hadoop/hbase/Cell;[BI)I+43
J 14724 C2 org.apache.hadoop.hbase.regionserver.StoreScanner.shipped()V (51 bytes) @ 0x00007fdae6a298d0 [0x00007fdae6a29780+0x150]
J 21387 C2 org.apache.hadoop.hbase.regionserver.RSRpcServices$RegionScannerShippedCallBack.run()V (53 bytes) @ 0x00007fdae622bab8 [0x00007fdae622acc0+0xdf8]
J 26353 C2 org.apache.hadoop.hbase.ipc.ServerCall.setResponse(Lorg/apache/hbase/thirdparty/com/google/protobuf/Message;Lorg/apache/hadoop/hbase/CellScanner;Ljava/lang/Throwable;Ljava/lang/String;)V (384 bytes) @ 0x00007fdae7f139d8 [0x00007fdae7f12980+0x1058]
J 26226 C2 org.apache.hadoop.hbase.ipc.CallRunner.run()V (1554 bytes) @ 0x00007fdae959f68c [0x00007fdae959e400+0x128c]
J 19598% C2 org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(Ljava/util/concurrent/BlockingQueue;Ljava/util/concurrent/atomic/AtomicInteger;)V (338 bytes) @ 0x00007fdae81c54d4 [0x00007fdae81c53e0+0xf4]
{code}

There is no guarantee that each RPC call holds a unique scanner, right? 
For example, when a client disconnects, the RS may not stop the scanner's 
next calls until it checks the `rpcCall.disconnectSince()` time. Before that 
happens, another scan RPC may use the same scanner, which is held in the RS 
cache by RegionScannerHolder. The two calls would then modify the scanner's 
`previousCell` from different threads...
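
The suspected overlap could be made visible with an in-use flag on the shared scanner. The sketch below is only an illustration of that idea under stated assumptions: `SharedScannerSketch`, `tryBegin`, `recordCell`, and `end` are hypothetical names and are not HBase classes or methods, and the `previousCell` field merely stands in for the scanner state mentioned above.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a scanner cached on the RS; not an HBase class.
final class SharedScannerSketch {
    // True while some RPC handler is inside the scanner.
    private final AtomicBoolean inUse = new AtomicBoolean(false);
    // Stands in for the mutable state that two threads might both touch.
    private byte[] previousCell;

    /** Returns false if another caller has not yet released the scanner. */
    boolean tryBegin() {
        return inUse.compareAndSet(false, true);
    }

    /** Mutates scanner state; intended to run between tryBegin() and end(). */
    void recordCell(byte[] cell) {
        previousCell = cell.clone();
    }

    /** Releases the scanner so that a later call may use it. */
    void end() {
        inUse.set(false);
    }

    int previousCellLength() {
        return previousCell == null ? 0 : previousCell.length;
    }

    public static void main(String[] args) {
        SharedScannerSketch scanner = new SharedScannerSketch();
        boolean first = scanner.tryBegin();   // first scan RPC takes the scanner
        boolean second = scanner.tryBegin();  // overlapping scan RPC is detected
        scanner.end();                        // first RPC finishes
        boolean third = scanner.tryBegin();   // scanner is usable again
        System.out.println(first + " " + second + " " + third);
    }
}
```

If the second `tryBegin()` ever returned false in a real handler, that would indicate two calls were using one scanner at the same time, which is the situation the question above asks about.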

> JVM crash when rpc calls close scanner
> --------------------------------------
>
>                 Key: HBASE-26155
>                 URL: https://issues.apache.org/jira/browse/HBASE-26155
>             Project: HBase
>          Issue Type: Bug
>          Components: Scanners
>    Affects Versions: 3.0.0-alpha-1
>            Reporter: Xiaolin Ha
>            Priority: Major
>
> Closing a scanner causes RegionServer JVM core dumps on our production 
> clusters.
> {code:java}
> Stack: [0x00007fca4b0cc000,0x00007fca4b1cd000],  sp=0x00007fca4b1cb0d8,  free space=1020k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> V  [libjvm.so+0x7fd314]
> J 2810  sun.misc.Unsafe.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V (0 bytes) @ 0x00007fdae55a9e61 [0x00007fdae55a9d80+0xe1]
> j  org.apache.hadoop.hbase.util.UnsafeAccess.unsafeCopy(Ljava/lang/Object;JLjava/lang/Object;JJ)V+36
> j  org.apache.hadoop.hbase.util.UnsafeAccess.copy(Ljava/nio/ByteBuffer;I[BII)V+69
> j  org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray([BLjava/nio/ByteBuffer;III)V+39
> j  org.apache.hadoop.hbase.CellUtil.copyQualifierTo(Lorg/apache/hadoop/hbase/Cell;[BI)I+31
> j  org.apache.hadoop.hbase.KeyValueUtil.appendKeyTo(Lorg/apache/hadoop/hbase/Cell;[BI)I+43
> J 14724 C2 org.apache.hadoop.hbase.regionserver.StoreScanner.shipped()V (51 bytes) @ 0x00007fdae6a298d0 [0x00007fdae6a29780+0x150]
> J 21387 C2 org.apache.hadoop.hbase.regionserver.RSRpcServices$RegionScannerShippedCallBack.run()V (53 bytes) @ 0x00007fdae622bab8 [0x00007fdae622acc0+0xdf8]
> J 26353 C2 org.apache.hadoop.hbase.ipc.ServerCall.setResponse(Lorg/apache/hbase/thirdparty/com/google/protobuf/Message;Lorg/apache/hadoop/hbase/CellScanner;Ljava/lang/Throwable;Ljava/lang/String;)V (384 bytes) @ 0x00007fdae7f139d8 [0x00007fdae7f12980+0x1058]
> J 26226 C2 org.apache.hadoop.hbase.ipc.CallRunner.run()V (1554 bytes) @ 0x00007fdae959f68c [0x00007fdae959e400+0x128c]
> J 19598% C2 org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(Ljava/util/concurrent/BlockingQueue;Ljava/util/concurrent/atomic/AtomicInteger;)V (338 bytes) @ 0x00007fdae81c54d4 [0x00007fdae81c53e0+0xf4]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
