[
https://issues.apache.org/jira/browse/HBASE-26092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388720#comment-17388720
]
Anoop Sam John commented on HBASE-26092:
----------------------------------------
The replicateWALEntry request arrives at an RS and is read into a pooled ByteBuff.
It is handled by a ReplicationHandler thread. In ReplicationSink, this creates
mutation batches and calls table.batch() (one or more calls). This is a
synchronous call: it waits until the request is fully processed, meaning this RS
writes to the target region and that write completes. Only after that does the
ReplicationHandler thread's work finish, finally calling cleanup to release the
ByteBuff.
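The lifecycle described above (the pooled buffer is released only after the synchronous table.batch() call returns) can be modeled with a simplified sketch. The classes below (PooledBuffer, BufferLifecycleModel, replicateEntries) are hypothetical stand-ins for illustration, not actual HBase code:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the replication handler's buffer lifecycle:
// the pooled buffer backing the replicateWALEntry request must stay
// valid until the synchronous table.batch() call completes.
public class BufferLifecycleModel {
    static final List<String> events = new ArrayList<>();

    // Stand-in for a ByteBuff leased from the pool.
    static class PooledBuffer {
        void release() { events.add("buffer released"); }
    }

    // Stand-in for ReplicationSink: builds mutation batches from the
    // buffer-backed WAL entries and blocks until they are written.
    static void replicateEntries(PooledBuffer buf) {
        events.add("table.batch() start");
        // ... synchronous write to the target region happens here ...
        events.add("table.batch() done");
    }

    public static void main(String[] args) {
        PooledBuffer buf = new PooledBuffer();
        replicateEntries(buf);   // blocks until the write completes
        buf.release();           // cleanup runs only after batch() returns
        System.out.println(String.join(" -> ", events));
    }
}
```

The key point of the model is the ordering: because table.batch() is synchronous, the release cannot overtake the write.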
NettyRpcClient is in use here. I checked the code in detail, hacked it so that
the netty EventLoop thread sleeps for some time before actually sending the
request (NettyRpcConnection#sendRequest0), and added more logs. From these
experiments it is clear to me that the release of the ByteBuff happens only
after table.batch() has completed.
So I don't think we have a ByteBuff leak as in the other case.
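The experiment above can be sketched with a standalone model: a delay is injected in a single-threaded "event loop" before the request is sent (mimicking the sleep hack near NettyRpcConnection#sendRequest0), and the caller still releases the buffer only after the blocking call completes. This is a simplified illustration, not the actual netty or HBase code:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Models the delay experiment: even if the event-loop thread sends the
// request late, a caller that blocks on the call's completion cannot
// release the buffer before the send has happened.
public class DelayedSendModel {
    public static void main(String[] args) throws Exception {
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        StringBuilder log = new StringBuilder();

        // The "event loop" sleeps before actually sending the request.
        Future<?> call = eventLoop.submit(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            log.append("request sent; ");
        });

        call.get();                      // synchronous call: wait for completion
        log.append("buffer released");   // cleanup only after the call returns

        eventLoop.shutdown();
        System.out.println(log);
    }
}
```

Future.get() establishes a happens-before edge with the event-loop task, so the logged ordering is deterministic: the release is always observed after the send.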
> JVM core dump in the replication path
> -------------------------------------
>
> Key: HBASE-26092
> URL: https://issues.apache.org/jira/browse/HBASE-26092
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Affects Versions: 2.3.5
> Reporter: Huaxiang Sun
> Priority: Critical
>
> When replication is turned on, we found the following core dump in the region
> server.
> I checked the core dump for replication and I think I have some ideas. For
> replication, when an RS receives walEdits from a remote cluster, it needs to
> send them out to the final RS. In this case, NettyRpcConnection is used: calls
> are queued while they still refer to a ByteBuffer owned by the
> replicationHandler context (returned to the pool once the handler returns). A
> core dump can happen because the byteBuffer has been reused. This asynchronous
> processing needs ref counting.
>
> Feel free to take it, otherwise, I will try to work on a patch later.
>
>
> {code:java}
> Stack: [0x00007fb1bf039000,0x00007fb1bf13a000], sp=0x00007fb1bf138560, free space=1021k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> J 28175 C2 org.apache.hadoop.hbase.ByteBufferKeyValue.write(Ljava/io/OutputStream;Z)I (21 bytes) @ 0x00007fdbbbb2663c [0x00007fdbbbb263c0+0x27c]
> J 14912 C2 org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.writeRequest(Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelHandlerContext;Lorg/apache/hadoop/hbase/ipc/Call;Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V (370 bytes) @ 0x00007fdbbb94b590 [0x00007fdbbb949c00+0x1990]
> J 14911 C2 org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.write(Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelHandlerContext;Ljava/lang/Object;Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V (30 bytes) @ 0x00007fdbb972d1d4 [0x00007fdbb972d1a0+0x34]
> J 30476 C2 org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.write(Ljava/lang/Object;ZLorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V (149 bytes) @ 0x00007fdbbd4e7084 [0x00007fdbbd4e6900+0x784]
> J 14914 C2 org.apache.hadoop.hbase.ipc.NettyRpcConnection$6$1.run()V (22 bytes) @ 0x00007fdbbb9344ec [0x00007fdbbb934280+0x26c]
> J 23528 C2 org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(J)Z (106 bytes) @ 0x00007fdbbcbb0efc [0x00007fdbbcbb0c40+0x2bc]
> J 15987% C2 org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run()V (461 bytes) @ 0x00007fdbbbaf1580 [0x00007fdbbbaf1360+0x220]
> j org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run()V+44
> j org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run()V+11
> j org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run()V+4
> {code}
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)