[
https://issues.apache.org/jira/browse/HBASE-28584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866201#comment-17866201
]
Andrew Kyle Purtell edited comment on HBASE-28584 at 7/16/24 3:02 AM:
----------------------------------------------------------------------
Still don't have it, but I can repro with a one-RS cluster replicating to
another one-RS cluster. Use PE to create test tables on both sides, set the
replication scope on side A to 1, then use PE to generate 100B rows, or
anything else that keeps it loaded for a long time. A command sketch follows.
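Roughly (a sketch from memory; the PE flags, the default table name, and the
default column family name should all be verified against the version in use):
{code}
# On both clusters, create the table; size the row count so the write load
# on cluster A runs for hours:
hbase pe --nomapred --table=TestTable --rows=100000000 sequentialWrite 10

# On cluster A, enable replication for the PE column family (assumed to be
# info0 here; confirm with 'describe' in the hbase shell):
echo "alter 'TestTable', {NAME => 'info0', REPLICATION_SCOPE => 1}" | hbase shell
{code}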
There is a specific sequence of events that has to happen. Everything is fine
until the first time the sink regionserver exceeds the memstore limit. Within a
very short time after we begin throwing RegionTooBusy exceptions on the sink
side, the sink regionserver will crash. After a restart, the sink regionserver
will crash again. The only way to stop the crashing is to stop the source-side
load and restart the sink regionserver enough times to clear enough backlog.
This is some kind of multi-factor problem involving load, particularly large
batched RPCs on the source side, and netty buffer allocation and management. I
cannot say for certain, but I feel pretty sure we are releasing a buffer too
early, and it is being reused while someone else is still trying to do
something with it. In the replication code we act as a server and a client at
the same time: we receive the replication RPC as a server and then send the
edits out to the local side as a client. Both the netty client and server
implementations use the pooled allocators (by default), so there is an
opportunity to release() something too early, which is normally not an issue
except under high load. We crash with the underlying buffer being both on-heap
and direct, so it isn't specific to a particular buffer implementation; it may
be common to buffer management, and is likely an hbase refcounting/release
problem.
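To make the suspected bug class concrete, here is a minimal sketch of a
premature release() against the pooled allocator. This is not the actual HBase
code path: it uses plain io.netty rather than the org.apache.hbase.thirdparty
relocation, and the class is mine for illustration.
{code:java}
import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.util.CharsetUtil;

public class UseAfterReleaseSketch {
  public static void main(String[] args) {
    PooledByteBufAllocator alloc = PooledByteBufAllocator.DEFAULT;

    // "Server" side: a buffer holding a replication edit.
    ByteBuf edit = alloc.directBuffer(64);
    edit.writeBytes("replication edit payload".getBytes(CharsetUtil.UTF_8));

    // Premature release: the region goes back to the pool arena even though
    // the "client" side still holds a reference and has not written it out.
    edit.release();

    // A subsequent allocation can be handed the same region of the arena...
    ByteBuf reused = alloc.directBuffer(64);
    reused.writeZero(64); // ...and overwrite what 'edit' still points at.

    // A late reader then sees reused or freed memory. With netty's default
    // reference-count checks this line throws IllegalReferenceCountException;
    // with -Dio.netty.buffer.checkAccessible=false it touches freed direct
    // memory, which is the kind of access that can SIGSEGV, e.g. inside
    // ByteBufferUtils.copyBufferToStream.
    System.out.println(edit.getByte(0));

    reused.release();
  }
}
{code}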
> RS SIGSEGV under heavy replication load
> ---------------------------------------
>
> Key: HBASE-28584
> URL: https://issues.apache.org/jira/browse/HBASE-28584
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 2.5.6
> Environment: RHEL 7.9
> JDK 11.0.23
> Hadoop 3.2.4
> HBase 2.5.6
> Reporter: Whitney Jackson
> Assignee: Andrew Kyle Purtell
> Priority: Major
>
> I'm observing RS crashes under heavy replication load:
>
> {code:java}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x00007f7546873b69, pid=29890, tid=36828
> #
> # JRE version: Java(TM) SE Runtime Environment 18.9 (11.0.23+7) (build 11.0.23+7-LTS-222)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM 18.9 (11.0.23+7-LTS-222, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
> # Problematic frame:
> # J 24625 c2 org.apache.hadoop.hbase.util.ByteBufferUtils.copyBufferToStream(Ljava/io/OutputStream;Ljava/nio/ByteBuffer;II)V (75 bytes) @ 0x00007f7546873b69 [0x00007f7546873960+0x0000000000000209]
> {code}
>
> The heavier load comes when a replication peer has been disabled for several
> hours, e.g. for patching. When the peer is re-enabled, the replication load
> is high until the peer is caught up. The crashes happen on the cluster
> receiving the replication edits.
>
> I believe this problem started after upgrading from 2.4.x to 2.5.x.
>
> One possibly relevant non-standard config I run with:
> {code:java}
> <property>
>   <name>hbase.region.store.parallel.put.limit</name>
>   <!-- Default: 10 -->
>   <value>100</value>
>   <description>Added after seeing "failed to accept edits" replication errors
>   in the destination region servers indicating this limit was being exceeded
>   while trying to process replication edits.</description>
> </property>
> {code}
>
> I understand from other Jiras that the problem is likely around direct memory
> usage by Netty. I haven't yet tried switching the Netty allocator to
> {{unpooled}} or {{heap}}. I also haven't yet tried any of the
> {{io.netty.allocator.*}} options.
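> 
> If memory serves, the server-side switch is the
> {{hbase.netty.rpcserver.allocator}} property honored by NettyRpcServer
> (values {{pooled}}, {{unpooled}}, {{heap}}, or an allocator class name);
> treat the exact key as something to verify. A sketch:
> {code:java}
> <property>
>   <name>hbase.netty.rpcserver.allocator</name>
>   <!-- Default: pooled. "heap" avoids direct memory (and its reuse) entirely,
>        at some throughput cost, which makes it a useful bisection step. -->
>   <value>heap</value>
> </property>
> {code}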
>
> {{MaxDirectMemorySize}} is set to 26g.
>
> Here's the full stack for the relevant thread:
>
> {code:java}
> Stack: [0x00007f72e2e5f000,0x00007f72e2f60000], sp=0x00007f72e2f5e450, free space=1021k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> J 24625 c2 org.apache.hadoop.hbase.util.ByteBufferUtils.copyBufferToStream(Ljava/io/OutputStream;Ljava/nio/ByteBuffer;II)V (75 bytes) @ 0x00007f7546873b69 [0x00007f7546873960+0x0000000000000209]
> J 26253 c2 org.apache.hadoop.hbase.ByteBufferKeyValue.write(Ljava/io/OutputStream;Z)I (21 bytes) @ 0x00007f7545af2d84 [0x00007f7545af2d20+0x0000000000000064]
> J 22971 c2 org.apache.hadoop.hbase.codec.KeyValueCodecWithTags$KeyValueEncoder.write(Lorg/apache/hadoop/hbase/Cell;)V (27 bytes) @ 0x00007f754663f700 [0x00007f754663f4c0+0x0000000000000240]
> J 25251 c2 org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.write(Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelHandlerContext;Ljava/lang/Object;Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V (90 bytes) @ 0x00007f7546a53038 [0x00007f7546a50e60+0x00000000000021d8]
> J 21182 c2 org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(Ljava/lang/Object;Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V (73 bytes) @ 0x00007f7545f4d90c [0x00007f7545f4d3a0+0x000000000000056c]
> J 21181 c2 org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.write(Ljava/lang/Object;ZLorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V (149 bytes) @ 0x00007f7545fd680c [0x00007f7545fd65e0+0x000000000000022c]
> J 25389 c2 org.apache.hadoop.hbase.ipc.NettyRpcConnection$$Lambda$247.run()V (16 bytes) @ 0x00007f7546ade660 [0x00007f7546ade140+0x0000000000000520]
> J 24098 c2 org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(J)Z (109 bytes) @ 0x00007f754678fbb8 [0x00007f754678f8e0+0x00000000000002d8]
> J 27297% c2 org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run()V (603 bytes) @ 0x00007f75466c4d48 [0x00007f75466c4c80+0x00000000000000c8]
> j  org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run()V+44
> j  org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run()V+11
> j  org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run()V+4
> J 12278 c1 java.lang.Thread.run()V java.base@11.0.23 (17 bytes) @ 0x00007f753e11f084 [0x00007f753e11ef40+0x0000000000000144]
> v  ~StubRoutines::call_stub
> V  [libjvm.so+0x85574a]  JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x27a
> V  [libjvm.so+0x853d2e]  JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, Thread*)+0x19e
> V  [libjvm.so+0x8ffddf]  thread_entry(JavaThread*, Thread*)+0x9f
> V  [libjvm.so+0xdb68d1]  JavaThread::thread_main_inner()+0x131
> V  [libjvm.so+0xdb2c4c]  Thread::call_run()+0x13c
> V  [libjvm.so+0xc1f2e6]  thread_native_entry(Thread*)+0xe6
> {code}
>