[ https://issues.apache.org/jira/browse/HBASE-26062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385232#comment-17385232 ]
Anoop Sam John commented on HBASE-26062:
----------------------------------------

[~stack] no ASYNC_WAL durability? If so we have other issues!!!

> SIGSEGV in AsyncFSWAL consume
> -----------------------------
>
> Key: HBASE-26062
> URL: https://issues.apache.org/jira/browse/HBASE-26062
> Project: HBase
> Issue Type: Bug
> Reporter: Michael Stack
> Priority: Major
>
> Seems related to the parent issue. It's happened a few times on one of our
> clusters here. Below are two examples. Need more detail, but perhaps the call
> has timed out, the buffer has thus been freed, but the late consume on the
> other side of the ringbuffer doesn't know that and goes ahead (just
> speculation).
>
> {code:java}
> # SIGSEGV (0xb) at pc=0x00007f8b3ef5b77c, pid=37631, tid=0x00007f61560ed700
> RAX=0x00000000ffffdf6e is an unknown value
> RBX=0x00007f8a38d7b6f8 is an oop
> java.nio.DirectByteBuffer
>  - klass: 'java/nio/DirectByteBuffer'
> RCX=0x00007f60e2767898 is pointing into metadata
> RDX=0x0000000000000de7 is an unknown value
> RSP=0x00007f61560ec6f0 is pointing into the stack for thread: 0x00007f8b3017b800
> RBP=[error occurred during error reporting (printing register info), id 0xb]
> Stack: [0x00007f6155fed000,0x00007f61560ee000], sp=0x00007f61560ec6f0, free space=1021k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> J 23901 C2 java.util.stream.MatchOps$1MatchSink.accept(Ljava/lang/Object;)V (44 bytes) @ 0x00007f8b3ef5b77c [0x00007f8b3ef5b640+0x13c]
> J 16165 C2 java.util.ArrayList$ArrayListSpliterator.tryAdvance(Ljava/util/function/Consumer;)Z (79 bytes) @ 0x00007f8b3d67b344 [0x00007f8b3d67b2c0+0x84]
> J 16160 C2 java.util.stream.MatchOps$MatchOp.evaluateSequential(Ljava/util/stream/PipelineHelper;Ljava/util/Spliterator;)Ljava/lang/Object; (7 bytes) @ 0x00007f8b3d67bc9c [0x00007f8b3d67b900+0x39c]
> J 17729 C2 org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALActionListener.visitLogEntryBeforeWrite(Lorg/apache/hadoop/hbase/wal/WALKey;Lorg/apache/hadoop/hbase/wal/WALEdit;)V (10 bytes) @ 0x00007f8b3fc39010 [0x00007f8b3fc388a0+0x770]
> J 29991 C2 org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.appendAndSync()V (261 bytes) @ 0x00007f8b3fd03d90 [0x00007f8b3fd039e0+0x3b0]
> J 20773 C2 org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.consume()V (474 bytes) @ 0x00007f8b40283728 [0x00007f8b40283480+0x2a8]
> J 15191 C2 org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL$$Lambda$76.run()V (8 bytes) @ 0x00007f8b3ed69ecc [0x00007f8b3ed69ea0+0x2c]
> J 17383% C2 java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (225 bytes) @ 0x00007f8b3d9423f8 [0x00007f8b3d942260+0x198]
> j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5
> j java.lang.Thread.run()V+11
> v ~StubRoutines::call_stub
> V [libjvm.so+0x66b9ba] JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0xe1a
> V [libjvm.so+0x669073] JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x263
> V [libjvm.so+0x669647] JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Thread*)+0x57
> V [libjvm.so+0x6aaa4c] thread_entry(JavaThread*, Thread*)+0x6c
> V [libjvm.so+0xa224cb] JavaThread::thread_main_inner()+0xdb
> V [libjvm.so+0xa22816] JavaThread::run()+0x316
> V [libjvm.so+0x8c4202] java_start(Thread*)+0x102
> C [libpthread.so.0+0x76ba] start_thread+0xca
> {code}
>
> This one is from a month previous and has a deeper stack... we're trying to
> read a Cell...
>
> {code:java}
> Stack: [0x00007fa1d5fb8000,0x00007fa1d60b9000], sp=0x00007fa1d60b7660, free space=1021k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> J 30665 C2 org.apache.hadoop.hbase.PrivateCellUtil.matchingFamily(Lorg/apache/hadoop/hbase/Cell;[BII)Z (59 bytes) @ 0x00007fcc2d29eeb2 [0x00007fcc2d29e7c0+0x6f2]
> J 25816 C2 org.apache.hadoop.hbase.CellUtil.matchingFamily(Lorg/apache/hadoop/hbase/Cell;[B)Z (28 bytes) @ 0x00007fcc2a0430f8 [0x00007fcc2a0430e0+0x18]
> J 17236 C2 org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALActionListener$$Lambda$254.test(Ljava/lang/Object;)Z (8 bytes) @ 0x00007fcc2b40bc68 [0x00007fcc2b40bc20+0x48]
> J 13735 C2 java.util.ArrayList$ArrayListSpliterator.tryAdvance(Ljava/util/function/Consumer;)Z (79 bytes) @ 0x00007fcc2b7d936c [0x00007fcc2b7d92c0+0xac]
> J 17162 C2 java.util.stream.MatchOps$MatchOp.evaluateSequential(Ljava/util/stream/PipelineHelper;Ljava/util/Spliterator;)Ljava/lang/Object; (7 bytes) @ 0x00007fcc29bc05e8 [0x00007fcc29bbfe80+0x768]
> J 16934 C2 org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALActionListener.visitLogEntryBeforeWrite(Lorg/apache/hadoop/hbase/wal/WALKey;Lorg/apache/hadoop/hbase/wal/WALEdit;)V (10 bytes) @ 0x00007fcc2bb313f8 [0x00007fcc2bb30c60+0x798]
> J 30732 C2 org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.appendAndSync()V (261 bytes) @ 0x00007fcc2ae5a420 [0x00007fcc2ae59d60+0x6c0]
> J 22203 C2 org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.consume()V (474 bytes) @ 0x00007fcc2a987420 [0x00007fcc2a987200+0x220]
> J 16857 C2 org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL$$Lambda$126.run()V (8 bytes) @ 0x00007fcc2b0bf28c [0x00007fcc2b0bf260+0x2c]
> J 13721% C2 java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (225 bytes) @ 0x00007fcc2b7d77c0 [0x00007fcc2b7d7240+0x580]
> j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5
> j java.lang.Thread.run()V+11
> v ~StubRoutines::call_stub
> {code}
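
For illustration only, here is a minimal sketch of the kind of guard that would rule out the race speculated in the description, where the handler frees the buffer on timeout while the consumer on the other side of the ring buffer still reads it. `GuardedBuffer` and all of its methods are hypothetical names, not HBase classes, and a heap `ByteBuffer` stands in for pooled off-heap memory. Whether this race is the actual cause of the SIGSEGV is unconfirmed.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical guard (not an HBase class): the buffer is freed only when the last holder releases it. */
final class GuardedBuffer {
    // The creator (for example, the RPC handler) starts out holding one reference.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile ByteBuffer buffer;

    GuardedBuffer(int capacity) {
        // Assumption: a heap buffer stands in for pooled off-heap memory.
        this.buffer = ByteBuffer.allocate(capacity);
    }

    /** Takes a reference; returns false if the buffer has already been freed. */
    boolean retain() {
        for (;;) {
            int current = refCount.get();
            if (current <= 0) {
                return false;
            }
            if (refCount.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    /** Drops one reference; the buffer is freed when the count reaches zero. */
    void release() {
        for (;;) {
            int current = refCount.get();
            if (current <= 0) {
                throw new IllegalStateException("buffer already freed");
            }
            if (refCount.compareAndSet(current, current - 1)) {
                if (current == 1) {
                    buffer = null; // stands in for returning memory to the pool
                }
                return;
            }
        }
    }

    boolean isFreed() {
        return refCount.get() == 0;
    }

    /** Reads one byte; callers are expected to hold a reference obtained via retain(). */
    byte read(int index) {
        ByteBuffer b = buffer;
        if (b == null) {
            throw new IllegalStateException("buffer already freed");
        }
        return b.get(index);
    }
}
```

In this sketch the consumer would call `retain()` before touching the data and `release()` afterwards. A handler that times out drops only its own reference, so the memory stays valid until the late consumer is done, and a consumer that arrives after the free gets `false` from `retain()` instead of reading released memory.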