[ https://issues.apache.org/jira/browse/HDDS-5281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Glen Geng updated HDDS-5281:
----------------------------
Description:
After installSnapshot, the bootstrapped SCM crashed shortly afterwards while a
write workload was ongoing.
According to the core dump file, the newly added SCM crashed in the
StateMachineUpdater thread while accessing RocksDB.
{code:java}
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fcefbb5fc0f, pid=1406, tid=0x00007fceecbcb700
#
# JRE version: OpenJDK Runtime Environment (8.0_232) (build 1.8.0_232-86)
# Java VM: OpenJDK 64-Bit Server VM (25.232-b86 mixed mode, sharing linux-amd64 compressed oops)
# Problematic frame:
# C [librocksdbjni7209090472417999125.so+0x242c0f] rocksdb_get_helper(JNIEnv_*, rocksdb::DB*, rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, _jbyteArray*, int, int)+0xcf
#
# Core dump written. Default location: /root/core or core.1406
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
--------------- T H R E A D ---------------
Current thread (0x00007fcf3ded2800): JavaThread "7a85dabc-3f8c-47e1-bf0a-de75abe92820@group-691FBC3A273C-StateMachineUpdater" daemon [_thread_in_native, id=1559, stack(0x00007fceecacb000,0x00007fceecbcc000)]
siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000
{code}
{code:java}
Stack: [0x00007fceecacb000,0x00007fceecbcc000], sp=0x00007fceecbc96a0, free space=1017k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [librocksdbjni7209090472417999125.so+0x242c0f] rocksdb_get_helper(JNIEnv_*, rocksdb::DB*, rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, _jbyteArray*, int, int)+0xcf
C [librocksdbjni7209090472417999125.so+0x242ea2] Java_org_rocksdb_RocksDB_get__J_3BIIJ+0x62
j org.rocksdb.RocksDB.get(J[BIIJ)[B+0
j org.rocksdb.RocksDB.get(Lorg/rocksdb/ColumnFamilyHandle;[B)[B+13
j org.apache.hadoop.hdds.utils.db.RDBTable.get([B)[B+9
j org.apache.hadoop.hdds.utils.db.RDBTable.get(Ljava/lang/Object;)Ljava/lang/Object;+5
j org.apache.hadoop.hdds.utils.db.TypedTable.getFromTable(Ljava/lang/Object;)Ljava/lang/Object;+14
j org.apache.hadoop.hdds.utils.db.TypedTable.get(Ljava/lang/Object;)Ljava/lang/Object;+61
j org.apache.hadoop.hdds.scm.ha.SequenceIdGenerator$StateManagerImpl.lambda$allocateBatch$0(Ljava/lang/String;)Ljava/lang/Long;+5
j org.apache.hadoop.hdds.scm.ha.SequenceIdGenerator$StateManagerImpl$$Lambda$444.apply(Ljava/lang/Object;)Ljava/lang/Object;+8
J 3481 C1 java.util.concurrent.ConcurrentHashMap.computeIfAbsent(Ljava/lang/Object;Ljava/util/function/Function;)Ljava/lang/Object; (493 bytes) @ 0x00007fcf2daeb9e4 [0x00007fcf2daeb160+0x884]
j org.apache.hadoop.hdds.scm.ha.SequenceIdGenerator$StateManagerImpl.allocateBatch(Ljava/lang/String;Ljava/lang/Long;Ljava/lang/Long;)Ljava/lang/Boolean;+11
v ~StubRoutines::call_stub
V [libjvm.so+0x682be8] JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x1048
V [libjvm.so+0x9a9b49] Reflection::invoke(instanceKlassHandle, methodHandle, Handle, bool, objArrayHandle, BasicType, objArrayHandle, bool, Thread*)+0x599
V [libjvm.so+0x9ad7ed] Reflection::invoke_method(oopDesc*, Handle, objArrayHandle, Thread*)+0x14d
V [libjvm.so+0x725a66] JVM_InvokeMethod+0x1e6
J 2759 sun.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (0 bytes) @ 0x00007fcf2d2a827d [0x00007fcf2d2a8180+0xfd]
J 2758 C1 sun.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (104 bytes) @ 0x00007fcf2d33f194 [0x00007fcf2d33dec0+0x12d4]
J 5190 C2 sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (10 bytes) @ 0x00007fcf2dff1968 [0x00007fcf2dff1920+0x48]
j java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+56
j org.apache.hadoop.hdds.scm.ha.SCMStateMachine.process(Lorg/apache/hadoop/hdds/scm/ha/SCMRatisRequest;)Lorg/apache/ratis/protocol/Message;+68
j org.apache.hadoop.hdds.scm.ha.SCMStateMachine.applyTransaction(Lorg/apache/ratis/statemachine/TransactionContext;)Ljava/util/concurrent/CompletableFuture;+27
j org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(Lorg/apache/ratis/proto/RaftProtos$LogEntryProto;)Ljava/util/concurrent/CompletableFuture;+126
j org.apache.ratis.server.impl.StateMachineUpdater.applyLog()Lorg/apache/ratis/util/MemoizedSupplier;+142
j org.apache.ratis.server.impl.StateMachineUpdater.run()V+29
j java.lang.Thread.run()V+11
v ~StubRoutines::call_stub
V [libjvm.so+0x682be8] JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x1048
V [libjvm.so+0x684127] JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x2f7
V [libjvm.so+0x684660] JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Thread*)+0x60
V [libjvm.so+0x71c121] thread_entry(JavaThread*, Thread*)+0x91
V [libjvm.so+0xa8c671] JavaThread::thread_main_inner()+0xf1
V [libjvm.so+0x938f12] java_start(Thread*)+0x132
C [libpthread.so.0+0x7eb5] start_thread+0xc5
{code}
The root cause is a missing reinitialize() in SequenceIdGenerator: after a
snapshot is installed, SequenceIdGenerator still holds a dangling reference to
the old, removed RocksDB instance.
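To illustrate the failure mode, here is a minimal sketch of a component that caches values read from a table and needs a reinitialize() hook when the backing DB is swapped. All names in it (FakeTable, SequenceIdStateSketch, lastId, ReinitializeDemo) are hypothetical stand-ins and not the actual Ozone classes or the actual fix. The sketch throws an exception at the point where the real RocksDB JNI call would hit SIGSEGV.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a table backed by a DB that can be closed. */
final class FakeTable {
  private final Map<String, Long> data = new ConcurrentHashMap<>();
  private volatile boolean closed;

  Long get(String key) {
    if (closed) {
      // Assumption: models the native crash as a Java exception.
      throw new IllegalStateException("table belongs to a closed DB");
    }
    return data.get(key);
  }

  void put(String key, long value) {
    if (closed) {
      throw new IllegalStateException("table belongs to a closed DB");
    }
    data.put(key, value);
  }

  void close() {
    closed = true;
  }
}

/** Hypothetical sketch of a generator state that reads through a cache. */
final class SequenceIdStateSketch {
  private volatile FakeTable table;
  private final Map<String, Long> cache = new ConcurrentHashMap<>();

  SequenceIdStateSketch(FakeTable table) {
    this.table = table;
  }

  /** Returns the last persisted id for a sequence, or 0 if none exists. */
  long lastId(String sequenceName) {
    return cache.computeIfAbsent(sequenceName, k -> {
      Long v = table.get(k);
      return v == null ? 0L : v;
    });
  }

  /** Points the state at the new table and drops values cached from the old one. */
  void reinitialize(FakeTable newTable) {
    this.table = newTable;
    cache.clear();
  }
}

final class ReinitializeDemo {
  public static void main(String[] args) {
    FakeTable oldTable = new FakeTable();
    oldTable.put("localId", 100L);
    SequenceIdStateSketch state = new SequenceIdStateSketch(oldTable);
    System.out.println(state.lastId("localId"));

    // Simulate a snapshot install: a new DB replaces the old one.
    FakeTable newTable = new FakeTable();
    newTable.put("localId", 500L);
    oldTable.close();

    // Without this call, an uncached lookup would still go to the closed table.
    state.reinitialize(newTable);
    System.out.println(state.lastId("localId"));
  }
}
```

In this sketch, skipping reinitialize() after the swap leaves both a stale cache and a reference to the closed table, so the next uncached lookup fails. That mirrors the computeIfAbsent frame in the stack trace above.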
was:
After installSnapshot, the bootstrapped SCM crashed in a short time while there
is on-going write workload.
After check the core dump file, the new added SCM crashed in thread
StateMachineUpdater, while accessing rocksdb.
{code:java}
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fcefbb5fc0f, pid=1406, tid=0x00007fceecbcb700
#
# JRE version: OpenJDK Runtime Environment (8.0_232) (build 1.8.0_232-86)
# Java VM: OpenJDK 64-Bit Server VM (25.232-b86 mixed mode, sharing linux-amd64 compressed oops)
# Problematic frame:
# C [librocksdbjni7209090472417999125.so+0x242c0f] rocksdb_get_helper(JNIEnv_*, rocksdb::DB*, rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, _jbyteArray*, int, int)+0xcf
#
# Core dump written. Default location: /root/core or core.1406
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
--------------- T H R E A D ---------------
Current thread (0x00007fcf3ded2800): JavaThread "7a85dabc-3f8c-47e1-bf0a-de75abe92820@group-691FBC3A273C-StateMachineUpdater" daemon [_thread_in_native, id=1559, stack(0x00007fceecacb000,0x00007fceecbcc000)]
siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000
{code}
{code:java}
Stack: [0x00007fceecacb000,0x00007fceecbcc000], sp=0x00007fceecbc96a0, free space=1017k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [librocksdbjni7209090472417999125.so+0x242c0f] rocksdb_get_helper(JNIEnv_*, rocksdb::DB*, rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, _jbyteArray*, int, int)+0xcf
C [librocksdbjni7209090472417999125.so+0x242ea2] Java_org_rocksdb_RocksDB_get__J_3BIIJ+0x62
j org.rocksdb.RocksDB.get(J[BIIJ)[B+0
j org.rocksdb.RocksDB.get(Lorg/rocksdb/ColumnFamilyHandle;[B)[B+13
j org.apache.hadoop.hdds.utils.db.RDBTable.get([B)[B+9
j org.apache.hadoop.hdds.utils.db.RDBTable.get(Ljava/lang/Object;)Ljava/lang/Object;+5
j org.apache.hadoop.hdds.utils.db.TypedTable.getFromTable(Ljava/lang/Object;)Ljava/lang/Object;+14
j org.apache.hadoop.hdds.utils.db.TypedTable.get(Ljava/lang/Object;)Ljava/lang/Object;+61
j org.apache.hadoop.hdds.scm.ha.SequenceIdGenerator$StateManagerImpl.lambda$allocateBatch$0(Ljava/lang/String;)Ljava/lang/Long;+5
j org.apache.hadoop.hdds.scm.ha.SequenceIdGenerator$StateManagerImpl$$Lambda$444.apply(Ljava/lang/Object;)Ljava/lang/Object;+8
J 3481 C1 java.util.concurrent.ConcurrentHashMap.computeIfAbsent(Ljava/lang/Object;Ljava/util/function/Function;)Ljava/lang/Object; (493 bytes) @ 0x00007fcf2daeb9e4 [0x00007fcf2daeb160+0x884]
j org.apache.hadoop.hdds.scm.ha.SequenceIdGenerator$StateManagerImpl.allocateBatch(Ljava/lang/String;Ljava/lang/Long;Ljava/lang/Long;)Ljava/lang/Boolean;+11
v ~StubRoutines::call_stub
V [libjvm.so+0x682be8] JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x1048
V [libjvm.so+0x9a9b49] Reflection::invoke(instanceKlassHandle, methodHandle, Handle, bool, objArrayHandle, BasicType, objArrayHandle, bool, Thread*)+0x599
V [libjvm.so+0x9ad7ed] Reflection::invoke_method(oopDesc*, Handle, objArrayHandle, Thread*)+0x14d
V [libjvm.so+0x725a66] JVM_InvokeMethod+0x1e6
J 2759 sun.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (0 bytes) @ 0x00007fcf2d2a827d [0x00007fcf2d2a8180+0xfd]
J 2758 C1 sun.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (104 bytes) @ 0x00007fcf2d33f194 [0x00007fcf2d33dec0+0x12d4]
J 5190 C2 sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (10 bytes) @ 0x00007fcf2dff1968 [0x00007fcf2dff1920+0x48]
j java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+56
j org.apache.hadoop.hdds.scm.ha.SCMStateMachine.process(Lorg/apache/hadoop/hdds/scm/ha/SCMRatisRequest;)Lorg/apache/ratis/protocol/Message;+68
j org.apache.hadoop.hdds.scm.ha.SCMStateMachine.applyTransaction(Lorg/apache/ratis/statemachine/TransactionContext;)Ljava/util/concurrent/CompletableFuture;+27
j org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(Lorg/apache/ratis/proto/RaftProtos$LogEntryProto;)Ljava/util/concurrent/CompletableFuture;+126
j org.apache.ratis.server.impl.StateMachineUpdater.applyLog()Lorg/apache/ratis/util/MemoizedSupplier;+142
j org.apache.ratis.server.impl.StateMachineUpdater.run()V+29
j java.lang.Thread.run()V+11
v ~StubRoutines::call_stub
V [libjvm.so+0x682be8] JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x1048
V [libjvm.so+0x684127] JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x2f7
V [libjvm.so+0x684660] JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Thread*)+0x60
V [libjvm.so+0x71c121] thread_entry(JavaThread*, Thread*)+0x91
V [libjvm.so+0xa8c671] JavaThread::thread_main_inner()+0xf1
V [libjvm.so+0x938f12] java_start(Thread*)+0x132
C [libpthread.so.0+0x7eb5] start_thread+0xc5
{code}
It is due to missing reinitialize in SequenceIdGenerator, thereby after
installing snapshot, SequenceIdGenerator holds a dangling reference to the old
removed rocksdb.
> Add reinitialize() for SequenceIdGenerator.
> -------------------------------------------
>
> Key: HDDS-5281
> URL: https://issues.apache.org/jira/browse/HDDS-5281
> Project: Apache Ozone
> Issue Type: Sub-task
> Components: SCM HA
> Affects Versions: 1.2.0
> Reporter: Glen Geng
> Assignee: Glen Geng
> Priority: Major
> Fix For: 1.2.0
>