GlenGeng opened a new pull request #2292:
URL: https://github.com/apache/ozone/pull/2292
## What changes were proposed in this pull request?
After installSnapshot, the bootstrapped SCM crashes within a short time if
there is an ongoing write workload.
The core dump shows that the newly added SCM crashed in the
StateMachineUpdater thread while accessing RocksDB.
```
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fcefbb5fc0f, pid=1406, tid=0x00007fceecbcb700
#
# JRE version: OpenJDK Runtime Environment (8.0_232) (build 1.8.0_232-86)
# Java VM: OpenJDK 64-Bit Server VM (25.232-b86 mixed mode, sharing linux-amd64 compressed oops)
# Problematic frame:
# C [librocksdbjni7209090472417999125.so+0x242c0f] rocksdb_get_helper(JNIEnv_*, rocksdb::DB*, rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, _jbyteArray*, int, int)+0xcf
#
# Core dump written. Default location: /root/core or core.1406
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
--------------- T H R E A D ---------------
Current thread (0x00007fcf3ded2800): JavaThread "7a85dabc-3f8c-47e1-bf0a-de75abe92820@group-691FBC3A273C-StateMachineUpdater" daemon [_thread_in_native, id=1559, stack(0x00007fceecacb000,0x00007fceecbcc000)]
siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000
```
```
Stack: [0x00007fceecacb000,0x00007fceecbcc000], sp=0x00007fceecbc96a0, free space=1017k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [librocksdbjni7209090472417999125.so+0x242c0f] rocksdb_get_helper(JNIEnv_*, rocksdb::DB*, rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, _jbyteArray*, int, int)+0xcf
C [librocksdbjni7209090472417999125.so+0x242ea2] Java_org_rocksdb_RocksDB_get__J_3BIIJ+0x62
j org.rocksdb.RocksDB.get(J[BIIJ)[B+0
j org.rocksdb.RocksDB.get(Lorg/rocksdb/ColumnFamilyHandle;[B)[B+13
j org.apache.hadoop.hdds.utils.db.RDBTable.get([B)[B+9
j org.apache.hadoop.hdds.utils.db.RDBTable.get(Ljava/lang/Object;)Ljava/lang/Object;+5
j org.apache.hadoop.hdds.utils.db.TypedTable.getFromTable(Ljava/lang/Object;)Ljava/lang/Object;+14
j org.apache.hadoop.hdds.utils.db.TypedTable.get(Ljava/lang/Object;)Ljava/lang/Object;+61
j org.apache.hadoop.hdds.scm.ha.SequenceIdGenerator$StateManagerImpl.lambda$allocateBatch$0(Ljava/lang/String;)Ljava/lang/Long;+5
j org.apache.hadoop.hdds.scm.ha.SequenceIdGenerator$StateManagerImpl$$Lambda$444.apply(Ljava/lang/Object;)Ljava/lang/Object;+8
J 3481 C1 java.util.concurrent.ConcurrentHashMap.computeIfAbsent(Ljava/lang/Object;Ljava/util/function/Function;)Ljava/lang/Object; (493 bytes) @ 0x00007fcf2daeb9e4 [0x00007fcf2daeb160+0x884]
j org.apache.hadoop.hdds.scm.ha.SequenceIdGenerator$StateManagerImpl.allocateBatch(Ljava/lang/String;Ljava/lang/Long;Ljava/lang/Long;)Ljava/lang/Boolean;+11
v ~StubRoutines::call_stub
V [libjvm.so+0x682be8] JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x1048
V [libjvm.so+0x9a9b49] Reflection::invoke(instanceKlassHandle, methodHandle, Handle, bool, objArrayHandle, BasicType, objArrayHandle, bool, Thread*)+0x599
V [libjvm.so+0x9ad7ed] Reflection::invoke_method(oopDesc*, Handle, objArrayHandle, Thread*)+0x14d
V [libjvm.so+0x725a66] JVM_InvokeMethod+0x1e6
J 2759 sun.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (0 bytes) @ 0x00007fcf2d2a827d [0x00007fcf2d2a8180+0xfd]
J 2758 C1 sun.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (104 bytes) @ 0x00007fcf2d33f194 [0x00007fcf2d33dec0+0x12d4]
J 5190 C2 sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (10 bytes) @ 0x00007fcf2dff1968 [0x00007fcf2dff1920+0x48]
j java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+56
j org.apache.hadoop.hdds.scm.ha.SCMStateMachine.process(Lorg/apache/hadoop/hdds/scm/ha/SCMRatisRequest;)Lorg/apache/ratis/protocol/Message;+68
j org.apache.hadoop.hdds.scm.ha.SCMStateMachine.applyTransaction(Lorg/apache/ratis/statemachine/TransactionContext;)Ljava/util/concurrent/CompletableFuture;+27
j org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(Lorg/apache/ratis/proto/RaftProtos$LogEntryProto;)Ljava/util/concurrent/CompletableFuture;+126
j org.apache.ratis.server.impl.StateMachineUpdater.applyLog()Lorg/apache/ratis/util/MemoizedSupplier;+142
j org.apache.ratis.server.impl.StateMachineUpdater.run()V+29
j java.lang.Thread.run()V+11
v ~StubRoutines::call_stub
V [libjvm.so+0x682be8] JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x1048
V [libjvm.so+0x684127] JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x2f7
V [libjvm.so+0x684660] JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Thread*)+0x60
V [libjvm.so+0x71c121] thread_entry(JavaThread*, Thread*)+0x91
V [libjvm.so+0xa8c671] JavaThread::thread_main_inner()+0xf1
V [libjvm.so+0x938f12] java_start(Thread*)+0x132
C [libpthread.so.0+0x7eb5] start_thread+0xc5
```
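The root cause is a missing reinitialize() in SequenceIdGenerator. After a
snapshot is installed, SequenceIdGenerator still holds a dangling reference
to the old RocksDB instance, which has already been removed, so the next
read through that reference crashes in native code.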
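
For illustration only, the pattern can be sketched as below. This is not the code in this patch: `FakeTable`, `SequenceState`, and the `reinitialize(FakeTable)` signature are hypothetical stand-ins, and the closed table throws an exception where the real native RocksDB handle would segfault. The sketch assumes, as the stack trace suggests, that ids are cached with `ConcurrentHashMap.computeIfAbsent` and loaded from a table on a cache miss.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReinitializeSketch {

    /** Hypothetical stand-in for a table backed by a native store. */
    static final class FakeTable {
        private final Map<String, Long> data = new ConcurrentHashMap<>();
        private volatile boolean closed;

        Long get(String key) {
            if (closed) {
                // A real native handle would crash here instead of throwing.
                throw new IllegalStateException("table is closed");
            }
            return data.get(key);
        }

        void put(String key, long value) {
            if (closed) {
                throw new IllegalStateException("table is closed");
            }
            data.put(key, value);
        }

        void close() {
            closed = true;
        }
    }

    /** Hypothetical component that keeps a table reference and an in-memory cache. */
    static final class SequenceState {
        private volatile FakeTable table;
        private final Map<String, Long> cache = new ConcurrentHashMap<>();

        SequenceState(FakeTable table) {
            this.table = table;
        }

        /** Returns the last persisted id, loading it from the table on a cache miss. */
        long lastId(String name) {
            return cache.computeIfAbsent(name, k -> {
                Long stored = table.get(k);
                return stored == null ? 0L : stored;
            });
        }

        /** Swaps in the table of the newly installed store and drops cached values. */
        void reinitialize(FakeTable newTable) {
            this.table = newTable;
            cache.clear();
        }
    }

    public static void main(String[] args) {
        FakeTable oldTable = new FakeTable();
        oldTable.put("containerId", 7L);
        SequenceState state = new SequenceState(oldTable);
        System.out.println(state.lastId("containerId"));

        // Simulate a snapshot install: the old store is closed and replaced.
        FakeTable newTable = new FakeTable();
        newTable.put("containerId", 42L);
        oldTable.close();

        // Without this call, a cache miss would read from the closed table.
        state.reinitialize(newTable);
        System.out.println(state.lastId("containerId"));
    }
}
```

Clearing the cache together with the reference swap matters in this sketch: otherwise values read from the old store would keep being served after the snapshot replaced them.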
## What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-5281
## How was this patch tested?
CI and an internal integration test environment at Tencent.