GlenGeng opened a new pull request #2292:
URL: https://github.com/apache/ozone/pull/2292


   ## What changes were proposed in this pull request?
   
   After installSnapshot, a newly bootstrapped SCM crashes shortly afterwards if there is an ongoing write workload.
    
   According to the core dump, the newly added SCM crashed in the StateMachineUpdater thread while accessing RocksDB.
   
   ```
   #
   # A fatal error has been detected by the Java Runtime Environment:
   #
   #  SIGSEGV (0xb) at pc=0x00007fcefbb5fc0f, pid=1406, tid=0x00007fceecbcb700
   #
   # JRE version: OpenJDK Runtime Environment (8.0_232) (build 1.8.0_232-86)
   # Java VM: OpenJDK 64-Bit Server VM (25.232-b86 mixed mode, sharing linux-amd64 compressed oops)
   # Problematic frame:
   # C  [librocksdbjni7209090472417999125.so+0x242c0f]  rocksdb_get_helper(JNIEnv_*, rocksdb::DB*, rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, _jbyteArray*, int, int)+0xcf
   #
   # Core dump written. Default location: /root/core or core.1406
   #
   # If you would like to submit a bug report, please visit:
   #   http://bugreport.java.com/bugreport/crash.jsp
   # The crash happened outside the Java Virtual Machine in native code.
   # See problematic frame for where to report the bug.
   #
   
   ---------------  T H R E A D  ---------------
   
   Current thread (0x00007fcf3ded2800):  JavaThread "7a85dabc-3f8c-47e1-bf0a-de75abe92820@group-691FBC3A273C-StateMachineUpdater" daemon [_thread_in_native, id=1559, stack(0x00007fceecacb000,0x00007fceecbcc000)]
   
   siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000
   ```
   
   ```
   Stack: [0x00007fceecacb000,0x00007fceecbcc000],  sp=0x00007fceecbc96a0,  free space=1017k
   Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
   C  [librocksdbjni7209090472417999125.so+0x242c0f]  rocksdb_get_helper(JNIEnv_*, rocksdb::DB*, rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, _jbyteArray*, int, int)+0xcf
   C  [librocksdbjni7209090472417999125.so+0x242ea2]  Java_org_rocksdb_RocksDB_get__J_3BIIJ+0x62
   j  org.rocksdb.RocksDB.get(J[BIIJ)[B+0
   j  org.rocksdb.RocksDB.get(Lorg/rocksdb/ColumnFamilyHandle;[B)[B+13
   j  org.apache.hadoop.hdds.utils.db.RDBTable.get([B)[B+9
   j  org.apache.hadoop.hdds.utils.db.RDBTable.get(Ljava/lang/Object;)Ljava/lang/Object;+5
   j  org.apache.hadoop.hdds.utils.db.TypedTable.getFromTable(Ljava/lang/Object;)Ljava/lang/Object;+14
   j  org.apache.hadoop.hdds.utils.db.TypedTable.get(Ljava/lang/Object;)Ljava/lang/Object;+61
   j  org.apache.hadoop.hdds.scm.ha.SequenceIdGenerator$StateManagerImpl.lambda$allocateBatch$0(Ljava/lang/String;)Ljava/lang/Long;+5
   j  org.apache.hadoop.hdds.scm.ha.SequenceIdGenerator$StateManagerImpl$$Lambda$444.apply(Ljava/lang/Object;)Ljava/lang/Object;+8
   J 3481 C1 java.util.concurrent.ConcurrentHashMap.computeIfAbsent(Ljava/lang/Object;Ljava/util/function/Function;)Ljava/lang/Object; (493 bytes) @ 0x00007fcf2daeb9e4 [0x00007fcf2daeb160+0x884]
   j  org.apache.hadoop.hdds.scm.ha.SequenceIdGenerator$StateManagerImpl.allocateBatch(Ljava/lang/String;Ljava/lang/Long;Ljava/lang/Long;)Ljava/lang/Boolean;+11
   v  ~StubRoutines::call_stub
   V  [libjvm.so+0x682be8]  JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x1048
   V  [libjvm.so+0x9a9b49]  Reflection::invoke(instanceKlassHandle, methodHandle, Handle, bool, objArrayHandle, BasicType, objArrayHandle, bool, Thread*)+0x599
   V  [libjvm.so+0x9ad7ed]  Reflection::invoke_method(oopDesc*, Handle, objArrayHandle, Thread*)+0x14d
   V  [libjvm.so+0x725a66]  JVM_InvokeMethod+0x1e6
   J 2759  sun.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (0 bytes) @ 0x00007fcf2d2a827d [0x00007fcf2d2a8180+0xfd]
   J 2758 C1 sun.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (104 bytes) @ 0x00007fcf2d33f194 [0x00007fcf2d33dec0+0x12d4]
   J 5190 C2 sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (10 bytes) @ 0x00007fcf2dff1968 [0x00007fcf2dff1920+0x48]
   j  java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+56
   j  org.apache.hadoop.hdds.scm.ha.SCMStateMachine.process(Lorg/apache/hadoop/hdds/scm/ha/SCMRatisRequest;)Lorg/apache/ratis/protocol/Message;+68
   j  org.apache.hadoop.hdds.scm.ha.SCMStateMachine.applyTransaction(Lorg/apache/ratis/statemachine/TransactionContext;)Ljava/util/concurrent/CompletableFuture;+27
   j  org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(Lorg/apache/ratis/proto/RaftProtos$LogEntryProto;)Ljava/util/concurrent/CompletableFuture;+126
   j  org.apache.ratis.server.impl.StateMachineUpdater.applyLog()Lorg/apache/ratis/util/MemoizedSupplier;+142
   j  org.apache.ratis.server.impl.StateMachineUpdater.run()V+29
   j  java.lang.Thread.run()V+11
   v  ~StubRoutines::call_stub
   V  [libjvm.so+0x682be8]  JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x1048
   V  [libjvm.so+0x684127]  JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x2f7
   V  [libjvm.so+0x684660]  JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Thread*)+0x60
   V  [libjvm.so+0x71c121]  thread_entry(JavaThread*, Thread*)+0x91
   V  [libjvm.so+0xa8c671]  JavaThread::thread_main_inner()+0xf1
   V  [libjvm.so+0x938f12]  java_start(Thread*)+0x132
   C  [libpthread.so.0+0x7eb5]  start_thread+0xc5
   ```
   
   The root cause is a missing reinitialize() in SequenceIdGenerator. As a result, after a snapshot is installed, SequenceIdGenerator still holds a dangling reference to the old RocksDB instance, which has already been removed.
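
To illustrate the failure mode, here is a minimal, self-contained sketch. It is not the actual patch and does not use Ozone or RocksDB classes: `FakeTable`, `IdState`, and `readAfterSnapshot` are hypothetical names made up for this example. The stand-in table throws an exception when it is used after being closed, whereas a real closed RocksDB handle can crash the JVM in native code, as in the dump above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ReinitializeSketch {

  /** Hypothetical stand-in for a DB-backed table (assumption, not the Ozone Table API). */
  static final class FakeTable {
    private final Map<String, Long> data = new HashMap<>();
    private boolean closed;

    Long get(String key) {
      // A stale native handle would SIGSEGV; here we throw so the failure is observable.
      if (closed) {
        throw new IllegalStateException("table is closed");
      }
      return data.get(key);
    }

    void put(String key, long value) {
      data.put(key, value);
    }

    void close() {
      closed = true;
    }
  }

  /** Hypothetical state holder that lazily loads the last id from the table. */
  static final class IdState {
    private volatile FakeTable table;
    private final Map<String, Long> cache = new ConcurrentHashMap<>();

    IdState(FakeTable table) {
      this.table = table;
    }

    long lastId(String name) {
      // Mirrors the computeIfAbsent-then-table-read shape seen in the stack trace.
      return cache.computeIfAbsent(name, k -> {
        Long v = table.get(k);
        return v == null ? 0L : v;
      });
    }

    /** Swap in the table of the newly loaded DB and drop values cached from the old one. */
    void reinitialize(FakeTable newTable) {
      this.table = newTable;
      cache.clear();
    }
  }

  /** Simulates a snapshot install: the old table is closed and a new one takes its place. */
  static long readAfterSnapshot(boolean reinitialize) {
    FakeTable oldTable = new FakeTable();
    oldTable.put("containerId", 100L);
    IdState state = new IdState(oldTable);

    FakeTable newTable = new FakeTable();
    newTable.put("containerId", 500L);
    oldTable.close();

    if (reinitialize) {
      state.reinitialize(newTable);
    }
    return state.lastId("containerId");
  }

  public static void main(String[] args) {
    System.out.println("with reinitialize: " + readAfterSnapshot(true));
    try {
      readAfterSnapshot(false);
    } catch (IllegalStateException e) {
      System.out.println("without reinitialize: " + e.getMessage());
    }
  }
}
```

With `reinitialize` the read is served from the new table; without it the state object still points at the closed table and the first read fails.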
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-5281
   
   ## How was this patch tested?
   
   Tested with CI and in an internal integration test environment at Tencent.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


