[ 
https://issues.apache.org/jira/browse/HDDS-15314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyao Meng updated HDDS-15314:
------------------------------
    Description: 
During snapshot defrag scale testing, Ozone Manager crashed in native RocksDB 
JNI code while the Hadoop Metrics2 timer was collecting generic RocksDB DB 
properties. The crash happened twice in the same setup.

{code:title=1st crash}
Stack: [0x00007f60b6f10000,0x00007f60b7011000],  sp=0x00007f60b700f378,  free 
space=1020k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [librocksdbjni-linux64.so+0x47b48b]  
rocksdb::InternalStats::HandleEstimatePendingCompactionBytes(unsigned long*, 
rocksdb::DBImpl*, rocksdb::Version*)+0xb
C  [librocksdbjni-linux64.so+0x3c3da4]  
rocksdb::DBImpl::GetProperty(rocksdb::ColumnFamilyHandle*, rocksdb::Slice 
const&, std::string*)+0x84
C  [librocksdbjni-linux64.so+0x2adf9d]  
Java_org_rocksdb_RocksDB_getProperty+0x14d
J 5382  
org.rocksdb.RocksDB.getProperty(JJLjava/lang/String;I)Ljava/lang/String; (0 
bytes) @ 0x00007f60d0f1ed06 [0x00007f60d0f1ec40+0xc6]
J 11916 C2 
org.apache.hadoop.hdds.utils.RocksDBStoreMetrics.getDBPropertyData(Lorg/apache/hadoop/metrics2/MetricsRecordBuilder;)V
 (280 bytes) @ 0x00007f60d21bf044 [0x00007f60d21bdf60+0x10e4]
J 12024 C1 
org.apache.hadoop.hdds.utils.RocksDBStoreMetrics.getMetrics(Lorg/apache/hadoop/metrics2/MetricsCollector;Z)V
 (32 bytes) @ 0x00007f60d1525b14 [0x00007f60d1525220+0x8f4]
J 11166 C2 
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(Lorg/apache/hadoop/metrics2/impl/MetricsCollectorImpl;Z)Ljava/lang/Iterable;
 (139 bytes) @ 0x00007f60d293785c [0x00007f60d29377e0+0x7c]
J 13473 C2 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.snapshotMetrics(Lorg/apache/hadoop/metrics2/impl/MetricsSourceAdapter;Lorg/apache/hadoop/metrics2/impl/MetricsBufferBuilder;)V
 (72 bytes) @ 0x00007f60d2f51e30 [0x00007f60d2f51da0+0x90]
J 13273 C1 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.sampleMetrics()Lorg/apache/hadoop/metrics2/impl/MetricsBuffer;
 (115 bytes) @ 0x00007f60d2ec1164 [0x00007f60d2ec04e0+0xc84]
J 15901 C1 org.apache.hadoop.metrics2.impl.MetricsSystemImpl$4.run()V (23 
bytes) @ 0x00007f60d1d8f20c [0x00007f60d1d8ef60+0x2ac]
j  java.util.TimerThread.mainLoop()V+221
j  java.util.TimerThread.run()V+1
{code}

{code:title=2nd crash}
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f0bee21ed7d, pid=711089, tid=0x00007f0be71c0700
#
# JRE version: OpenJDK Runtime Environment (8.0_232-b09) (build 1.8.0_232-b09)
# Java VM: OpenJDK 64-Bit Server VM (25.232-b09 mixed mode linux-amd64 
compressed oops)
# Problematic frame:
# C  [librocksdbjni-linux64.so+0x3c3d7d]  
rocksdb::DBImpl::GetProperty(rocksdb::ColumnFamilyHandle*, rocksdb::Slice 
const&, std::string*)+0x5d
...
{code}

DB metrics should not have been enabled for defrag DBs in the first place. 
They were already disabled by default for snapshot DBs in HDDS-12193 
(https://github.com/apache/ozone/commit/ad0debf5e1b). A similar measure needs 
to be taken for defrag DBs as well.

  was:
During snapshot defrag scale testing, Ozone Manager crashed in native RocksDB 
JNI code while the Hadoop Metrics2 timer was collecting generic RocksDB DB 
properties. The crash happened twice in the same setup.

{code:title=First crash}
Stack: [0x00007f60b6f10000,0x00007f60b7011000],  sp=0x00007f60b700f378,  free 
space=1020k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [librocksdbjni-linux64.so+0x47b48b]  
rocksdb::InternalStats::HandleEstimatePendingCompactionBytes(unsigned long*, 
rocksdb::DBImpl*, rocksdb::Version*)+0xb
C  [librocksdbjni-linux64.so+0x3c3da4]  
rocksdb::DBImpl::GetProperty(rocksdb::ColumnFamilyHandle*, rocksdb::Slice 
const&, std::string*)+0x84
C  [librocksdbjni-linux64.so+0x2adf9d]  
Java_org_rocksdb_RocksDB_getProperty+0x14d
J 5382  
org.rocksdb.RocksDB.getProperty(JJLjava/lang/String;I)Ljava/lang/String; (0 
bytes) @ 0x00007f60d0f1ed06 [0x00007f60d0f1ec40+0xc6]
J 11916 C2 
org.apache.hadoop.hdds.utils.RocksDBStoreMetrics.getDBPropertyData(Lorg/apache/hadoop/metrics2/MetricsRecordBuilder;)V
 (280 bytes) @ 0x00007f60d21bf044 [0x00007f60d21bdf60+0x10e4]
J 12024 C1 
org.apache.hadoop.hdds.utils.RocksDBStoreMetrics.getMetrics(Lorg/apache/hadoop/metrics2/MetricsCollector;Z)V
 (32 bytes) @ 0x00007f60d1525b14 [0x00007f60d1525220+0x8f4]
J 11166 C2 
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(Lorg/apache/hadoop/metrics2/impl/MetricsCollectorImpl;Z)Ljava/lang/Iterable;
 (139 bytes) @ 0x00007f60d293785c [0x00007f60d29377e0+0x7c]
J 13473 C2 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.snapshotMetrics(Lorg/apache/hadoop/metrics2/impl/MetricsSourceAdapter;Lorg/apache/hadoop/metrics2/impl/MetricsBufferBuilder;)V
 (72 bytes) @ 0x00007f60d2f51e30 [0x00007f60d2f51da0+0x90]
J 13273 C1 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.sampleMetrics()Lorg/apache/hadoop/metrics2/impl/MetricsBuffer;
 (115 bytes) @ 0x00007f60d2ec1164 [0x00007f60d2ec04e0+0xc84]
J 15901 C1 org.apache.hadoop.metrics2.impl.MetricsSystemImpl$4.run()V (23 
bytes) @ 0x00007f60d1d8f20c [0x00007f60d1d8ef60+0x2ac]
j  java.util.TimerThread.mainLoop()V+221
j  java.util.TimerThread.run()V+1
{code}

{code}
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f0bee21ed7d, pid=711089, tid=0x00007f0be71c0700
#
# JRE version: OpenJDK Runtime Environment (8.0_232-b09) (build 1.8.0_232-b09)
# Java VM: OpenJDK 64-Bit Server VM (25.232-b09 mixed mode linux-amd64 
compressed oops)
# Problematic frame:
# C  [librocksdbjni-linux64.so+0x3c3d7d]  
rocksdb::DBImpl::GetProperty(rocksdb::ColumnFamilyHandle*, rocksdb::Slice 
const&, std::string*)+0x5d
...
{code}

DB metrics should not have been enabled for defrag DBs in the first place. And 
previously it had been disabled for snapshot DBs in HDDS-12193 
(https://github.com/apache/ozone/commit/ad0debf5e1b) by default. Similar 
measure needs to be taken for defrag DBs as well.


> Disable defrag DB metrics due to crash during snapshot defrag
> -------------------------------------------------------------
>
>                 Key: HDDS-15314
>                 URL: https://issues.apache.org/jira/browse/HDDS-15314
>             Project: Apache Ozone
>          Issue Type: Sub-task
>          Components: Ozone Manager
>            Reporter: Siyao Meng
>            Assignee: Siyao Meng
>            Priority: Blocker
>
> During snapshot defrag scale testing, Ozone Manager crashed in native RocksDB 
> JNI code while the Hadoop Metrics2 timer was collecting generic RocksDB DB 
> properties. The crash happened twice in the same setup.
> {code:title=1st crash}
> Stack: [0x00007f60b6f10000,0x00007f60b7011000],  sp=0x00007f60b700f378,  free 
> space=1020k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
> code)
> C  [librocksdbjni-linux64.so+0x47b48b]  
> rocksdb::InternalStats::HandleEstimatePendingCompactionBytes(unsigned long*, 
> rocksdb::DBImpl*, rocksdb::Version*)+0xb
> C  [librocksdbjni-linux64.so+0x3c3da4]  
> rocksdb::DBImpl::GetProperty(rocksdb::ColumnFamilyHandle*, rocksdb::Slice 
> const&, std::string*)+0x84
> C  [librocksdbjni-linux64.so+0x2adf9d]  
> Java_org_rocksdb_RocksDB_getProperty+0x14d
> J 5382  
> org.rocksdb.RocksDB.getProperty(JJLjava/lang/String;I)Ljava/lang/String; (0 
> bytes) @ 0x00007f60d0f1ed06 [0x00007f60d0f1ec40+0xc6]
> J 11916 C2 
> org.apache.hadoop.hdds.utils.RocksDBStoreMetrics.getDBPropertyData(Lorg/apache/hadoop/metrics2/MetricsRecordBuilder;)V
>  (280 bytes) @ 0x00007f60d21bf044 [0x00007f60d21bdf60+0x10e4]
> J 12024 C1 
> org.apache.hadoop.hdds.utils.RocksDBStoreMetrics.getMetrics(Lorg/apache/hadoop/metrics2/MetricsCollector;Z)V
>  (32 bytes) @ 0x00007f60d1525b14 [0x00007f60d1525220+0x8f4]
> J 11166 C2 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(Lorg/apache/hadoop/metrics2/impl/MetricsCollectorImpl;Z)Ljava/lang/Iterable;
>  (139 bytes) @ 0x00007f60d293785c [0x00007f60d29377e0+0x7c]
> J 13473 C2 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.snapshotMetrics(Lorg/apache/hadoop/metrics2/impl/MetricsSourceAdapter;Lorg/apache/hadoop/metrics2/impl/MetricsBufferBuilder;)V
>  (72 bytes) @ 0x00007f60d2f51e30 [0x00007f60d2f51da0+0x90]
> J 13273 C1 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.sampleMetrics()Lorg/apache/hadoop/metrics2/impl/MetricsBuffer;
>  (115 bytes) @ 0x00007f60d2ec1164 [0x00007f60d2ec04e0+0xc84]
> J 15901 C1 org.apache.hadoop.metrics2.impl.MetricsSystemImpl$4.run()V (23 
> bytes) @ 0x00007f60d1d8f20c [0x00007f60d1d8ef60+0x2ac]
> j  java.util.TimerThread.mainLoop()V+221
> j  java.util.TimerThread.run()V+1
> {code}
> {code:title=2nd crash}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x00007f0bee21ed7d, pid=711089, tid=0x00007f0be71c0700
> #
> # JRE version: OpenJDK Runtime Environment (8.0_232-b09) (build 1.8.0_232-b09)
> # Java VM: OpenJDK 64-Bit Server VM (25.232-b09 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # C  [librocksdbjni-linux64.so+0x3c3d7d]  
> rocksdb::DBImpl::GetProperty(rocksdb::ColumnFamilyHandle*, rocksdb::Slice 
> const&, std::string*)+0x5d
> ...
> {code}
> DB metrics should not have been enabled for defrag DBs in the first place. 
> They were already disabled by default for snapshot DBs in HDDS-12193 
> (https://github.com/apache/ozone/commit/ad0debf5e1b). A similar measure needs 
> to be taken for defrag DBs as well.
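
For illustration only, the opt-out described above could be sketched roughly as 
follows. This is not the actual Ozone change: the class DbMetricsGate, the 
DbKind enum, and the snapshot and defrag DB names are hypothetical, and the 
real code paths (RocksDBStoreMetrics and the Metrics2 registration) are not 
reproduced here. The sketch only shows the idea of never handing snapshot or 
defrag DBs to the periodic metrics poller, so the timer thread has no handle 
on which to call getProperty.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class DbMetricsGate {
  /** Hypothetical DB categories for this sketch; not Ozone's actual types. */
  public enum DbKind { ACTIVE, SNAPSHOT, DEFRAG }

  /** Kinds whose DBs are excluded from periodic property polling. */
  private static final Set<DbKind> METRICS_DISABLED =
      EnumSet.of(DbKind.SNAPSHOT, DbKind.DEFRAG);

  /** Names of the DBs that a periodic metrics timer would poll. */
  private final List<String> polledDbs = new ArrayList<>();

  /**
   * Registers the DB for polling unless its kind is opted out.
   * Returns true if the DB was registered.
   */
  public boolean maybeRegister(String dbName, DbKind kind) {
    if (METRICS_DISABLED.contains(kind)) {
      return false; // snapshot and defrag DBs never reach the poller
    }
    polledDbs.add(dbName);
    return true;
  }

  public List<String> polledDbs() {
    return polledDbs;
  }

  public static void main(String[] args) {
    DbMetricsGate gate = new DbMetricsGate();
    gate.maybeRegister("om.db", DbKind.ACTIVE);
    gate.maybeRegister("om.db-snapshot-1", DbKind.SNAPSHOT); // hypothetical name
    gate.maybeRegister("om.db-defrag-1", DbKind.DEFRAG);     // hypothetical name
    System.out.println(gate.polledDbs()); // prints [om.db]
  }
}
```

Deciding at registration time, rather than guarding each getProperty call, 
keeps the timer thread away from these DBs entirely.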



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
