[
https://issues.apache.org/jira/browse/HDDS-8271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17712902#comment-17712902
]
Ivan Andika edited comment on HDDS-8271 at 4/17/23 6:12 AM:
------------------------------------------------------------
[~wanghongbing] Is it possible to get the exact line number from the thread dump frame using the addr2line command? (per
[https://github.com/facebook/rocksdb/wiki/JNI-Debugging])
{code:java}
addr2line -e java/target/librocksjni-linux64.so <ADDRESS>{code}
Our cluster also encountered a similar error, but I was not able to locate
librocksjni-linux64.so anywhere. Maybe someone has an idea? (See the note after the stack trace below.)
{code:java}
Stack: [0x00007fb19c088000,0x00007fb19c189000], sp=0x00007fb19c186978, free
space=1018k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [librocksdbjni5488597800197898859.so+0x3b2314]
rocksdb::DBImpl::GetLatestSequenceNumber() const+0x4
j org.rocksdb.RocksDB.getLatestSequenceNumber()J+5
j org.apache.hadoop.hdds.utils.db.RocksDatabase.getLatestSequenceNumber()J+18
j
org.apache.hadoop.hdds.utils.db.RDBStore.getUpdatesSince(JJ)Lorg/apache/hadoop/hdds/utils/db/DBUpdatesWrapper;+321
j
org.apache.hadoop.ozone.om.OzoneManager.getDBUpdates(Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$DBUpdatesRequest;)Lorg/apache/hadoop/ozone/om/helpers/DBUpdates;+30
j
org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.getOMDBUpdates(Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$DBUpdatesRequest;)Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$DBUpdatesResponse;+9
J 25168 C2
org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleReadRequest(Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$OMRequest;)Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$OMResponse;
(732 bytes) @ 0x00007fc3909159c8 [0x00007fc390911160+0x4868]
J 25148 C2
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitReadRequestToOM(Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$OMRequest;)Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$OMResponse;
(45 bytes) @ 0x00007fc3908b12fc [0x00007fc3908b0a60+0x89c]
J 25166 C2
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB$$Lambda$374.apply(Ljava/lang/Object;)Ljava/lang/Object;
(12 bytes) @ 0x00007fc3908d1a6c [0x00007fc3908d19a0+0xcc]
J 10275 C2
org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(Ljava/lang/Object;Lorg/apache/hadoop/hdds/function/FunctionWithServiceException;Ljava/lang/Object;Ljava/lang/String;)Ljava/lang/Object;
(205 bytes) @ 0x00007fc38dc17d88 [0x00007fc38dc17b60+0x228]
J 12527 C2
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(Lorg/apache/hadoop/ipc/RPC$Server;Ljava/lang/String;Lorg/apache/hadoop/io/Writable;J)Lorg/apache/hadoop/io/Writable;
(370 bytes) @ 0x00007fc38ecaae68 [0x00007fc38eca9ce0+0x1188]
J 25025 C2 org.apache.hadoop.ipc.Server$RpcCall.run()Ljava/lang/Void; (305
bytes) @ 0x00007fc3907f4264 [0x00007fc3907f4000+0x264]
J 10112 C2 org.apache.hadoop.ipc.Server$RpcCall.run()Ljava/lang/Object; (5
bytes) @ 0x00007fc38ecbc244 [0x00007fc38ecbc220+0x24]
v ~StubRoutines::call_stub
V [libjvm.so+0x67dd1e] JavaCalls::call_helper(JavaValue*, methodHandle*,
JavaCallArguments*, Thread*)+0xf5e
V [libjvm.so+0x6d0c5d] JVM_DoPrivileged+0x58d
J 154
java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object;
(0 bytes) @ 0x00007fc38d41e195 [0x00007fc38d41e0c0+0xd5]
J 10266 C2
org.apache.hadoop.security.UserGroupInformation.doAs(Ljava/security/PrivilegedExceptionAction;)Ljava/lang/Object;
(173 bytes) @ 0x00007fc38ed4fc04 [0x00007fc38ed4f980+0x284]
J 17230% C2 org.apache.hadoop.ipc.Server$Handler.run()V (794 bytes) @
0x00007fc38eef3e1c [0x00007fc38eef3300+0xb1c]
v ~StubRoutines::call_stub
V [libjvm.so+0x67dd1e] JavaCalls::call_helper(JavaValue*, methodHandle*,
JavaCallArguments*, Thread*)+0xf5e
V [libjvm.so+0x67b0e4] JavaCalls::call_virtual(JavaValue*, KlassHandle,
Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x2c4
V [libjvm.so+0x67b6f9] JavaCalls::call_virtual(JavaValue*, Handle,
KlassHandle, Symbol*, Symbol*, Thread*)+0x59
V [libjvm.so+0x6c09a1] thread_entry(JavaThread*, Thread*)+0xa1
V [libjvm.so+0xa893f2] JavaThread::thread_main_inner()+0x212
V [libjvm.so+0x8c0402] java_start(Thread*)+0xf2
C [libpthread.so.0+0x7e65] start_thread+0xc5{code}
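A rough note, not verified on this cluster: the wiki's librocksjni-linux64.so is a locally built artifact, while the frame above names librocksdbjni5488597800197898859.so, i.e. the copy that the RocksDB Java bindings extract into java.io.tmpdir when RocksDB.loadLibrary() runs (the same library also ships inside the rocksdbjni jar, as librocksdbjni-linux64.so if I remember correctly). A minimal sketch, assuming a Linux OM host and using the 0x3b2314 offset from the C frame above purely as an illustration, that lists those extracted copies and prints the addr2line invocation to try:
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper: find the librocksdbjni*.so copies extracted into
// java.io.tmpdir by the RocksDB Java bindings, and print an addr2line
// command to resolve the faulting offset against each of them.
public class FindRocksJniLib {
  public static void main(String[] args) throws IOException {
    Path tmpDir = Paths.get(System.getProperty("java.io.tmpdir"));
    try (Stream<Path> files = Files.list(tmpDir)) {
      files.filter(p -> p.getFileName().toString().matches("librocksdbjni.*\\.so"))
           .forEach(p -> System.out.println(
               // 0x3b2314 is the offset from the crashing C frame above (illustrative)
               "addr2line -f -C -e " + p + " 0x3b2314"));
    }
  }
}
{code}
The <ADDRESS> that addr2line expects should be the offset after the '+' in the C frame (0x3b2314 here). Since the release library is usually stripped, this may resolve only to the function name; a debug build of the same RocksDB version would likely be needed for exact line numbers.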
> Ozone Manager crash caused by RocksDB being closed
> ----------------------------------------------
>
> Key: HDDS-8271
> URL: https://issues.apache.org/jira/browse/HDDS-8271
> Project: Apache Ozone
> Issue Type: Bug
> Components: OM
> Affects Versions: 1.3.0
> Reporter: Hongbing Wang
> Assignee: Sadanand Shenoy
> Priority: Critical
>
> An OM crashed; its log shows:
> {noformat}
> 2023-03-23 20:32:06,079 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> om3@group-197E298202B9-SegmentedRaftLogWorker: Rolled log segment from
> /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61913753
> to
> /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_61913753-61920134
> 2023-03-23 20:32:06,087 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> om3@group-197E298202B9-SegmentedRaftLogWorker: created new log segment
> /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61920135
> 2023-03-23 20:32:13,820 [om3-client-thread180] INFO
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> om3@group-197E298202B9-SegmentedRaftLogWorker: Rolling segment
> log-61920135_61926451 to index:61926451
> 2023-03-23 20:32:13,821 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> om3@group-197E298202B9-SegmentedRaftLogWorker: Rolled log segment from
> /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61920135
> to
> /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_61920135-61926451
> 2023-03-23 20:32:13,827 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> om3@group-197E298202B9-SegmentedRaftLogWorker: created new log segment
> /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61926452
> 2023-03-23 20:32:15,699 [Finalizer] WARN
> org.apache.hadoop.hdds.utils.db.managed.ManagedRocksObjectUtils:
> RocksIterator is not closed properly
> 2023-03-23 20:32:15,699 [Finalizer] WARN
> org.apache.hadoop.hdds.utils.db.managed.ManagedRocksObjectUtils:
> RocksIterator is not closed properly
> 2023-03-23 20:32:21,281 [om3-client-thread183] INFO
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> om3@group-197E298202B9-SegmentedRaftLogWorker: Rolling segment
> log-61926452_61932758 to index:61932758
> 2023-03-23 20:32:21,281 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> om3@group-197E298202B9-SegmentedRaftLogWorker: Rolled log segment from
> /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61926452
> to
> /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_61926452-61932758
> 2023-03-23 20:32:21,288 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> om3@group-197E298202B9-SegmentedRaftLogWorker: created new log segment
> /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61932759
> 2023-03-23 20:32:28,387 [om3-client-thread160] INFO
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> om3@group-197E298202B9-SegmentedRaftLogWorker: Rolling segment
> log-61932759_61939170 to index:61939170
> 2023-03-23 20:32:28,388 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> om3@group-197E298202B9-SegmentedRaftLogWorker: Rolled log segment from
> /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61932759
> to
> /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_61932759-61939170
> 2023-03-23 20:32:28,395 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> om3@group-197E298202B9-SegmentedRaftLogWorker: created new log segment
> /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61939171
> 2023-03-23 20:32:29,003 [OM StateMachine ApplyTransaction Thread - 0] ERROR
> org.apache.hadoop.ozone.om.KeyManagerImpl: CheckAccess operation failed for
> key:vol-XXX/prod-XXX/FILE_2022-03-28-13-38-36_10003/_2022-03-28-13-38-45_0.FILE
> java.io.IOException: Rocks Database is closed
> at
> org.apache.hadoop.hdds.utils.db.RocksDatabase.assertClose(RocksDatabase.java:395)
> at
> org.apache.hadoop.hdds.utils.db.RocksDatabase.get(RocksDatabase.java:517)
> at org.apache.hadoop.hdds.utils.db.RDBTable.get(RDBTable.java:110)
> at org.apache.hadoop.hdds.utils.db.RDBTable.get(RDBTable.java:40)
> at
> org.apache.hadoop.hdds.utils.db.TypedTable.getFromTable(TypedTable.java:255)
> at org.apache.hadoop.hdds.utils.db.TypedTable.get(TypedTable.java:181)
> at
> org.apache.hadoop.ozone.om.KeyManagerImpl.checkAccess(KeyManagerImpl.java:1037)
> at
> org.apache.hadoop.ozone.security.acl.OzoneNativeAuthorizer.checkAccess(OzoneNativeAuthorizer.java:164)
> at
> org.apache.hadoop.ozone.om.OzoneManager.checkAcls(OzoneManager.java:2584)
> at
> org.apache.hadoop.ozone.om.OzoneManager.checkAcls(OzoneManager.java:2570)
> at
> org.apache.hadoop.ozone.om.OzoneAclUtils.checkAllAcls(OzoneAclUtils.java:128)
> at
> org.apache.hadoop.ozone.om.request.OMClientRequest.checkAcls(OMClientRequest.java:360)
> at
> org.apache.hadoop.ozone.om.request.OMClientRequest.checkAcls(OMClientRequest.java:215)
> at
> org.apache.hadoop.ozone.om.request.key.OMKeyRequest.checkKeyAcls(OMKeyRequest.java:339)
> at
> org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:138)
> at
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:311)
> at
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:533)
> at
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:324)
> at
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2023-03-23 20:32:29,004 [OM StateMachine ApplyTransaction Thread - 0] ERROR
> org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest:
> MultipartUpload Commit is failed for
> Key:FILE_2022-03-28-13-38-36_10003/_2022-03-28-13-38-45_0.FILE in
> Volume/Bucket vol-XXX/prod-XXX
> INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Check
> access operation failed for
> key:FILE_2022-03-28-13-38-36_10003/_2022-03-28-13-38-45_0.FILE
> at
> org.apache.hadoop.ozone.om.KeyManagerImpl.checkAccess(KeyManagerImpl.java:1083)
> at
> org.apache.hadoop.ozone.security.acl.OzoneNativeAuthorizer.checkAccess(OzoneNativeAuthorizer.java:164)
> at
> org.apache.hadoop.ozone.om.OzoneManager.checkAcls(OzoneManager.java:2584)
> at
> org.apache.hadoop.ozone.om.OzoneManager.checkAcls(OzoneManager.java:2570)
> at
> org.apache.hadoop.ozone.om.OzoneAclUtils.checkAllAcls(OzoneAclUtils.java:128)
> at
> org.apache.hadoop.ozone.om.request.OMClientRequest.checkAcls(OMClientRequest.java:360)
> at
> org.apache.hadoop.ozone.om.request.OMClientRequest.checkAcls(OMClientRequest.java:215)
> at
> org.apache.hadoop.ozone.om.request.key.OMKeyRequest.checkKeyAcls(OMKeyRequest.java:339)
> at
> org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:138)
> at
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:311)
> at
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:533)
> at
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:324)
> at
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Rocks Database is closed
> at
> org.apache.hadoop.hdds.utils.db.RocksDatabase.assertClose(RocksDatabase.java:395)
> at
> org.apache.hadoop.hdds.utils.db.RocksDatabase.get(RocksDatabase.java:517)
> at org.apache.hadoop.hdds.utils.db.RDBTable.get(RDBTable.java:110)
> at org.apache.hadoop.hdds.utils.db.RDBTable.get(RDBTable.java:40)
> at
> org.apache.hadoop.hdds.utils.db.TypedTable.getFromTable(TypedTable.java:255)
> at org.apache.hadoop.hdds.utils.db.TypedTable.get(TypedTable.java:181)
> at
> org.apache.hadoop.ozone.om.KeyManagerImpl.checkAccess(KeyManagerImpl.java:1037)
> ... 15 more
> 2023-03-23 20:32:29,012 [OMDoubleBufferFlushThread] ERROR
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer: Terminating with
> exit status 1: During flush to DB encountered error in OMDoubleBuffer flush
> thread OMDoubleBufferFlushThread
> java.io.IOException: Rocks Database is closed
> at
> org.apache.hadoop.hdds.utils.db.RocksDatabase$ColumnFamily.assertClosed(RocksDatabase.java:310)
> at
> org.apache.hadoop.hdds.utils.db.RocksDatabase$ColumnFamily.batchPut(RocksDatabase.java:293)
> at
> org.apache.hadoop.hdds.utils.db.RDBBatchOperation.put(RDBBatchOperation.java:62)
> at
> org.apache.hadoop.hdds.utils.db.RDBTable.putWithBatch(RDBTable.java:77)
> at
> org.apache.hadoop.hdds.utils.db.RDBTable.putWithBatch(RDBTable.java:40)
> at
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:132)
> at
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:149)
> at
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.checkAndUpdateDB(S3MultipartUploadCommitPartResponse.java:117)
> at
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$2(OzoneManagerDoubleBuffer.java:283)
> at
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.addToBatchWithTrace(OzoneManagerDoubleBuffer.java:228)
> at
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$1(OzoneManagerDoubleBuffer.java:281)
> at java.util.Iterator.forEachRemaining(Iterator.java:116)
> at
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:277)
> at java.lang.Thread.run(Thread.java:745)
> 2023-03-23 20:32:29,016 [shutdown-hook-0] INFO
> org.apache.hadoop.ozone.om.OzoneManagerStarter: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down OzoneManager at bigdata-hostxxx.com/10.xx.xx.xx
> ************************************************************/
> {noformat}
> For the full OM log, see: [OM crash detail log|https://github.com/whbing/issue_logs/blob/main/ozone/om/om_crash.20230323.log]
> rocksdb log: LOG.old.1679574802031612
> {noformat}
> 2023/03/23-20:32:26.278546 140423117969152 [/db_impl/db_impl_files.cc:445]
> [JOB 4050] Try to delete WAL files size 42545044203, prev total WAL file size
> 42958842785, number of live WAL files 326.
> ...
> 2023/03/23-20:32:28.853876 140423224928000 [/version_set.cc:3800] More
> existing levels in DB than needed. max_bytes_for_level_multiplier may not be
> guaranteed.
> ...
> 2023/03/23-20:32:28.854153 140423224928000 (Original Log Time
> 2023/03/23-20:32:28.854112) [/db_impl/db_impl_compaction_flush.cc:284]
> [multipartInfoTable] Level summary: base level 6 level multiplier 10.00 max
> bytes base 268435456 files[2 0 0 0 0 0 3] max score 0.50
> ...
> 2023/03/23-20:32:28.998765 140423117969152 [le/delete_scheduler.cc:77]
> Deleted file /data/ozonedata/om/db/om.db/098324.log immediately,
> rate_bytes_per_sec 0, total_trash_size 0 max_trash_db_ratio 0.250000
> 2023/03/23-20:32:29.009957 140422757758720 [/db_impl/db_impl.cc:480]
> Shutdown: canceling all background work
> 2023/03/23-20:32:29.020213 140423117969152 [le/delete_scheduler.cc:77]
> Deleted file /data/ozonedata/om/db/om.db/098322.log immediately,
> rate_bytes_per_sec 0, total_trash_size 0 max_trash_db_ratio 0.250000
> ...
> 2023/03/23-20:32:32.829353 140423117969152 [le/delete_scheduler.cc:77]
> Deleted file /data/ozonedata/om/db/om.db/097921.log immediately,
> rate_bytes_per_sec 0, total_trash_size 0 max_trash_db_ratio 0.250000
> 2023/03/23-20:32:32.883735 140422757758720 [/db_impl/db_impl.cc:710] Shutdown
> complete
> {noformat}
> For the full RocksDB log, see: [detailed RocksDB log|https://github.com/whbing/issue_logs/blob/main/ozone/om/LOG.old.1679574802031612]
> Crash log hs_err_pid1017701.log:
> {noformat}
> Stack: [0x00007fb6b8883000,0x00007fb6b8984000], sp=0x00007fb6b89817f8, free
> space=1017k
> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> j org.rocksdb.RocksDB.getLatestSequenceNumber(J)J+0
> j org.rocksdb.RocksDB.getLatestSequenceNumber()J+5
> j org.apache.hadoop.hdds.utils.db.RocksDatabase.getLatestSequenceNumber()J+18
> j
> org.apache.hadoop.hdds.utils.db.RDBStore.getUpdatesSince(JJ)Lorg/apache/hadoop/hdds/utils/db/DBUpdatesWrapper;+395
> j
> org.apache.hadoop.ozone.om.OzoneManager.getDBUpdates(Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$DBUpdatesRequest;)Lorg/apache/hadoop/ozone/om/helpers/DBUpdates;+30
> j
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.getOMDBUpdates(Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$DBUpdatesRequest;)Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$DBUpdatesResponse;+9
> J 19759 C2
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleReadRequest(Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$OMRequest;)Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$OMResponse;
> (840 bytes) @ 0x00007fc99793523c [0x00007fc997933ec0+0x137c]
> J 19799 C2
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(Lcom/google/protobuf/RpcController;Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$OMRequest;)Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$OMResponse;
> (81 bytes) @ 0x00007fc997dec564 [0x00007fc997deb460+0x1104]
> J 17937 C2
> org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(Lcom/google/protobuf/Descriptors$MethodDescriptor;Lcom/google/protobuf/RpcController;Lcom/google/protobuf/Message;)Lcom/google/protobuf/Message;
> (69 bytes) @ 0x00007fc995ec62a0 [0x00007fc995ec6140+0x160]
> J 17503 C2
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(Lorg/apache/hadoop/ipc/RPC$Server;Ljava/lang/String;Lorg/apache/hadoop/io/Writable;J)Lorg/apache/hadoop/io/Writable;
> (830 bytes) @ 0x00007fc99714df78 [0x00007fc99714c9c0+0x15b8]
> J 17497 C2 org.apache.hadoop.ipc.Server$RpcCall.run()Ljava/lang/Object; (5 bytes) @ 0x00007fc997237f70 [0x00007fc997237c60+0x310]
> v ~StubRoutines::call_stub
> {noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)