[
https://issues.apache.org/jira/browse/HDDS-8271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Janus Chow updated HDDS-8271:
-----------------------------
Target Version/s: 1.5.0 (was: 1.4.0)
I am managing the 1.4.0 release and we currently have more than 500 issues
targeted for 1.4.0. I am moving the target field to 1.5.0.
If you are actively working on this jira and believe this should be targeted
for the 1.4.0 release, Please reach out to me via Apache email or Slack.
> Ozone Manager crashed caused by rocksdb closed
> ----------------------------------------------
>
> Key: HDDS-8271
> URL: https://issues.apache.org/jira/browse/HDDS-8271
> Project: Apache Ozone
> Issue Type: Bug
> Components: OM
> Affects Versions: 1.3.0
> Reporter: Hongbing Wang
> Assignee: Sadanand Shenoy
> Priority: Critical
>
> An OM crashed, its log has:
> {noformat}
> 2023-03-23 20:32:06,079 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> om3@group-197E298202B9-SegmentedRaftLogWorker: Rolled log segment from
> /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61913753
> to
> /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_61913753-61920134
> 2023-03-23 20:32:06,087 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> om3@group-197E298202B9-SegmentedRaftLogWorker: created new log segment
> /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61920135
> 2023-03-23 20:32:13,820 [om3-client-thread180] INFO
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> om3@group-197E298202B9-SegmentedRaftLogWorker: Rolling segment
> log-61920135_61926451 to index:61926451
> 2023-03-23 20:32:13,821 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> om3@group-197E298202B9-SegmentedRaftLogWorker: Rolled log segment from
> /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61920135
> to
> /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_61920135-61926451
> 2023-03-23 20:32:13,827 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> om3@group-197E298202B9-SegmentedRaftLogWorker: created new log segment
> /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61926452
> 2023-03-23 20:32:15,699 [Finalizer] WARN
> org.apache.hadoop.hdds.utils.db.managed.ManagedRocksObjectUtils:
> RocksIterator is not closed properly
> 2023-03-23 20:32:15,699 [Finalizer] WARN
> org.apache.hadoop.hdds.utils.db.managed.ManagedRocksObjectUtils:
> RocksIterator is not closed properly
> 2023-03-23 20:32:21,281 [om3-client-thread183] INFO
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> om3@group-197E298202B9-SegmentedRaftLogWorker: Rolling segment
> log-61926452_61932758 to index:61932758
> 2023-03-23 20:32:21,281 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> om3@group-197E298202B9-SegmentedRaftLogWorker: Rolled log segment from
> /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61926452
> to
> /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_61926452-61932758
> 2023-03-23 20:32:21,288 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> om3@group-197E298202B9-SegmentedRaftLogWorker: created new log segment
> /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61932759
> 2023-03-23 20:32:28,387 [om3-client-thread160] INFO
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> om3@group-197E298202B9-SegmentedRaftLogWorker: Rolling segment
> log-61932759_61939170 to index:61939170
> 2023-03-23 20:32:28,388 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> om3@group-197E298202B9-SegmentedRaftLogWorker: Rolled log segment from
> /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61932759
> to
> /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_61932759-61939170
> 2023-03-23 20:32:28,395 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> om3@group-197E298202B9-SegmentedRaftLogWorker: created new log segment
> /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61939171
> 2023-03-23 20:32:29,003 [OM StateMachine ApplyTransaction Thread - 0] ERROR
> org.apache.hadoop.ozone.om.KeyManagerImpl: CheckAccess operation failed for
> key:vol-XXX/prod-XXX/FILE_2022-03-28-13-38-36_10003/_2022-03-28-13-38-45_0.FILE
> java.io.IOException: Rocks Database is closed
> at
> org.apache.hadoop.hdds.utils.db.RocksDatabase.assertClose(RocksDatabase.java:395)
> at
> org.apache.hadoop.hdds.utils.db.RocksDatabase.get(RocksDatabase.java:517)
> at org.apache.hadoop.hdds.utils.db.RDBTable.get(RDBTable.java:110)
> at org.apache.hadoop.hdds.utils.db.RDBTable.get(RDBTable.java:40)
> at
> org.apache.hadoop.hdds.utils.db.TypedTable.getFromTable(TypedTable.java:255)
> at org.apache.hadoop.hdds.utils.db.TypedTable.get(TypedTable.java:181)
> at
> org.apache.hadoop.ozone.om.KeyManagerImpl.checkAccess(KeyManagerImpl.java:1037)
> at
> org.apache.hadoop.ozone.security.acl.OzoneNativeAuthorizer.checkAccess(OzoneNativeAuthorizer.java:164)
> at
> org.apache.hadoop.ozone.om.OzoneManager.checkAcls(OzoneManager.java:2584)
> at
> org.apache.hadoop.ozone.om.OzoneManager.checkAcls(OzoneManager.java:2570)
> at
> org.apache.hadoop.ozone.om.OzoneAclUtils.checkAllAcls(OzoneAclUtils.java:128)
> at
> org.apache.hadoop.ozone.om.request.OMClientRequest.checkAcls(OMClientRequest.java:360)
> at
> org.apache.hadoop.ozone.om.request.OMClientRequest.checkAcls(OMClientRequest.java:215)
> at
> org.apache.hadoop.ozone.om.request.key.OMKeyRequest.checkKeyAcls(OMKeyRequest.java:339)
> at
> org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:138)
> at
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:311)
> at
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:533)
> at
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:324)
> at
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2023-03-23 20:32:29,004 [OM StateMachine ApplyTransaction Thread - 0] ERROR
> org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest:
> MultipartUpload Commit is failed for
> Key:FILE_2022-03-28-13-38-36_10003/_2022-03-28-13-38-45_0.FILE in
> Volume/Bucket vol-XXX/prod-XXX
> INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Check
> access operation failed for
> key:FILE_2022-03-28-13-38-36_10003/_2022-03-28-13-38-45_0.FILE
> at
> org.apache.hadoop.ozone.om.KeyManagerImpl.checkAccess(KeyManagerImpl.java:1083)
> at
> org.apache.hadoop.ozone.security.acl.OzoneNativeAuthorizer.checkAccess(OzoneNativeAuthorizer.java:164)
> at
> org.apache.hadoop.ozone.om.OzoneManager.checkAcls(OzoneManager.java:2584)
> at
> org.apache.hadoop.ozone.om.OzoneManager.checkAcls(OzoneManager.java:2570)
> at
> org.apache.hadoop.ozone.om.OzoneAclUtils.checkAllAcls(OzoneAclUtils.java:128)
> at
> org.apache.hadoop.ozone.om.request.OMClientRequest.checkAcls(OMClientRequest.java:360)
> at
> org.apache.hadoop.ozone.om.request.OMClientRequest.checkAcls(OMClientRequest.java:215)
> at
> org.apache.hadoop.ozone.om.request.key.OMKeyRequest.checkKeyAcls(OMKeyRequest.java:339)
> at
> org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:138)
> at
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:311)
> at
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:533)
> at
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:324)
> at
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Rocks Database is closed
> at
> org.apache.hadoop.hdds.utils.db.RocksDatabase.assertClose(RocksDatabase.java:395)
> at
> org.apache.hadoop.hdds.utils.db.RocksDatabase.get(RocksDatabase.java:517)
> at org.apache.hadoop.hdds.utils.db.RDBTable.get(RDBTable.java:110)
> at org.apache.hadoop.hdds.utils.db.RDBTable.get(RDBTable.java:40)
> at
> org.apache.hadoop.hdds.utils.db.TypedTable.getFromTable(TypedTable.java:255)
> at org.apache.hadoop.hdds.utils.db.TypedTable.get(TypedTable.java:181)
> at
> org.apache.hadoop.ozone.om.KeyManagerImpl.checkAccess(KeyManagerImpl.java:1037)
> ... 15 more
> 2023-03-23 20:32:29,012 [OMDoubleBufferFlushThread] ERROR
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer: Terminating with
> exit status 1: During flush to DB encountered error in OMDoubleBuffer flush
> thread OMDoubleBufferFlushThread
> java.io.IOException: Rocks Database is closed
> at
> org.apache.hadoop.hdds.utils.db.RocksDatabase$ColumnFamily.assertClosed(RocksDatabase.java:310)
> at
> org.apache.hadoop.hdds.utils.db.RocksDatabase$ColumnFamily.batchPut(RocksDatabase.java:293)
> at
> org.apache.hadoop.hdds.utils.db.RDBBatchOperation.put(RDBBatchOperation.java:62)
> at
> org.apache.hadoop.hdds.utils.db.RDBTable.putWithBatch(RDBTable.java:77)
> at
> org.apache.hadoop.hdds.utils.db.RDBTable.putWithBatch(RDBTable.java:40)
> at
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:132)
> at
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:149)
> at
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.checkAndUpdateDB(S3MultipartUploadCommitPartResponse.java:117)
> at
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$2(OzoneManagerDoubleBuffer.java:283)
> at
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.addToBatchWithTrace(OzoneManagerDoubleBuffer.java:228)
> at
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$1(OzoneManagerDoubleBuffer.java:281)
> at java.util.Iterator.forEachRemaining(Iterator.java:116)
> at
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:277)
> at java.lang.Thread.run(Thread.java:745)
> 2023-03-23 20:32:29,016 [shutdown-hook-0] INFO
> org.apache.hadoop.ozone.om.OzoneManagerStarter: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down OzoneManager at bigdata-hostxxx.com/10.xx.xx.xx
> ************************************************************/
> {noformat}
> more OM log see: [OM crashed detail log
> |https://github.com/whbing/issue_logs/blob/main/ozone/om/om_crash.20230323.log]
> rocksdb log: LOG.old.1679574802031612
> {noformat}
> 2023/03/23-20:32:26.278546 140423117969152 [/db_impl/db_impl_files.cc:445]
> [JOB 4050] Try to delete WAL files size 42545044203, prev total WAL file size
> 42958842785, number of live WAL files 326.
> ...
> 2023/03/23-20:32:28.853876 140423224928000 [/version_set.cc:3800] More
> existing levels in DB than needed. max_bytes_for_level_multiplier may not be
> guaranteed.
> ...
> 2023/03/23-20:32:28.854153 140423224928000 (Original Log Time
> 2023/03/23-20:32:28.854112) [/db_impl/db_impl_compaction_flush.cc:284]
> [multipartInfoTable] Level summary: base level 6 level multiplier 10.00 max
> bytes base 268435456 files[2 0 0 0 0 0 3] max score 0.50
> ...
> 2023/03/23-20:32:28.998765 140423117969152 [le/delete_scheduler.cc:77]
> Deleted file /data/ozonedata/om/db/om.db/098324.log immediately,
> rate_bytes_per_sec 0, total_trash_size 0 max_trash_db_ratio 0.250000
> 2023/03/23-20:32:29.009957 140422757758720 [/db_impl/db_impl.cc:480]
> Shutdown: canceling all background work
> 2023/03/23-20:32:29.020213 140423117969152 [le/delete_scheduler.cc:77]
> Deleted file /data/ozonedata/om/db/om.db/098322.log immediately,
> rate_bytes_per_sec 0, total_trash_size 0 max_trash_db_ratio 0.250000
> ...
> 2023/03/23-20:32:32.829353 140423117969152 [le/delete_scheduler.cc:77]
> Deleted file /data/ozonedata/om/db/om.db/097921.log immediately,
> rate_bytes_per_sec 0, total_trash_size 0 max_trash_db_ratio 0.250000
> 2023/03/23-20:32:32.883735 140422757758720 [/db_impl/db_impl.cc:710] Shutdown
> complete
> {noformat}
> more see : [detail rocks
> log|https://github.com/whbing/issue_logs/blob/main/ozone/om/LOG.old.1679574802031612]
> core file: hs_err_pid1017701.log :
> {noformat}
> Stack: [0x00007fb6b8883000,0x00007fb6b8984000], sp=0x00007fb6b89817f8, free
> space=1017k
> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> j org.rocksdb.RocksDB.getLatestSequenceNumber(J)J+0
> j org.rocksdb.RocksDB.getLatestSequenceNumber()J+5
> j org.apache.hadoop.hdds.utils.db.RocksDatabase.getLatestSequenceNumber()J+18
> j
> org.apache.hadoop.hdds.utils.db.RDBStore.getUpdatesSince(JJ)Lorg/apache/hadoop/hdds/utils/db/DBUpdatesWrapper;+395
> j
> org.apache.hadoop.ozone.om.OzoneManager.getDBUpdates(Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$DBUpdatesRequest;)Lorg/apache/hadoop/ozone/om/helpers/DBUpdates;+30
> j
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.getOMDBUpdates(Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$DBUpdatesRequest;)Lorg/apache/hadoop/ozone/p
> rotocol/proto/OzoneManagerProtocolProtos$DBUpdatesResponse;+9
> J 19759 C2
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleReadRequest(Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$OMRequest;)Lorg/apache/hadoop/ozo
> ne/protocol/proto/OzoneManagerProtocolProtos$OMResponse; (840 bytes) @
> 0x00007fc99793523c [0x00007fc997933ec0+0x137c]
> J 19799 C2
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(Lcom/google/protobuf/RpcController;Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerP
> rotocolProtos$OMRequest;)Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$OMResponse;
> (81 bytes) @ 0x00007fc997dec564 [0x00007fc997deb460+0x1104]
> J 17937 C2
> org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(Lcom/google/protobuf/Descriptors$MethodDescriptor;Lcom/google/protobuf/Rp
> cController;Lcom/google/protobuf/Message;)Lcom/google/protobuf/Message; (69
> bytes) @ 0x00007fc995ec62a0 [0x00007fc995ec6140+0x160]
> J 17503 C2
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(Lorg/apache/hadoop/ipc/RPC$Server;Ljava/lang/String;Lorg/apache/hadoop/io/Writable;J)Lorg/apache/hadoop/io/Wr
> itable; (830 bytes) @ 0x00007fc99714df78 [0x00007fc99714c9c0+0x15b8]
> J 17497 C2 org.apache.hadoop.ipc.Server$RpcCall.run()Ljava/lang/Object; (5
> bytes) @ 0x00007fc997237f70 [0x00007fc997237c60+0x310]
> v ~StubRoutines::call_stub
> {noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]