Attila Doroszlai created HDDS-8539:
--------------------------------------

             Summary: Container DB open, but not found in DatanodeStoreCache
                 Key: HDDS-8539
                 URL: https://issues.apache.org/jira/browse/HDDS-8539
             Project: Apache Ozone
          Issue Type: Bug
          Components: Ozone Datanode
            Reporter: Attila Doroszlai


Surefire fork Intermittently timeouts in {{TestDecommissionAndMaintenance}}.

Container DB added to cache:

{code}
2023-05-03 08:18:26,909 [EndpointStateMachine task thread for /0.0.0.0:43723 - 
0 ] INFO  utils.DatanodeStoreCache (DatanodeStoreCache.java:addDB(58)) - Added 
db 
/home/runner/work/ozone/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-ff176d5b-bea5-4cbe-a997-8236a6853a89/datanode-0/data-0/containers/hdds/ff176d5b-bea5-4cbe-a997-8236a6853a89/DS-4328e108-8c1a-4a6f-8bff-6f686dd50b24/container.db
 to cache
{code}

but then not found and tried to open again:

{code}
2023-05-03 08:18:57,086 [Command processor thread] ERROR 
utils.DatanodeStoreCache (DatanodeStoreCache.java:getDB(74)) - Failed to get DB 
store 
/home/runner/work/ozone/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-ff176d5b-bea5-4cbe-a997-8236a6853a89/datanode-0/data-0/containers/hdds/ff176d5b-bea5-4cbe-a997-8236a6853a89/DS-4328e108-8c1a-4a6f-8bff-6f686dd50b24/container.db
java.io.IOException: Failed init RocksDB, db path : 
/home/runner/work/ozone/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-ff176d5b-bea5-4cbe-a997-8236a6853a89/datanode-0/data-0/containers/hdds/ff176d5b-bea5-4cbe-a997-8236a6853a89/DS-4328e108-8c1a-4a6f-8bff-6f686dd50b24/container.db,
 exception :org.rocksdb.RocksDBException lock hold by current process, acquire 
time 1683101936 acquiring thread 139985634854656: 
/home/runner/work/ozone/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-ff176d5b-bea5-4cbe-a997-8236a6853a89/datanode-0/data-0/containers/hdds/ff176d5b-bea5-4cbe-a997-8236a6853a89/DS-4328e108-8c1a-4a6f-8bff-6f686dd50b24/container.db/LOCK:
 No locks available
        at org.apache.hadoop.hdds.utils.db.RDBStore.<init>(RDBStore.java:182)
        at 
org.apache.hadoop.hdds.utils.db.DBStoreBuilder.build(DBStoreBuilder.java:212)
        at 
org.apache.hadoop.ozone.container.metadata.AbstractDatanodeStore.start(AbstractDatanodeStore.java:147)
        at 
org.apache.hadoop.ozone.container.metadata.AbstractDatanodeStore.<init>(AbstractDatanodeStore.java:99)
        at 
org.apache.hadoop.ozone.container.metadata.DatanodeStoreSchemaThreeImpl.<init>(DatanodeStoreSchemaThreeImpl.java:66)
        at 
org.apache.hadoop.ozone.container.common.utils.DatanodeStoreCache.getDB(DatanodeStoreCache.java:69)
        at 
org.apache.hadoop.ozone.container.keyvalue.helpers.BlockUtils.getDB(BlockUtils.java:132)
        at 
org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.flushAndSyncDB(KeyValueContainer.java:444)
        at 
org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.closeAndFlushIfNeeded(KeyValueContainer.java:385)
        at 
org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.quasiClose(KeyValueContainer.java:355)
        at 
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.quasiCloseContainer(KeyValueHandler.java:1121)
        at 
org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.quasiCloseContainer(ContainerController.java:142)
        at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.notifyGroupRemove(ContainerStateMachine.java:1052)
        at 
org.apache.ratis.server.impl.RaftServerImpl.groupRemove(RaftServerImpl.java:423)
        at 
org.apache.ratis.server.impl.RaftServerProxy.lambda$groupRemoveAsync$12(RaftServerProxy.java:530)
        at 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
        at 
java.util.concurrent.CompletableFuture.uniApplyStage(CompletableFuture.java:628)
        at 
java.util.concurrent.CompletableFuture.thenApply(CompletableFuture.java:1996)
        at 
org.apache.ratis.server.impl.RaftServerProxy.groupRemoveAsync(RaftServerProxy.java:529)
        at 
org.apache.ratis.server.impl.RaftServerProxy.groupManagementAsync(RaftServerProxy.java:479)
        at 
org.apache.ratis.server.impl.RaftServerProxy.groupManagement(RaftServerProxy.java:459)
        at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.removeGroup(XceiverServerRatis.java:822)
        at 
org.apache.hadoop.ozone.container.common.statemachine.commandhandler.ClosePipelineCommandHandler.handle(ClosePipelineCommandHandler.java:77)
        at 
org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CommandDispatcher.handle(CommandDispatcher.java:99)
        at 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$initCommandHandlerThread$3(DatanodeStateMachine.java:644)
        at java.lang.Thread.run(Thread.java:750)
{code}

This continues until the fork is killed:

{code}
2023-05-03 08:33:24,505 [Command processor thread] ERROR 
utils.DatanodeStoreCache (DatanodeStoreCache.java:getDB(74)) - Failed to get DB 
store 
/home/runner/work/ozone/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-ff176d5b-bea5-4cbe-a997-8236a6853a89/datanode-0/data-0/containers/hdds/ff176d5b-bea5-4cbe-a997-8236a6853a89/DS-4328e108-8c1a-4a6f-8bff-6f686dd50b24/container.db
java.io.IOException: Failed init RocksDB, db path : 
/home/runner/work/ozone/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-ff176d5b-bea5-4cbe-a997-8236a6853a89/datanode-0/data-0/containers/hdds/ff176d5b-bea5-4cbe-a997-8236a6853a89/DS-4328e108-8c1a-4a6f-8bff-6f686dd50b24/container.db,
 exception :org.rocksdb.RocksDBException lock hold by current process, acquire 
time 1683101936 acquiring thread 139985634854656: 
/home/runner/work/ozone/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-ff176d5b-bea5-4cbe-a997-8236a6853a89/datanode-0/data-0/containers/hdds/ff176d5b-bea5-4cbe-a997-8236a6853a89/DS-4328e108-8c1a-4a6f-8bff-6f686dd50b24/container.db/LOCK:
 No locks available
        at org.apache.hadoop.hdds.utils.db.RDBStore.<init>(RDBStore.java:182)
        at 
org.apache.hadoop.hdds.utils.db.DBStoreBuilder.build(DBStoreBuilder.java:212)
        at 
org.apache.hadoop.ozone.container.metadata.AbstractDatanodeStore.start(AbstractDatanodeStore.java:147)
        at 
org.apache.hadoop.ozone.container.metadata.AbstractDatanodeStore.<init>(AbstractDatanodeStore.java:99)
        at 
org.apache.hadoop.ozone.container.metadata.DatanodeStoreSchemaThreeImpl.<init>(DatanodeStoreSchemaThreeImpl.java:66)
        at 
org.apache.hadoop.ozone.container.common.utils.DatanodeStoreCache.getDB(DatanodeStoreCache.java:69)
        at 
org.apache.hadoop.ozone.container.keyvalue.helpers.BlockUtils.getDB(BlockUtils.java:132)
        at 
org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.flushAndSyncDB(KeyValueContainer.java:444)
        at 
org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.closeAndFlushIfNeeded(KeyValueContainer.java:385)
        at 
org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.quasiClose(KeyValueContainer.java:355)
{code}

* 
https://github.com/adoroszlai/ozone-build-results/blob/master/2023/04/21/21757/it-flaky/hadoop-ozone/integration-test/org.apache.hadoop.ozone.scm.node.TestDecommissionAndMaintenance-output.txt
* 
https://github.com/adoroszlai/ozone-build-results/blob/master/2023/04/24/21800/it-flaky/hadoop-ozone/integration-test/org.apache.hadoop.ozone.scm.node.TestDecommissionAndMaintenance-output.txt
* 
https://github.com/adoroszlai/ozone-build-results/blob/master/2023/04/24/21805/it-flaky/hadoop-ozone/integration-test/org.apache.hadoop.ozone.scm.node.TestDecommissionAndMaintenance-output.txt
* 
https://github.com/adoroszlai/ozone-build-results/blob/master/2023/04/27/21885/it-flaky/hadoop-ozone/integration-test/org.apache.hadoop.ozone.scm.node.TestDecommissionAndMaintenance-output.txt
* 
https://github.com/adoroszlai/ozone-build-results/blob/master/2023/04/27/21895/it-flaky/hadoop-ozone/integration-test/org.apache.hadoop.ozone.scm.node.TestDecommissionAndMaintenance-output.txt
* 
https://github.com/adoroszlai/ozone-build-results/blob/master/2023/04/28/21927/it-flaky/hadoop-ozone/integration-test/org.apache.hadoop.ozone.scm.node.TestDecommissionAndMaintenance-output.txt
* 
https://github.com/adoroszlai/ozone-build-results/blob/master/2023/05/03/21994/it-flaky/hadoop-ozone/integration-test/org.apache.hadoop.ozone.scm.node.TestDecommissionAndMaintenance-output.txt
* 
https://github.com/adoroszlai/ozone-build-results/blob/master/2023/05/03/21995/it-flaky/hadoop-ozone/integration-test/org.apache.hadoop.ozone.scm.node.TestDecommissionAndMaintenance-output.txt



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to