Pratyush Bhatt created HDDS-11550:
-------------------------------------

             Summary: [HBase Ozone] Leaserecovery fails with 
"StorageContainerException: Unable to find the block"
                 Key: HDDS-11550
                 URL: https://issues.apache.org/jira/browse/HDDS-11550
             Project: Apache Ozone
          Issue Type: Bug
            Reporter: Pratyush Bhatt


*Test Scenario:*
1. Open multiple files in Ozone ~100
2. Write some data and perform hsync.
3. Perform lease recovery on all files.
4. Parallely restart OM leader.

*Observed behavior:*

Lease recovery command failed for one of the file
{code:java}
2024-10-07 10:32:33,947|INFO|Thread-102|machine.py:190 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|RUNNING: 
/opt/cloudera/parcels/CDH/bin/ozone debug recover 
--path=ofs://ozone1728283844/hsyncvol/hsyncbuck/hsync/File_17.txt
2024-10-07 10:32:34,544|INFO|Thread-100|machine.py:205 - 
run()||GUID=e2ec2f41-91d2-4cb5-b67b-5dee26443a57|2024-10-07 10:32:34:540 1120 
sec: 365401 operations; 808.5 current ops/sec; est completion in 1 hour 23 
minutes [INSERT: Count=8087, Max=131583, Min=3264, Avg=12369.62, 90=29215, 
99=65663, 99.9=103359, 99.99=129919]
2024-10-07 10:32:38,670|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|24/10/07 10:32:38 WARN 
impl.MetricsConfig: Cannot locate configuration: tried 
hadoop-metrics2-xceiverclientmetrics.properties,hadoop-metrics2.properties
2024-10-07 10:32:38,758|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|24/10/07 10:32:38 INFO 
impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2024-10-07 10:32:38,758|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|24/10/07 10:32:38 INFO 
impl.MetricsSystemImpl: XceiverClientMetrics metrics system started
2024-10-07 10:32:38,843|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|24/10/07 10:32:38 INFO 
scm.XceiverClientRatis: WatchType ALL_COMMITTED. Majority 2,
2024-10-07 10:32:38,926|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|24/10/07 10:32:38 INFO 
netty.NettyConfigKeys$DataStream: setTlsConf GrpcTlsConfig0-
2024-10-07 10:32:39,413|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|24/10/07 10:32:39 INFO 
metrics.MetricRegistries: Loaded MetricRegistries class 
org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl
2024-10-07 10:32:40,686|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|24/10/07 10:32:40 ERROR 
impl.OrderedAsync: Failed to send request, message=cmdType: FinalizeBlock
2024-10-07 10:32:40,686|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|traceID: ""
2024-10-07 10:32:40,686|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|containerID: 1051
2024-10-07 10:32:40,686|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|datanodeUuid: 
"9d980530-9455-4855-b264-0b011f810aeb"
2024-10-07 10:32:40,687|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|encodedToken: 
"UwoCb20SJWNvbklEOiAxMDUxIGxvY0lEOiAxMTM3NTAxNTM2MjU2MDQzNjAYw8bA3aYyKAEoAjCAgICAAToVCOOPkafmyr6oZRC9mtq4-IbouI8BIG9ivUYkaRtyNJ64ih8ZyvB3OOVC6EpiW_VZeI7CmrioEEhERFNfQkxPQ0tfVE9LRU4oY29uSUQ6IDEwNTEgbG9jSUQ6IDExMzc1MDE1MzYyNTYwNDM2MAAAAA"
2024-10-07 10:32:40,687|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|version: 3
2024-10-07 10:32:40,687|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|finalizeBlock {
2024-10-07 10:32:40,687|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|blockID {
2024-10-07 10:32:40,687|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|containerID: 1051
2024-10-07 10:32:40,688|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|localID: 113750153625604360
2024-10-07 10:32:40,688|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|blockCommitSequenceId: 0
2024-10-07 10:32:40,688|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|}
2024-10-07 10:32:40,688|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|}
2024-10-07 10:32:40,688|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|, data.size=0
2024-10-07 10:32:40,688|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|java.util.concurrent.CompletionException:
 org.apache.ratis.protocol.exceptions.StateMachineException: 
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: 
Unable to find the block. BlockID : conID: 1051 locID: 113750153625604360 
bcsId: 0 replicaIndex: null
2024-10-07 10:32:40,689|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.ratis.client.impl.RaftClientImpl.handleRaftException(RaftClientImpl.java:373)
2024-10-07 10:32:40,689|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.ratis.client.impl.OrderedAsync.lambda$send$3(OrderedAsync.java:175)
2024-10-07 10:32:40,689|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
2024-10-07 10:32:40,689|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
2024-10-07 10:32:40,689|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
2024-10-07 10:32:40,690|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
2024-10-07 10:32:40,690|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:105)
2024-10-07 10:32:40,690|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:66)
2024-10-07 10:32:40,690|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.ratis.util.SlidingWindow$RequestMap.setReply(SlidingWindow.java:147)
2024-10-07 10:32:40,690|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.ratis.util.SlidingWindow$Client.receiveReply(SlidingWindow.java:351)
2024-10-07 10:32:40,690|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.ratis.client.impl.OrderedAsync.lambda$sendRequestWithRetry$5(OrderedAsync.java:210)
2024-10-07 10:32:40,691|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670)
2024-10-07 10:32:40,691|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:646)
2024-10-07 10:32:40,691|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
2024-10-07 10:32:40,691|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
2024-10-07 10:32:40,691|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.lambda$onNext$0(GrpcClientProtocolClient.java:322)
2024-10-07 10:32:40,691|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
java.util.Optional.ifPresent(Optional.java:159)
2024-10-07 10:32:40,692|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.handleReplyFuture(GrpcClientProtocolClient.java:378)
2024-10-07 10:32:40,692|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.access$100(GrpcClientProtocolClient.java:300)
2024-10-07 10:32:40,692|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:322)
2024-10-07 10:32:40,692|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:305)
2024-10-07 10:32:40,692|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:468)
2024-10-07 10:32:40,692|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
2024-10-07 10:32:40,693|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.ratis.thirdparty.io.grpc.internal.DelayedClientCall$DelayedListener.onMessage(DelayedClientCall.java:473)
2024-10-07 10:32:40,693|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:660)
2024-10-07 10:32:40,693|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:647)
2024-10-07 10:32:40,693|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
2024-10-07 10:32:40,693|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
2024-10-07 10:32:40,694|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
2024-10-07 10:32:40,694|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
2024-10-07 10:32:40,694|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
java.lang.Thread.run(Thread.java:750)
2024-10-07 10:32:40,694|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|Caused by: 
org.apache.ratis.protocol.exceptions.StateMachineException: 
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: 
Unable to find the block. BlockID : conID: 1051 locID: 113750153625604360 
bcsId: 0 replicaIndex: null
2024-10-07 10:32:40,694|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$applyTransaction$12(ContainerStateMachine.java:1070)
2024-10-07 10:32:40,695|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
2024-10-07 10:32:40,695|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
2024-10-07 10:32:40,695|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
2024-10-07 10:32:40,695|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
2024-10-07 10:32:40,695|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.ratis.util.TaskQueue.lambda$submit$0(TaskQueue.java:133)
2024-10-07 10:32:40,696|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:38)
2024-10-07 10:32:40,696|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.ratis.util.LogUtils$1.run(LogUtils.java:78)
2024-10-07 10:32:40,696|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
2024-10-07 10:32:40,696|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
java.util.concurrent.FutureTask.run(FutureTask.java:266)
2024-10-07 10:32:40,696|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|... 3 more
2024-10-07 10:32:40,696|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|24/10/07 10:32:40 WARN 
ozone.BasicRootedOzoneClientAdapterImpl: Failed to execute finalizeBlock command
2024-10-07 10:32:40,696|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|java.io.IOException: Failed to 
execute command cmdType: FinalizeBlock
2024-10-07 10:32:40,696|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|containerID: 1051
2024-10-07 10:32:40,696|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|datanodeUuid: 
"9d980530-9455-4855-b264-0b011f810aeb"
2024-10-07 10:32:40,696|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|encodedToken: 
"UwoCb20SJWNvbklEOiAxMDUxIGxvY0lEOiAxMTM3NTAxNTM2MjU2MDQzNjAYw8bA3aYyKAEoAjCAgICAAToVCOOPkafmyr6oZRC9mtq4-IbouI8BIG9ivUYkaRtyNJ64ih8ZyvB3OOVC6EpiW_VZeI7CmrioEEhERFNfQkxPQ0tfVE9LRU4oY29uSUQ6IDEwNTEgbG9jSUQ6IDExMzc1MDE1MzYyNTYwNDM2MAAAAA"
2024-10-07 10:32:40,697|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|finalizeBlock {
2024-10-07 10:32:40,697|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|blockID {
2024-10-07 10:32:40,697|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|containerID: 1051
2024-10-07 10:32:40,697|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|localID: 113750153625604360
2024-10-07 10:32:40,697|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|blockCommitSequenceId: 0
2024-10-07 10:32:40,697|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|}
2024-10-07 10:32:40,698|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|}
2024-10-07 10:32:40,698|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|
2024-10-07 10:32:40,698|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.hadoop.hdds.scm.XceiverClientSpi.getIOExceptionForSendCommand(XceiverClientSpi.java:156)
2024-10-07 10:32:40,698|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.hadoop.hdds.scm.XceiverClientSpi.sendCommand(XceiverClientSpi.java:149)
2024-10-07 10:32:40,698|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.finalizeBlock(ContainerProtocolCalls.java:326)
2024-10-07 10:32:40,698|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.hadoop.fs.ozone.BasicRootedOzoneClientAdapterImpl.finalizeBlock(BasicRootedOzoneClientAdapterImpl.java:1433)
2024-10-07 10:32:40,699|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.hadoop.fs.ozone.LeaseRecoveryClientDNHandler.getOmKeyLocationInfos(LeaseRecoveryClientDNHandler.java:83)
2024-10-07 10:32:40,699|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.hadoop.fs.ozone.RootedOzoneFileSystem.recoverLeaseTraced(RootedOzoneFileSystem.java:162)
2024-10-07 10:32:40,699|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.hadoop.fs.ozone.RootedOzoneFileSystem.lambda$recoverLease$0(RootedOzoneFileSystem.java:141)
2024-10-07 10:32:40,699|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:169)
2024-10-07 10:32:40,700|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:159)
2024-10-07 10:32:40,700|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.hadoop.fs.ozone.RootedOzoneFileSystem.recoverLease(RootedOzoneFileSystem.java:140)
2024-10-07 10:32:40,700|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.hadoop.ozone.debug.LeaseRecoverer.call(LeaseRecoverer.java:77)
2024-10-07 10:32:40,700|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.hadoop.ozone.debug.LeaseRecoverer.call(LeaseRecoverer.java:38)
2024-10-07 10:32:40,701|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
picocli.CommandLine.executeUserObject(CommandLine.java:2045)
2024-10-07 10:32:40,701|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
picocli.CommandLine.access$1500(CommandLine.java:148)
2024-10-07 10:32:40,701|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2465)
2024-10-07 10:32:40,701|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
picocli.CommandLine$RunLast.handle(CommandLine.java:2457)
2024-10-07 10:32:40,701|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
picocli.CommandLine$RunLast.handle(CommandLine.java:2419)
2024-10-07 10:32:40,701|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2277)
2024-10-07 10:32:40,701|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
picocli.CommandLine$RunLast.execute(CommandLine.java:2421)
2024-10-07 10:32:40,702|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
picocli.CommandLine.execute(CommandLine.java:2174)
2024-10-07 10:32:40,702|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:100)
2024-10-07 10:32:40,702|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:91)
2024-10-07 10:32:40,702|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.hadoop.ozone.debug.OzoneDebug.main(OzoneDebug.java:64)
2024-10-07 10:32:40,702|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|Caused by: 
java.util.concurrent.ExecutionException: 
org.apache.ratis.protocol.exceptions.StateMachineException: 
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: 
Unable to find the block. BlockID : conID: 1051 locID: 113750153625604360 
bcsId: 0 replicaIndex: null
2024-10-07 10:32:40,702|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
2024-10-07 10:32:40,702|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
2024-10-07 10:32:40,702|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.hadoop.hdds.scm.XceiverClientSpi.sendCommand(XceiverClientSpi.java:139)
2024-10-07 10:32:40,703|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|... 21 more
2024-10-07 10:32:40,703|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|Caused by: 
org.apache.ratis.protocol.exceptions.StateMachineException: 
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: 
Unable to find the block. BlockID : conID: 1051 locID: 113750153625604360 
bcsId: 0 replicaIndex: null
2024-10-07 10:32:40,703|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$applyTransaction$12(ContainerStateMachine.java:1070)
2024-10-07 10:32:40,703|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
2024-10-07 10:32:40,703|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
2024-10-07 10:32:40,703|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
2024-10-07 10:32:40,703|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
2024-10-07 10:32:40,703|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.ratis.util.TaskQueue.lambda$submit$0(TaskQueue.java:133)
2024-10-07 10:32:40,703|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:38)
2024-10-07 10:32:40,704|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
org.apache.ratis.util.LogUtils$1.run(LogUtils.java:78)
2024-10-07 10:32:40,704|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
2024-10-07 10:32:40,704|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
java.util.concurrent.FutureTask.run(FutureTask.java:266)
2024-10-07 10:32:40,704|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
2024-10-07 10:32:40,704|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
2024-10-07 10:32:40,704|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|at 
java.lang.Thread.run(Thread.java:750)
2024-10-07 10:32:40,872|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|24/10/07 10:32:40 WARN 
scm.XceiverClientGrpc: Failed to execute command GetCommittedBlockLength on the 
pipeline Pipeline[ Id: 53abcc7c-979c-4097-9687-1f8dbb627ccd, Nodes: 
9d980530-9455-4855-b264-0b011f810aeb(DN-4/11.22.33.44) ReplicaIndex: 
06974e9b8-eeb9-4047-b0a2-626b2c175eb1(DN-7/22.11.33.44) ReplicaIndex: 
018b2d32b-faa9-4c6d-9730-8f808679f70f(DN-6/33.11.22.44) ReplicaIndex: 0, 
ReplicationConfig: STANDALONE/THREE, State:OPEN, leaderId:, 
CreationTimestamp2024-10-07T10:32:40.692Z[Etc/UTC]].
2024-10-07 10:32:40,872|INFO|Thread-102|machine.py:205 - 
run()||GUID=0fac0928-4730-408c-9f03-4eeedad7b751|Unable to find the block. 
BlockID : conID: 1051 locID: 113750153625604360 bcsId: 0 replicaIndex: null 
{code}
Checked the SCM leader log(Node 2) for this container ID:
{code:java}
2024-10-07 10:32:40,494 INFO 
[node2-EventQueue-CloseContainerForCloseContainerEventHandler]-org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler:
 Close container Event triggered for container : #1051, current state: OPEN
2024-10-07 10:32:40,494 INFO 
[node2-FixedThreadPoolWithAffinityExecutor-9-0]-org.apache.hadoop.hdds.scm.container.IncrementalContainerReportHandler:
 Moving OPEN container #1051 to CLOSING state, datanode 
9d980530-9455-4855-b264-0b011f810aeb(DN-4/11.22.33.44) reported UNHEALTHY 
replica with index 0.
2024-10-07 10:32:40,496 INFO 
[node2-EventQueue-CloseContainerForCloseContainerEventHandler]-org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler:
 Close container Event triggered for container : #1051, current state: CLOSING
2024-10-07 10:32:40,497 INFO 
[node2-EventQueue-CloseContainerForCloseContainerEventHandler]-org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler:
 Close container Event triggered for container : #1051, current state: CLOSING
2024-10-07 10:32:40,515 INFO 
[node2-EventQueue-PipelineActionsForPipelineActionHandler]-org.apache.hadoop.hdds.scm.pipeline.PipelineActionHandler:
 Received pipeline action CLOSE for 
PipelineID=e34300bb-044e-4641-b338-072839835604 from datanode 
18b2d32b-faa9-4c6d-9730-8f808679f70f. Reason : Ratis Transaction failure in 
datanode 18b2d32b-faa9-4c6d-9730-8f808679f70f with role LEADER .Triggering 
pipeline close action. {code}
Checked the DN-4 (node 4), it shows:
{code:java}
2024-10-07 10:32:40,483 WARN 
[9d980530-9455-4855-b264-0b011f810aeb-ContainerOp-e34300bb-044e-4641-b338-072839835604-7]-org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler:
 Operation: FinalizeBlock , Trace ID:  , Message: Unable to find the block. 
BlockID : conID: 1051 locID: 113750153625604360 bcsId: 0 replicaIndex: null , 
Result: NO_SUCH_BLOCK , StorageContainerException Occurred.
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: 
Unable to find the block. BlockID : conID: 1051 locID: 113750153625604360 
bcsId: 0 replicaIndex: null
    at 
org.apache.hadoop.ozone.container.metadata.DatanodeStoreWithIncrementalChunkList.getCompleteBlockData(DatanodeStoreWithIncrementalChunkList.java:69)
    at 
org.apache.hadoop.ozone.container.metadata.DatanodeStore.getBlockByID(DatanodeStore.java:138)
    at 
org.apache.hadoop.ozone.container.keyvalue.impl.BlockManagerImpl.getBlockByID(BlockManagerImpl.java:375)
    at 
org.apache.hadoop.ozone.container.keyvalue.impl.BlockManagerImpl.getBlock(BlockManagerImpl.java:277)
    at 
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleFinalizeBlock(KeyValueHandler.java:611)
    at 
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.dispatchRequest(KeyValueHandler.java:287)
    at 
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:231)
    at 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:355)
    at 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.lambda$dispatch$0(HddsDispatcher.java:195)
    at 
org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:89)
    at 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:194)
    at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:505)
    at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$applyTransaction$10(ContainerStateMachine.java:973)
    at org.apache.ratis.util.TaskQueue.lambda$submit$0(TaskQueue.java:121)
    at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:38)
    at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:78)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750) {code}
Note: Lease recovery on other files succeeded.

cc: [~weichiu] [~ashishkr] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to