Varsha Ravi created HDDS-9319:
---------------------------------
Summary: EC Reconstruction fails when chunk length is 0 bytes
Key: HDDS-9319
URL: https://issues.apache.org/jira/browse/HDDS-9319
Project: Apache Ozone
Issue Type: Bug
Components: EC
Reporter: Varsha Ravi
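EC offline reconstruction fails with *java.io.IOException: Failed to get chunkInfo[123]: len == 0*. The DN log then reports that there are insufficient datanodes to read the EC block, even though 3 of the 5 DNs (the sources for replica indexes 3, 4 and 5) are up, which should be enough to rebuild the missing indexes 1 and 2.
This appears to be caused by a chunk whose length is recorded as zero bytes: the GetBlock response from the source for replica index 3 (b3a4dcf4-916d-4a97-adde-daaf9785b237) fails the chunk length check, that source is marked as failed, and only 2 of the 3 required sources remain.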
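A simplified, hypothetical model of the check that seems to produce this error, inferred only from the message text and the BlockInputStream.validate frame in the trace: every chunk listed in the GetBlock response must have a positive length, and a single zero-length entry rejects the whole replica. ChunkLengthCheck and validate(long[]) are illustrative names, not Ozone's actual code or API.

```java
import java.io.IOException;

/**
 * Hypothetical sketch (not Ozone code) of a per-chunk length check applied to
 * the chunk list returned by GetBlock, modelled on the DN.log message.
 */
final class ChunkLengthCheck {

  private ChunkLengthCheck() { }

  /**
   * Throws if any chunk length is zero or negative. One bad entry is enough
   * for the reader to treat the whole replica as unreadable.
   */
  static void validate(long[] chunkLens) throws IOException {
    for (int i = 0; i < chunkLens.length; i++) {
      if (chunkLens[i] <= 0) {
        throw new IOException(
            "Failed to get chunkInfo[" + i + "]: len == " + chunkLens[i]);
      }
    }
  }

  public static void main(String[] args) {
    // Two full 1 MiB chunks followed by an empty one (illustrative values).
    long[] lens = {1024L * 1024, 1024L * 1024, 0L};
    try {
      validate(lens);
      System.out.println("all chunks valid");
    } catch (IOException e) {
      System.out.println(e.getMessage()); // Failed to get chunkInfo[2]: len == 0
    }
  }
}
```

Because a check of this shape is all-or-nothing per block, one empty chunk is enough to remove an otherwise healthy source from the reconstruction.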
*EC Policy:* rs-3-2-1024k
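For reference, the policy string rs-3-2-1024k names the codec (Reed-Solomon), 3 data replicas, 2 parity replicas and a 1024 KiB EC chunk size, so any 3 of the 5 replicas are enough to rebuild the others. The EcPolicy class below is a hypothetical helper for illustration only, not Ozone's own parsing code; it assumes exactly four dash-separated fields and a chunk size with a "k" suffix.

```java
/**
 * Hypothetical helper (not Ozone code): splits "codec-data-parity-<n>k".
 */
final class EcPolicy {
  final String codec;
  final int data;
  final int parity;
  final int chunkSizeBytes;

  private EcPolicy(String codec, int data, int parity, int chunkSizeBytes) {
    this.codec = codec;
    this.data = data;
    this.parity = parity;
    this.chunkSizeBytes = chunkSizeBytes;
  }

  static EcPolicy parse(String policy) {
    String[] parts = policy.split("-");
    // Assumption: exactly four fields and a chunk size given in KiB ("k").
    if (parts.length != 4 || !parts[3].endsWith("k")) {
      throw new IllegalArgumentException("Unsupported EC policy: " + policy);
    }
    String kib = parts[3].substring(0, parts[3].length() - 1);
    return new EcPolicy(parts[0], Integer.parseInt(parts[1]),
        Integer.parseInt(parts[2]), Integer.parseInt(kib) * 1024);
  }

  /** Total replicas per block group (data + parity). */
  int totalReplicas() { return data + parity; }

  /** Reed-Solomon can rebuild from any `data` healthy replicas. */
  int minSources() { return data; }

  public static void main(String[] args) {
    EcPolicy p = parse("rs-3-2-1024k");
    System.out.println(p.codec + " data=" + p.data + " parity=" + p.parity
        + " chunk=" + p.chunkSizeBytes + "B total=" + p.totalReplicas()
        + " minSources=" + p.minSources());
    // rs data=3 parity=2 chunk=1048576B total=5 minSources=3
  }
}
```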
*DN.log:*
{noformat}
2023-09-15 13:47:57,844 ERROR [ec-reconstruct-reader-TID-0]-org.apache.hadoop.hdds.scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: b3a4dcf4-916d-4a97-adde-daaf9785b237, Nodes: b3a4dcf4-916d-4a97-adde-daaf9785b237, ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-09-15T13:47:57.796097Z[UTC]].
2023-09-15 13:47:57,845 INFO [ContainerReplicationThread-0]-org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream: ECBlockReconstructedStripeInputStream{conID: 1012 locID: 111677748019201071}@740ac7aa: error reading [2], marked as failed
org.apache.hadoop.ozone.client.io.BadDataLocationException: java.io.IOException: Failed to get chunkInfo[123]: len == 0
    at org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.readIntoBuffer(ECBlockReconstructedStripeInputStream.java:633)
    at org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.lambda$loadDataBuffersFromStream$2(ECBlockReconstructedStripeInputStream.java:566)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.io.IOException: Failed to get chunkInfo[123]: len == 0
    at org.apache.hadoop.hdds.scm.storage.BlockInputStream.validate(BlockInputStream.java:278)
    at org.apache.hadoop.hdds.scm.storage.BlockInputStream.lambda$static$0(BlockInputStream.java:265)
    at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:407)
    at org.apache.hadoop.hdds.scm.XceiverClientGrpc.lambda$sendCommandWithTraceIDAndRetry$0(XceiverClientGrpc.java:347)
    at org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:169)
    at org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:149)
    at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:342)
    at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:323)
    at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:208)
    at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$getBlock$0(ContainerProtocolCalls.java:186)
    at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.tryEachDatanode(ContainerProtocolCalls.java:146)
    at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:185)
    at org.apache.hadoop.hdds.scm.storage.BlockInputStream.getChunkInfos(BlockInputStream.java:255)
    at org.apache.hadoop.hdds.scm.storage.BlockInputStream.initialize(BlockInputStream.java:146)
    at org.apache.hadoop.hdds.scm.storage.BlockInputStream.readWithStrategy(BlockInputStream.java:308)
    at org.apache.hadoop.hdds.scm.storage.ExtendedInputStream.read(ExtendedInputStream.java:66)
    at org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.readFromCurrentLocation(ECBlockReconstructedStripeInputStream.java:644)
    at org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.readIntoBuffer(ECBlockReconstructedStripeInputStream.java:620)
    ... 5 more
2023-09-15 13:47:57,864 WARN [ContainerReplicationThread-0]-org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator: Exception while reconstructing the container 1012. Cleaning up all the recovering containers in the reconstruction process.
org.apache.hadoop.ozone.client.io.InsufficientLocationsException: There are insufficient datanodes to read the EC block
    at org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.init(ECBlockReconstructedStripeInputStream.java:224)
    at org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.read(ECBlockReconstructedStripeInputStream.java:382)
    at org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.recoverChunks(ECBlockReconstructedStripeInputStream.java:331)
    at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECBlockGroup(ECReconstructionCoordinator.java:288)
    at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECContainerGroup(ECReconstructionCoordinator.java:170)
    at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask.runTask(ECReconstructionCoordinatorTask.java:68)
    at org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:359)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
2023-09-15 13:47:57,916 INFO [ChunkReader-5]-org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer: Moving container /hadoop-ozone/datanode/data/hdds/CID-2e752843-8947-454d-992f-013b3824468b/current/containerDir1/1012 to state DELETED from state:RECOVERING
2023-09-15 13:47:57,920 WARN [ContainerReplicationThread-0]-org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask: FAILED reconstructECContainersCommand: containerID=1012, replication=rs-3-2-1024k, missingIndexes=[1, 2], sources={3=b3a4dcf4-916d-4a97-adde-daaf9785b237, 4=4641d611-0a8b-4d23-b47a-d25d71403aaf, 5=eb073807-4edc-4753-a64a-a323a317ea2f}, targets={1=bc497718-d822-4592-b406-ef32d3173cd9, 2=2f307ebf-3bf3-412c-aaac-675718da3beb} after 372 ms
org.apache.hadoop.ozone.client.io.InsufficientLocationsException: There are insufficient datanodes to read the EC block
    at org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.init(ECBlockReconstructedStripeInputStream.java:224)
    at org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.read(ECBlockReconstructedStripeInputStream.java:382)
    at org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.recoverChunks(ECBlockReconstructedStripeInputStream.java:331)
    at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECBlockGroup(ECReconstructionCoordinator.java:288)
    at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECContainerGroup(ECReconstructionCoordinator.java:170)
    at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask.runTask(ECReconstructionCoordinatorTask.java:68)
    at org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:359)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
{noformat}
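The FAILED command lists missingIndexes=[1, 2] and sources 3, 4 and 5, and the GetBlock error is reported against b3a4dcf4-916d-4a97-adde-daaf9785b237, the node listed for replica index 3. A hypothetical count of usable sources shows why the coordinator gives up: with rs-3-2, dropping a single source leaves 2 of the 3 required. SourceCount and its methods are illustrative names, not Ozone code, and treating index 3 as the failed replica is an inference from the log.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model (not Ozone code) of the "enough EC sources?" decision. */
final class SourceCount {

  private SourceCount() { }

  /** Replica indexes that remain usable after removing the failed ones. */
  static Set<Integer> usableSources(Set<Integer> sources, Set<Integer> failed) {
    Set<Integer> usable = new TreeSet<>(sources);
    usable.removeAll(failed);
    return usable;
  }

  /** Reconstruction needs at least `dataCount` readable replicas. */
  static boolean canReconstruct(int dataCount, Set<Integer> usable) {
    return usable.size() >= dataCount;
  }

  public static void main(String[] args) {
    Set<Integer> sources = Set.of(3, 4, 5); // sources={3=..., 4=..., 5=...}
    Set<Integer> failed = Set.of(3);        // assumed: replica 3 failed validation
    Set<Integer> usable = usableSources(sources, failed);
    System.out.println("usable=" + usable
        + " canReconstruct=" + canReconstruct(3, usable));
    // usable=[4, 5] canReconstruct=false
  }
}
```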
--
This message was sent by Atlassian Jira
(v8.20.10#820010)