[
https://issues.apache.org/jira/browse/HDDS-9319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stephen O'Donnell resolved HDDS-9319.
-------------------------------------
Resolution: Duplicate
> EC Reconstruction fails when chunk length is 0 bytes
> ----------------------------------------------------
>
> Key: HDDS-9319
> URL: https://issues.apache.org/jira/browse/HDDS-9319
> Project: Apache Ozone
> Issue Type: Bug
> Components: EC
> Reporter: Varsha Ravi
> Assignee: Stephen O'Donnell
> Priority: Major
>
> EC offline reconstruction fails with a *java.io.IOException: Failed to get
> chunkInfo[123]* exception. The DN log reports that there are insufficient
> datanodes to read the EC blocks, even though 3 of the 5 DNs are up.
> This could be because the chunk length is zero bytes.
> *EC Policy:* rs-3-2-1024k
> *DN.log:*
> {noformat}
> 2023-09-15 13:47:57,844 ERROR [ec-reconstruct-reader-TID-0]-org.apache.hadoop.hdds.scm.XceiverClientGrpc: Failed to execute command GetBlock on the pipeline Pipeline[ Id: b3a4dcf4-916d-4a97-adde-daaf9785b237, Nodes: b3a4dcf4-916d-4a97-adde-daaf9785b237, ReplicationConfig: STANDALONE/ONE, State:CLOSED, leaderId:, CreationTimestamp2023-09-15T13:47:57.796097Z[UTC]].
> 2023-09-15 13:47:57,845 INFO [ContainerReplicationThread-0]-org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream: ECBlockReconstructedStripeInputStream{conID: 1012 locID: 111677748019201071}@740ac7aa: error reading [2], marked as failed
> org.apache.hadoop.ozone.client.io.BadDataLocationException: java.io.IOException: Failed to get chunkInfo[123]: len == 0
>     at org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.readIntoBuffer(ECBlockReconstructedStripeInputStream.java:633)
>     at org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.lambda$loadDataBuffersFromStream$2(ECBlockReconstructedStripeInputStream.java:566)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.io.IOException: Failed to get chunkInfo[123]: len == 0
>     at org.apache.hadoop.hdds.scm.storage.BlockInputStream.validate(BlockInputStream.java:278)
>     at org.apache.hadoop.hdds.scm.storage.BlockInputStream.lambda$static$0(BlockInputStream.java:265)
>     at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:407)
>     at org.apache.hadoop.hdds.scm.XceiverClientGrpc.lambda$sendCommandWithTraceIDAndRetry$0(XceiverClientGrpc.java:347)
>     at org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:169)
>     at org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:149)
>     at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:342)
>     at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:323)
>     at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:208)
>     at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$getBlock$0(ContainerProtocolCalls.java:186)
>     at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.tryEachDatanode(ContainerProtocolCalls.java:146)
>     at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:185)
>     at org.apache.hadoop.hdds.scm.storage.BlockInputStream.getChunkInfos(BlockInputStream.java:255)
>     at org.apache.hadoop.hdds.scm.storage.BlockInputStream.initialize(BlockInputStream.java:146)
>     at org.apache.hadoop.hdds.scm.storage.BlockInputStream.readWithStrategy(BlockInputStream.java:308)
>     at org.apache.hadoop.hdds.scm.storage.ExtendedInputStream.read(ExtendedInputStream.java:66)
>     at org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.readFromCurrentLocation(ECBlockReconstructedStripeInputStream.java:644)
>     at org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.readIntoBuffer(ECBlockReconstructedStripeInputStream.java:620)
>     ... 5 more
> 2023-09-15 13:47:57,864 WARN [ContainerReplicationThread-0]-org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator: Exception while reconstructing the container 1012. Cleaning up all the recovering containers in the reconstruction process.
> org.apache.hadoop.ozone.client.io.InsufficientLocationsException: There are insufficient datanodes to read the EC block
>     at org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.init(ECBlockReconstructedStripeInputStream.java:224)
>     at org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.read(ECBlockReconstructedStripeInputStream.java:382)
>     at org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.recoverChunks(ECBlockReconstructedStripeInputStream.java:331)
>     at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECBlockGroup(ECReconstructionCoordinator.java:288)
>     at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECContainerGroup(ECReconstructionCoordinator.java:170)
>     at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask.runTask(ECReconstructionCoordinatorTask.java:68)
>     at org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:359)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> 2023-09-15 13:47:57,916 INFO [ChunkReader-5]-org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer: Moving container /hadoop-ozone/datanode/data/hdds/CID-2e752843-8947-454d-992f-013b3824468b/current/containerDir1/1012 to state DELETED from state:RECOVERING
> 2023-09-15 13:47:57,920 WARN [ContainerReplicationThread-0]-org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask: FAILED reconstructECContainersCommand: containerID=1012, replication=rs-3-2-1024k, missingIndexes=[1, 2], sources={3=b3a4dcf4-916d-4a97-adde-daaf9785b237, 4=4641d611-0a8b-4d23-b47a-d25d71403aaf, 5=eb073807-4edc-4753-a64a-a323a317ea2f}, targets={1=bc497718-d822-4592-b406-ef32d3173cd9, 2=2f307ebf-3bf3-412c-aaac-675718da3beb} after 372 ms
> org.apache.hadoop.ozone.client.io.InsufficientLocationsException: There are insufficient datanodes to read the EC block
>     at org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.init(ECBlockReconstructedStripeInputStream.java:224)
>     at org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.read(ECBlockReconstructedStripeInputStream.java:382)
>     at org.apache.hadoop.ozone.client.io.ECBlockReconstructedStripeInputStream.recoverChunks(ECBlockReconstructedStripeInputStream.java:331)
>     at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECBlockGroup(ECReconstructionCoordinator.java:288)
>     at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECContainerGroup(ECReconstructionCoordinator.java:170)
>     at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask.runTask(ECReconstructionCoordinatorTask.java:68)
>     at org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:359)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> {noformat}
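
The failure path in the log above can be sketched as follows. This is a hypothetical illustration, not the actual Ozone code: chunk-list validation rejects a GetBlock reply containing a zero-length chunk, the location is marked failed (as in the "error reading [2], marked as failed" line), and once fewer locations remain readable than the rs-3-2 data count of 3, reconstruction aborts the way InsufficientLocationsException does here.

```python
# Hypothetical sketch of the suspected failure mechanism (names are
# illustrative; not the Ozone API).

def validate_chunks(chunks):
    """chunks: list of (name, length) pairs from a GetBlock reply.

    Reject any chunk whose length is 0, mirroring the observed error
    "Failed to get chunkInfo[...]: len == 0".
    """
    for name, length in chunks:
        if length == 0:
            raise IOError("Failed to get %s: len == 0" % name)


def readable_locations(locations, chunks_by_location):
    """Count locations whose chunk metadata passes validation.

    A location that fails validation is marked failed, just like the
    datanode marked failed in the DN log.
    """
    ok = 0
    for loc in locations:
        try:
            validate_chunks(chunks_by_location[loc])
            ok += 1
        except IOError:
            pass  # location marked as failed
    return ok


def can_reconstruct(locations, chunks_by_location, data_count=3):
    """rs-3-2 needs at least 3 readable data locations to rebuild."""
    return readable_locations(locations, chunks_by_location) >= data_count
```

Under this model, 3 of 5 DNs being up is not enough: if one of the three serves a zero-length chunk, only 2 readable locations remain, below the 3 that rs-3-2 requires, so reconstruction fails even though the zero-length chunk may be legitimate (e.g. an empty block).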
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)