adoroszlai opened a new pull request, #4719:
URL: https://github.com/apache/ozone/pull/4719
## What changes were proposed in this pull request?
* Log at info level when EC reconstruction starts, and update the existing
completion/failure messages to use the same format.
* Add a debug-level message for container create/close commands.
https://issues.apache.org/jira/browse/HDDS-7080
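
The message pattern described above (one line when the task starts, then a matching line on completion or failure that includes the elapsed time) could be sketched roughly as follows. This is only an illustration and not the code from this patch: `ReconstructionLogSketch`, `Task`, and the `List<String>` sink are hypothetical stand-ins for the real task class and logger, and the clock is passed in so the timing is easy to check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical sketch of the start/done/failed message pattern; names are
// illustrative and do not come from the Ozone code base.
class ReconstructionLogSketch {

  /** Stand-in for the work being timed (assumption, not the real API). */
  interface Task {
    void run() throws Exception;
  }

  /**
   * Runs the task and appends one "LEVEL STATUS description" line at start
   * and one at the end. A real implementation would call a logger instead
   * of appending to a list.
   */
  static void runTask(String description, Task task,
      LongSupplier clockMillis, List<String> out) {
    long start = clockMillis.getAsLong();
    out.add("INFO IN_PROGRESS " + description);
    try {
      task.run();
      long elapsed = clockMillis.getAsLong() - start;
      out.add("INFO DONE " + description + " in " + elapsed + " ms");
    } catch (Exception e) {
      long elapsed = clockMillis.getAsLong() - start;
      // Failure is reported at a higher level, with the same description.
      out.add("WARN FAILED " + description + " after " + elapsed + " ms: "
          + e.getMessage());
    }
  }

  public static void main(String[] args) {
    List<String> lines = new ArrayList<>();
    runTask("reconstructECContainersCommand: containerID=1",
        () -> { }, System::currentTimeMillis, lines);
    runTask("reconstructECContainersCommand: containerID=2",
        () -> { throw new java.io.IOException("Chunk write failed"); },
        System::currentTimeMillis, lines);
    lines.forEach(System.out::println);
  }
}
```

Using the same description string in all three messages makes it straightforward to match a start line with its completion or failure line when searching the log.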
## How was this patch tested?
Ran `TestECContainerRecovery` locally and checked the log output:
```
2023-05-16 11:45:57,038 [ContainerReplicationThread-0] INFO reconstruction.ECReconstructionCoordinatorTask (ECReconstructionCoordinatorTask.java:runTask(65)) - IN_PROGRESS reconstructECContainersCommand: containerID=1, replication=rs-3-2-1024k, missingIndexes=[1], sources={2=4f8c1ee8-843d-4e20-a85d-84a8bafed0a1(localhost/127.0.0.1), 3=ef236338-4845-41a8-aac7-e4a6b965d1de(localhost/127.0.0.1), 4=954c0e80-343e-491f-8fd9-01a4f3fbc54a(localhost/127.0.0.1), 5=4f1313dc-72fe-469f-b9c7-97ffc1f000ae(localhost/127.0.0.1)}, targets={1=bcf6c97b-1dba-46e8-b7da-8fd5295ca1c7(localhost/127.0.0.1)}
2023-05-16 11:45:57,181 [ContainerReplicationThread-0] INFO reconstruction.ECReconstructionCoordinatorTask (ECReconstructionCoordinatorTask.java:runTask(75)) - DONE reconstructECContainersCommand: containerID=1, replication=rs-3-2-1024k, missingIndexes=[1], sources={2=4f8c1ee8-843d-4e20-a85d-84a8bafed0a1(localhost/127.0.0.1), 3=ef236338-4845-41a8-aac7-e4a6b965d1de(localhost/127.0.0.1), 4=954c0e80-343e-491f-8fd9-01a4f3fbc54a(localhost/127.0.0.1), 5=4f1313dc-72fe-469f-b9c7-97ffc1f000ae(localhost/127.0.0.1)}, targets={1=bcf6c97b-1dba-46e8-b7da-8fd5295ca1c7(localhost/127.0.0.1)} in 143 ms
2023-05-16 11:46:36,634 [ContainerReplicationThread-0] INFO reconstruction.ECReconstructionCoordinatorTask (ECReconstructionCoordinatorTask.java:runTask(65)) - IN_PROGRESS reconstructECContainersCommand: containerID=2, replication=rs-3-2-1024k, missingIndexes=[1], sources={2=4f1313dc-72fe-469f-b9c7-97ffc1f000ae(localhost/127.0.0.1), 3=66abd4c3-c150-40f4-9c64-748ed52588f8(localhost/127.0.0.1), 4=21cc8efd-52be-41fb-89ae-fc02f677a135(localhost/127.0.0.1), 5=7ea5f635-94c5-4b17-863e-2be9fa008825(localhost/127.0.0.1)}, targets={1=ef236338-4845-41a8-aac7-e4a6b965d1de(localhost/127.0.0.1)}
2023-05-16 11:46:39,831 [ContainerReplicationThread-0] WARN reconstruction.ECReconstructionCoordinatorTask (ECReconstructionCoordinatorTask.java:runTask(79)) - FAILED reconstructECContainersCommand: containerID=2, replication=rs-3-2-1024k, missingIndexes=[1], sources={2=4f1313dc-72fe-469f-b9c7-97ffc1f000ae(localhost/127.0.0.1), 3=66abd4c3-c150-40f4-9c64-748ed52588f8(localhost/127.0.0.1), 4=21cc8efd-52be-41fb-89ae-fc02f677a135(localhost/127.0.0.1), 5=7ea5f635-94c5-4b17-863e-2be9fa008825(localhost/127.0.0.1)}, targets={1=ef236338-4845-41a8-aac7-e4a6b965d1de(localhost/127.0.0.1)} after 3198 ms
java.io.IOException: Chunk write failed at the new target node: ef236338-4845-41a8-aac7-e4a6b965d1de(localhost/127.0.0.1). Aborting the reconstruction process.
	at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.checkFailures(ECReconstructionCoordinator.java:333)
	at org.apache.hadoop.ozone.container.TestECContainerRecovery.lambda$testECContainerRecoveryWithTimedOutRecovery$1(TestECContainerRecovery.java:321)
	at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECBlockGroup(ECReconstructionCoordinator.java:232)
	at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECContainerGroup(ECReconstructionCoordinator.java:171)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECContainerGroup(ECReconstructionCoordinator.java:141)
	at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask.runTask(ECReconstructionCoordinatorTask.java:68)
	at org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:348)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Requested operation not allowed as ContainerState is UNHEALTHY
	at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.setIoException(BlockOutputStream.java:632)
	at org.apache.hadoop.hdds.scm.storage.ECBlockOutputStream.validateResponse(ECBlockOutputStream.java:303)
	at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$2(BlockOutputStream.java:714)
	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
	... 3 more
Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Requested operation not allowed as ContainerState is UNHEALTHY
	at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:718)
	at org.apache.hadoop.hdds.scm.storage.ECBlockOutputStream.validateResponse(ECBlockOutputStream.java:301)
	... 7 more
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]