[
https://issues.apache.org/jira/browse/HDDS-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506480#comment-17506480
]
Shawn commented on HDDS-6441:
-----------------------------
# Unfortunately, we lost the logs from before this restart. Here are the
errors I see in the current logs; as I remember, we had many errors like
these before the restart as well:
```
2022-03-13 00:23:01 INFO ChunkWriter-0-0 KeyValueHandler:91 - Operation:
CreateContainer , Trace ID: , Message: Container creation failed, due to disk
out of space , Result: DISK_OUT_OF_SPACE , StorageContainerException Occurred.
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
Container creation failed, due to disk out of space
at
org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.create(KeyValueContainer.java:159)
at
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleCreateContainer(KeyValueHandler.java:273)
at
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.dispatchRequest(KeyValueHandler.java:190)
at
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:178)
at
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.createContainer(HddsDispatcher.java:424)
at
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:258)
at
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.lambda$dispatch$0(HddsDispatcher.java:169)
at
org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
at
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:168)
at
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:401)
at
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:411)
at
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$handleWriteChunk$1(ContainerStateMachine.java:443)
at
java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Out of
space: The volume with the most available space (=403001344 B) is less than the
container size (=5368709120 B).
at
org.apache.hadoop.ozone.container.common.volume.RoundRobinVolumeChoosingPolicy.chooseVolume(RoundRobinVolumeChoosingPolicy.java:77)
at
org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.create(KeyValueContainer.java:115)
... 15 more
2022-03-13 00:23:01 INFO ChunkWriter-0-0 HddsDispatcher:91 - Operation:
WriteChunk , Trace ID: , Message: ContainerID 21010 creation failed , Result:
DISK_OUT_OF_SPACE , StorageContainerException Occurred.
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
ContainerID 21010 creation failed
at
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:262)
at
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.lambda$dispatch$0(HddsDispatcher.java:169)
at
org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
at
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:168)
at
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:401)
at
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:411)
at
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$handleWriteChunk$1(ContainerStateMachine.java:443)
at
java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
2022-03-13 00:23:01 ERROR ChunkWriter-0-0 ContainerStateMachine:471 -
group-EA22BAC07B66: writeChunk writeStateMachineData failed:
blockIdcontainerID: 21010
localID: 109611004725734010
blockCommitSequenceId: 0
logIndex 1 chunkName 109611004725734010_chunk_1 Error message: ContainerID
21010 creation failed Container Result: DISK_OUT_OF_SPACE
2022-03-13 00:23:01 INFO ChunkWriter-0-0 KeyValueHandler:91 - Operation:
CreateContainer , Trace ID: , Message: Container creation failed, due to disk
out of space , Result: DISK_OUT_OF_SPACE , StorageContainerException Occurred.
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
Container creation failed, due to disk out of space
at
org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.create(KeyValueContainer.java:159)
at
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleCreateContainer(KeyValueHandler.java:273)
at
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.dispatchRequest(KeyValueHandler.java:190)
at
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:178)
at
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.createContainer(HddsDispatcher.java:424)
at
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:258)
at
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.lambda$dispatch$0(HddsDispatcher.java:169)
at
org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
at
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:168)
at
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:401)
at
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:411)
at
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$handleWriteChunk$1(ContainerStateMachine.java:443)
at
java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Out of
space: The volume with the most available space (=403001344 B) is less than the
container size (=5368709120 B).
at
org.apache.hadoop.ozone.container.common.volume.RoundRobinVolumeChoosingPolicy.chooseVolume(RoundRobinVolumeChoosingPolicy.java:77)
at
org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.create(KeyValueContainer.java:115)
... 15 more
2022-03-13 00:23:01 INFO ChunkWriter-0-0 HddsDispatcher:91 - Operation:
WriteChunk , Trace ID: , Message: ContainerID 21010 creation failed , Result:
DISK_OUT_OF_SPACE , StorageContainerException Occurred.
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
ContainerID 21010 creation failed
at
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:262)
at
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.lambda$dispatch$0(HddsDispatcher.java:169)
at
org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
at
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:168)
at
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:401)
at
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:411)
at
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$handleWriteChunk$1(ContainerStateMachine.java:443)
at
java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
2022-03-13 00:23:01 ERROR ChunkWriter-0-0 ContainerStateMachine:471 -
group-EA22BAC07B66: writeChunk writeStateMachineData failed:
blockIdcontainerID: 21010
```
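The `Caused by` line in the trace carries the two numbers that explain the failure. As a minimal sketch (not Ozone's actual code; the helper names `parse_out_of_space` and `can_create_container` are hypothetical, and the comparison is inferred only from the wording of the message), the following Python extracts both values and repeats the check the message describes:

```python
import re

# Message text copied from the DiskOutOfSpaceException in the log above.
MESSAGE = (
    "Out of space: The volume with the most available space (=403001344 B) "
    "is less than the container size (=5368709120 B)."
)


def parse_out_of_space(message):
    """Return (most_available_bytes, container_size_bytes) from the message."""
    numbers = re.findall(r"\(=(\d+) B\)", message)
    if len(numbers) != 2:
        raise ValueError("unexpected message format")
    return int(numbers[0]), int(numbers[1])


def can_create_container(most_available, container_size):
    """Hypothetical check: a container fits only if the roomiest volume can hold it."""
    return most_available >= container_size


most_available, container_size = parse_out_of_space(MESSAGE)
shortfall = container_size - most_available

print("container size (GiB):", container_size / 1024**3)
print("most available (MiB):", round(most_available / 1024**2, 2))
print("shortfall (bytes):", shortfall)
print("can create container:", can_create_container(most_available, container_size))
```

The container size in the message is exactly 5 GiB, while the volume with the most free space has under 400 MiB, so under this reading every CreateContainer request on the datanode fails in the same way.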
> Ozone metadata does not align with underlying blocks when there are many
> incomplete uploads happens
> ---------------------------------------------------------------------------------------------------
>
> Key: HDDS-6441
> URL: https://issues.apache.org/jira/browse/HDDS-6441
> Project: Apache Ozone
> Issue Type: Bug
> Components: Ozone Datanode
> Affects Versions: 1.2.0
> Reporter: Shawn
> Assignee: Ethan Rose
> Priority: Major
>
> Ozone metadata does not align with the underlying blocks when many
> incomplete uploads happen. I have a cluster that holds very few objects,
> but the datanode usage report tells me I have almost run out of space.
> Usage info for datanode with UUID f50108f1-d8bf-44e3-abed-6e77c91f994d:
> Capacity : 8802545958912B
> SCMUsed : 8802128257024B (99.99525%)
> Remaining : 74715136B (0.00085%)
> Usage info for datanode with UUID 2bdb3198-b71f-4153-9663-e3b349c6f82a:
> Capacity : 8802545958912B
> SCMUsed : 8802133102592B (99.99531%)
> Remaining : 76824576B (0.00087%)
> Usage info for datanode with UUID d5644a36-b967-44a6-a736-4bd2013c2b86:
> Capacity : 8793955991552B
> SCMUsed : 8793311227904B (99.99267%)
> Remaining : 291676160B (0.00332%)
> ...
>
> I also see many errors in the logs complaining about running out of disk
> space and reporting missing .container files, as below:
> 2022-03-10 03:56:02 ERROR Thread-6 ContainerReader:159 - Missing .container
> file for ContainerID: 15221
>
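The percentages in the quoted usage report can be re-derived from the byte counts. This is a minimal sketch in Python using the three datanodes listed in the report; the dictionary layout and the `percent` helper are assumptions for illustration, not the output or API of any Ozone tool:

```python
# (capacity, scm_used, remaining) in bytes, copied from the quoted usage report.
DATANODES = {
    "f50108f1-d8bf-44e3-abed-6e77c91f994d": (8802545958912, 8802128257024, 74715136),
    "2bdb3198-b71f-4153-9663-e3b349c6f82a": (8802545958912, 8802133102592, 76824576),
    "d5644a36-b967-44a6-a736-4bd2013c2b86": (8793955991552, 8793311227904, 291676160),
}


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


for uuid, (capacity, used, remaining) in DATANODES.items():
    gap = capacity - used - remaining  # bytes not covered by either figure
    print(
        f"{uuid}: used {percent(used, capacity):.5f}%, "
        f"remaining {percent(remaining, capacity):.5f}%, "
        f"capacity - used - remaining = {gap} B"
    )
```

The recomputed percentages agree with the reported ones. The last column shows that capacity minus SCMUsed is larger than Remaining on each node; the report itself does not say what accounts for that difference.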