[jira] [Updated] (RATIS-1377) Ratis reserved space for storage dirs

Mark Gui (Jira) Mon, 31 May 2021 01:56:04 -0700


     [ https://issues.apache.org/jira/browse/RATIS-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Mark Gui updated RATIS-1377:
----------------------------
    Description: 
We are using Ozone with Ratis for our services and hit a disk-out-of-space 
issue. From the log, we believe that Ratis ran out of space and that the Ozone 
pipelines (raft groups in Ratis) created on the full disk cannot close, because 
closing a pipeline requires taking a final snapshot. A short log is appended 
below.
{code:java}
2021-05-25 19:10:47,171 [492bc1be-439e-45db-856f-2e58336e2528@group-B26E6BC26E24-StateMachineUpdater] INFO org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine: group-B26E6BC26E24: Taking a snapshot at:(t:5, i:419) file /data1/ratis/c5a9bc6e-fee1-48a8-9100-b26e6bc26e24/sm/snapshot.5_419
2021-05-25 19:10:47,171 [492bc1be-439e-45db-856f-2e58336e2528@group-B26E6BC26E24-StateMachineUpdater] ERROR org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine: group-B26E6BC26E24: Failed to write snapshot at:(t:5, i:419) file /data1/ratis/c5a9bc6e-fee1-48a8-9100-b26e6bc26e24/sm/snapshot.5_419
2021-05-25 19:10:47,171 [492bc1be-439e-45db-856f-2e58336e2528@group-B26E6BC26E24-StateMachineUpdater] ERROR org.apache.ratis.server.impl.StateMachineUpdater: 492bc1be-439e-45db-856f-2e58336e2528@group-B26E6BC26E24-StateMachineUpdater: Failed to take snapshot
java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:326)
    at org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.doFlush(CodedOutputStream.java:3062)
    at org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.flushIfNotAvailable(CodedOutputStream.java:3057)
    at org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeUInt64NoTag(CodedOutputStream.java:2897)
    at org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream.writeInt64NoTag(CodedOutputStream.java:414)
    at org.apache.ratis.thirdparty.com.google.protobuf.FieldSet.writeElementNoTag(FieldSet.java:657)
    at org.apache.ratis.thirdparty.com.google.protobuf.FieldSet.writeElement(FieldSet.java:634)
    at org.apache.ratis.thirdparty.com.google.protobuf.MapEntryLite.writeTo(MapEntryLite.java:110)
    at org.apache.ratis.thirdparty.com.google.protobuf.MapEntry.writeTo(MapEntry.java:154)
    at org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeMessageNoTag(CodedOutputStream.java:2855)
    at org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeMessage(CodedOutputStream.java:2824)
    at org.apache.ratis.thirdparty.com.google.protobuf.GeneratedMessageV3.serializeMapTo(GeneratedMessageV3.java:3224)
    at org.apache.ratis.thirdparty.com.google.protobuf.GeneratedMessageV3.serializeLongMapTo(GeneratedMessageV3.java:3140)
    at org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos$Container2BCSIDMapProto.writeTo(ContainerProtos.java:14633)
    at org.apache.ratis.thirdparty.com.google.protobuf.AbstractMessageLite.writeTo(AbstractMessageLite.java:83)
    at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.persistContainerSet(ContainerStateMachine.java:270)
    at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.takeSnapshot(ContainerStateMachine.java:294)
    at org.apache.ratis.server.impl.StateMachineUpdater.takeSnapshot(StateMachineUpdater.java:265)
    at org.apache.ratis.server.impl.StateMachineUpdater.checkAndTakeSnapshot(StateMachineUpdater.java:257)
    at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:183)
    at java.lang.Thread.run(Thread.java:748)
{code}
So I think that, as the consumer of the disk, Ratis should manage the 
free/used space and give some guarantee that operations are not left partially 
completed when space runs out. We could reserve space on each disk in Ratis and 
exclude disks that reach the defined threshold from new raft group allocation. 
Although the problem we hit surfaced on the Ozone side, Ratis is the consumer 
of the metadata disks, so this is better done in Ratis.
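
To make the proposal concrete, here is a minimal sketch of the filtering step, 
not an existing Ratis API. The class name ReservedSpaceFilter, the 
reservedBytes setting, and the method names are all hypothetical; the only 
real call is java.io.File.getUsableSpace(). A storage dir is treated as 
eligible for a new raft group only while its usable space stays above the 
reserve.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Ratis.
final class ReservedSpaceFilter {
    // Bytes to keep free on each storage dir (assumed to come from config).
    private final long reservedBytes;

    ReservedSpaceFilter(long reservedBytes) {
        if (reservedBytes < 0) {
            throw new IllegalArgumentException("reservedBytes must be >= 0");
        }
        this.reservedBytes = reservedBytes;
    }

    // True if the given usable space is strictly above the reserve.
    boolean isAboveReserve(long usableBytes) {
        return usableBytes > reservedBytes;
    }

    // True if the volume holding dir still has space beyond the reserve.
    // File.getUsableSpace() returns 0 when the path does not name a
    // partition, so such dirs are never eligible.
    boolean hasHeadroom(File dir) {
        return isAboveReserve(dir.getUsableSpace());
    }

    // Returns the storage dirs that may host a new raft group.
    List<File> eligibleDirs(List<File> storageDirs) {
        List<File> eligible = new ArrayList<>();
        for (File dir : storageDirs) {
            if (hasHeadroom(dir)) {
                eligible.add(dir);
            }
        }
        return eligible;
    }
}
```

The usable space reported by the OS is only a hint and can change between the 
check and the write, so the reserve would need to be large enough to cover the 
final snapshot of every group already placed on the disk.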

  was:
We are using Ozone with Ratis for our services and hit a disk-out-of-space 
issue. From the log, we believe that Ratis ran out of space and that the Ozone 
pipelines (raft groups in Ratis) created on the full disk cannot close, because 
closing a pipeline requires taking a final snapshot. A short log is appended 
below.

```

2021-05-25 19:10:47,171 [492bc1be-439e-45db-856f-2e58336e2528@group-B26E6BC26E24-StateMachineUpdater] INFO org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine: group-B26E6BC26E24: Taking a snapshot at:(t:5, i:419) file /data1/ratis/c5a9bc6e-fee1-48a8-9100-b26e6bc26e24/sm/snapshot.5_419
2021-05-25 19:10:47,171 [492bc1be-439e-45db-856f-2e58336e2528@group-B26E6BC26E24-StateMachineUpdater] ERROR org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine: group-B26E6BC26E24: Failed to write snapshot at:(t:5, i:419) file /data1/ratis/c5a9bc6e-fee1-48a8-9100-b26e6bc26e24/sm/snapshot.5_419
2021-05-25 19:10:47,171 [492bc1be-439e-45db-856f-2e58336e2528@group-B26E6BC26E24-StateMachineUpdater] ERROR org.apache.ratis.server.impl.StateMachineUpdater: 492bc1be-439e-45db-856f-2e58336e2528@group-B26E6BC26E24-StateMachineUpdater: Failed to take snapshot
java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:326)
    at org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.doFlush(CodedOutputStream.java:3062)
    at org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.flushIfNotAvailable(CodedOutputStream.java:3057)
    at org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeUInt64NoTag(CodedOutputStream.java:2897)
    at org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream.writeInt64NoTag(CodedOutputStream.java:414)
    at org.apache.ratis.thirdparty.com.google.protobuf.FieldSet.writeElementNoTag(FieldSet.java:657)
    at org.apache.ratis.thirdparty.com.google.protobuf.FieldSet.writeElement(FieldSet.java:634)
    at org.apache.ratis.thirdparty.com.google.protobuf.MapEntryLite.writeTo(MapEntryLite.java:110)
    at org.apache.ratis.thirdparty.com.google.protobuf.MapEntry.writeTo(MapEntry.java:154)
    at org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeMessageNoTag(CodedOutputStream.java:2855)
    at org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeMessage(CodedOutputStream.java:2824)
    at org.apache.ratis.thirdparty.com.google.protobuf.GeneratedMessageV3.serializeMapTo(GeneratedMessageV3.java:3224)
    at org.apache.ratis.thirdparty.com.google.protobuf.GeneratedMessageV3.serializeLongMapTo(GeneratedMessageV3.java:3140)
    at org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos$Container2BCSIDMapProto.writeTo(ContainerProtos.java:14633)
    at org.apache.ratis.thirdparty.com.google.protobuf.AbstractMessageLite.writeTo(AbstractMessageLite.java:83)
    at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.persistContainerSet(ContainerStateMachine.java:270)
    at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.takeSnapshot(ContainerStateMachine.java:294)
    at org.apache.ratis.server.impl.StateMachineUpdater.takeSnapshot(StateMachineUpdater.java:265)
    at org.apache.ratis.server.impl.StateMachineUpdater.checkAndTakeSnapshot(StateMachineUpdater.java:257)
    at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:183)
    at java.lang.Thread.run(Thread.java:748)

```

So I think that, as the consumer of the disk, Ratis should manage the 
free/used space and give some guarantee that operations are not left partially 
completed when space runs out. We could reserve space on each disk in Ratis and 
exclude disks that reach the defined threshold from new raft group allocation. 
Although the problem we hit surfaced on the Ozone side, Ratis is the consumer 
of the metadata disks, so this is better done in Ratis.


> Ratis reserved space for storage dirs
> -------------------------------------
>
>                 Key: RATIS-1377
>                 URL: https://issues.apache.org/jira/browse/RATIS-1377
>             Project: Ratis
>          Issue Type: Improvement
>            Reporter: Mark Gui
>            Assignee: Mark Gui
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
