Mark Gui created RATIS-1377:
-------------------------------
Summary: Ratis reserved space for storage dirs
Key: RATIS-1377
URL: https://issues.apache.org/jira/browse/RATIS-1377
Project: Ratis
Issue Type: Improvement
Reporter: Mark Gui
Assignee: Mark Gui
We are using Ozone with Ratis for our services and hit a disk out-of-space
issue. We checked the log and believe that Ratis ran out of space, and that
Ozone pipelines (Raft groups in Ratis) created on the full disk could not be
closed, because closing requires taking a final snapshot. A short log is
appended below.
```
2021-05-25 19:10:47,171 [492bc1be-439e-45db-856f-2e58336e2528@group-B26E6BC26E24-StateMachineUpdater] INFO org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine: group-B26E6BC26E24: Taking a snapshot at:(t:5, i:419) file /data1/ratis/c5a9bc6e-fee1-48a8-9100-b26e6bc26e24/sm/snapshot.5_419
2021-05-25 19:10:47,171 [492bc1be-439e-45db-856f-2e58336e2528@group-B26E6BC26E24-StateMachineUpdater] ERROR org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine: group-B26E6BC26E24: Failed to write snapshot at:(t:5, i:419) file /data1/ratis/c5a9bc6e-fee1-48a8-9100-b26e6bc26e24/sm/snapshot.5_419
2021-05-25 19:10:47,171 [492bc1be-439e-45db-856f-2e58336e2528@group-B26E6BC26E24-StateMachineUpdater] ERROR org.apache.ratis.server.impl.StateMachineUpdater: 492bc1be-439e-45db-856f-2e58336e2528@group-B26E6BC26E24-StateMachineUpdater: Failed to take snapshot
java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:326)
    at org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.doFlush(CodedOutputStream.java:3062)
    at org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.flushIfNotAvailable(CodedOutputStream.java:3057)
    at org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeUInt64NoTag(CodedOutputStream.java:2897)
    at org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream.writeInt64NoTag(CodedOutputStream.java:414)
    at org.apache.ratis.thirdparty.com.google.protobuf.FieldSet.writeElementNoTag(FieldSet.java:657)
    at org.apache.ratis.thirdparty.com.google.protobuf.FieldSet.writeElement(FieldSet.java:634)
    at org.apache.ratis.thirdparty.com.google.protobuf.MapEntryLite.writeTo(MapEntryLite.java:110)
    at org.apache.ratis.thirdparty.com.google.protobuf.MapEntry.writeTo(MapEntry.java:154)
    at org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeMessageNoTag(CodedOutputStream.java:2855)
    at org.apache.ratis.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeMessage(CodedOutputStream.java:2824)
    at org.apache.ratis.thirdparty.com.google.protobuf.GeneratedMessageV3.serializeMapTo(GeneratedMessageV3.java:3224)
    at org.apache.ratis.thirdparty.com.google.protobuf.GeneratedMessageV3.serializeLongMapTo(GeneratedMessageV3.java:3140)
    at org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos$Container2BCSIDMapProto.writeTo(ContainerProtos.java:14633)
    at org.apache.ratis.thirdparty.com.google.protobuf.AbstractMessageLite.writeTo(AbstractMessageLite.java:83)
    at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.persistContainerSet(ContainerStateMachine.java:270)
    at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.takeSnapshot(ContainerStateMachine.java:294)
    at org.apache.ratis.server.impl.StateMachineUpdater.takeSnapshot(StateMachineUpdater.java:265)
    at org.apache.ratis.server.impl.StateMachineUpdater.checkAndTakeSnapshot(StateMachineUpdater.java:257)
    at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:183)
    at java.lang.Thread.run(Thread.java:748)
```
So I think that, as the consumer of the disk, Ratis should be able to manage
the free/used space and provide some guarantee that operations are not
partially completed due to running out of space. We could build a reserved
space for each disk in Ratis and filter out disks that reach the defined
threshold when allocating new Raft groups. Although the problem we hit
happened on the Ozone side, as the consumer of the metadata disks, this is
better done in Ratis.
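To make the proposal concrete, here is a minimal sketch of the idea, not actual Ratis code: the class name `ReservedSpaceFilter`, the method names, and the configured reserve are all hypothetical, and the free-space check simply uses `java.io.File.getUsableSpace()`.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch of per-disk reserved space: a storage dir is
 * eligible for a new Raft group only if its usable space exceeds a
 * configured reserve. Not an actual Ratis API.
 */
public class ReservedSpaceFilter {
  private final long reservedBytes;

  public ReservedSpaceFilter(long reservedBytes) {
    this.reservedBytes = reservedBytes;
  }

  /** True if the dir still has more usable space than the reserve. */
  public boolean hasEnoughSpace(File dir) {
    return dir.getUsableSpace() > reservedBytes;
  }

  /** Keep only dirs that have not crossed the reserve threshold. */
  public List<File> selectUsableDirs(List<File> storageDirs) {
    return storageDirs.stream()
        .filter(this::hasEnoughSpace)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<File> dirs =
        Arrays.asList(new File(System.getProperty("java.io.tmpdir")));
    // With a zero reserve, any dir with free space qualifies.
    System.out.println(new ReservedSpaceFilter(0L).selectUsableDirs(dirs).size());
    // With an unsatisfiable reserve, every dir is filtered out,
    // so no new Raft group would be placed on it.
    System.out.println(
        new ReservedSpaceFilter(Long.MAX_VALUE).selectUsableDirs(dirs).size());
  }
}
```

The reserve would leave headroom for operations that must complete atomically once started, such as the final snapshot taken when closing a pipeline, so they do not fail partway with "No space left on device".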
--
This message was sent by Atlassian Jira
(v8.3.4#803005)