[
https://issues.apache.org/jira/browse/HDDS-14246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ivan Andika updated HDDS-14246:
-------------------------------
Description:
Currently, the datanode has an option to sync writes on chunk boundaries
(hdds.container.chunk.write.sync), which is disabled by default since it might
affect DN write throughput and latency. However, with it disabled, if the
datanode machine goes down suddenly (e.g. power failure, or the process is
reaped by the OOM killer), the file may end up with incomplete data even though
PutBlock (the write commit) succeeded. Although PutBlock triggers
FilePerBlockStrategy#finishWriteChunks, which closes the file via
RandomAccessFile#close, the buffer cache might not have been flushed yet:
closing a file does not guarantee that its buffered data has been written back
to disk (see [https://man7.org/linux/man-pages/man2/close.2.html]). This means
the block metadata and the actual data can be mismatched, which can cause data
inconsistency and loss.
We might need to consider calling FileChannel#force on PutBlock instead of
WriteChunk, since the data only becomes visible to users once it is committed
(i.e. PutBlock returns successfully). That way we can guarantee that, after the
user has successfully uploaded the key, the data has been persistently stored
on the leader and at least one follower has promised to flush it
(MAJORITY_COMMITTED). This might still affect write throughput and latency,
but it will strengthen our data durability guarantee.
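A minimal sketch of the idea, not the actual Ozone code: the class, method
names, and the SYNC_ON_PUT_BLOCK flag below are all hypothetical stand-ins for
the proposed option. WriteChunk only writes into the OS page cache, and the
PutBlock path calls FileChannel#force before close, since close() alone does
not guarantee durability.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SyncOnCommit {

    // Hypothetical flag mirroring the proposed "sync on PutBlock" option;
    // the existing per-chunk option is hdds.container.chunk.write.sync.
    static final boolean SYNC_ON_PUT_BLOCK = true;

    // WriteChunk path: data lands in the OS page cache; no fsync per chunk.
    static void writeChunk(FileChannel ch, ByteBuffer data, long offset)
            throws IOException {
        ch.write(data, offset);
    }

    // PutBlock path: force the channel before closing, because close()
    // alone does not flush the page cache to stable storage.
    static long putBlock(FileChannel ch, Path file) throws IOException {
        if (SYNC_ON_PUT_BLOCK) {
            ch.force(false); // flush file data (metadata=false) to disk
        }
        ch.close();
        return Files.size(file);
    }

    static long writeAndCommit(Path file) throws IOException {
        FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE);
        writeChunk(ch,
            ByteBuffer.wrap("chunk-0".getBytes(StandardCharsets.UTF_8)), 0L);
        return putBlock(ch, file);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("block", ".data");
        System.out.println(writeAndCommit(file)); // prints 7 (bytes committed)
    }
}
```

force(false) syncs only the file contents (like fdatasync), which is the
cheaper variant; force(true) would also sync file metadata (like fsync). The
cost is paid once per block commit rather than once per chunk.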
> Optionally sync write during PutBlock
> -------------------------------------
>
> Key: HDDS-14246
> URL: https://issues.apache.org/jira/browse/HDDS-14246
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Ivan Andika
> Assignee: Ivan Andika
> Priority: Major
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]