[
https://issues.apache.org/jira/browse/HDDS-14246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ivan Andika updated HDDS-14246:
-------------------------------
Description:
Currently, the datanode has an option to sync writes on chunk boundaries
(hdds.container.chunk.write.sync), which is disabled by default since it might
affect DN write throughput and latency. However, with it disabled, if the
datanode machine goes down suddenly (e.g. power failure, or the process is
reaped by the OOM killer), the file may end up with incomplete data even though
PutBlock (the write commit) succeeded. Although PutBlock triggers
FilePerBlockStrategy#finishWriteChunks, which closes the file via
RandomAccessFile#close, the buffer cache might not have been flushed yet:
closing a file does not guarantee that its buffered data has been written back
to disk (see [https://man7.org/linux/man-pages/man2/close.2.html]). This means
the block metadata and the actual data can be mismatched, which can cause data
inconsistency and loss.
We might need to consider calling FileChannel#force on PutBlock instead of
WriteChunk, since the data only becomes visible to users once it is committed
(i.e. PutBlock returns successfully). That way we can guarantee that, after the
user has successfully uploaded the key, the data has been persistently stored
on the leader and at least one follower has promised to flush it
(MAJORITY_COMMITTED). This might still affect write throughput and latency,
but it will strengthen our data durability guarantee.
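A minimal sketch of the idea, not the actual Ozone code: the class, method
names, and the SYNC_ON_PUT_BLOCK flag below are all hypothetical stand-ins for
the proposed option. WriteChunk only writes into the OS page cache, and the
PutBlock path calls FileChannel#force before close, since close() alone does
not guarantee durability.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SyncOnCommit {

    // Hypothetical flag mirroring the proposed "sync on PutBlock" option;
    // the existing per-chunk option is hdds.container.chunk.write.sync.
    static final boolean SYNC_ON_PUT_BLOCK = true;

    // WriteChunk path: data lands in the OS page cache; no fsync per chunk.
    static void writeChunk(FileChannel ch, ByteBuffer data, long offset)
            throws IOException {
        ch.write(data, offset);
    }

    // PutBlock path: force the channel before closing, because close()
    // alone does not flush the page cache to stable storage.
    static long putBlock(FileChannel ch, Path file) throws IOException {
        if (SYNC_ON_PUT_BLOCK) {
            ch.force(false); // flush file data (metadata=false) to disk
        }
        ch.close();
        return Files.size(file);
    }

    static long writeAndCommit(Path file) throws IOException {
        FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE);
        writeChunk(ch,
            ByteBuffer.wrap("chunk-0".getBytes(StandardCharsets.UTF_8)), 0L);
        return putBlock(ch, file);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("block", ".data");
        System.out.println(writeAndCommit(file)); // prints 7 (bytes committed)
    }
}
```

force(false) syncs only the file contents (like fdatasync), which is the
cheaper variant; force(true) would also sync file metadata (like fsync). The
cost is paid once per block commit rather than once per chunk.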
> Optionally sync write during PutBlock
> -------------------------------------
>
> Key: HDDS-14246
> URL: https://issues.apache.org/jira/browse/HDDS-14246
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Ivan Andika
> Assignee: Ivan Andika
> Priority: Major
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]