[https://issues.apache.org/jira/browse/HDDS-14246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
ASF GitHub Bot updated HDDS-14246:
----------------------------------
Labels: pull-request-available (was: )
> Change fsync boundary for FilePerBlockStrategy to block level
> -------------------------------------------------------------
>
> Key: HDDS-14246
> URL: https://issues.apache.org/jira/browse/HDDS-14246
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Ivan Andika
> Assignee: Ivan Andika
> Priority: Major
> Labels: pull-request-available
>
> Currently, the datanode has an option to sync writes at the chunk boundary
> (hdds.container.chunk.write.sync), which is disabled by default because it
> can hurt DN write throughput and latency. However, with it disabled, if the
> datanode machine goes down suddenly (e.g. power failure, reaped by the OOM
> killer), a block file might be left with incomplete data even though
> PutBlock (the write commit) succeeded, which violates our durability
> guarantee.
> Although PutBlock triggers FilePerBlockStrategy#finishWriteChunks, which
> closes the file (RandomAccessFile#close), the buffer cache might not have
> been flushed yet, because closing a file does not imply that its buffer
> cache has been flushed (see
> [https://man7.org/linux/man-pages/man2/close.2.html]). So the block
> locations of a user's key might be committed while the blocks themselves no
> longer exist on the datanodes after one of the failures above.
> We might consider calling FileChannel#force on PutBlock instead of
> WriteChunk, since data only becomes visible to users once PutBlock returns
> successfully (i.e. the data is committed). This way we can guarantee that,
> after a user has successfully uploaded a key, the data has been persisted on
> the leader and at least one follower has promised to flush it
> (MAJORITY_COMMITTED).
> This might still affect write throughput and latency, since writes must wait
> for the buffer cache to be flushed to persistent storage (SSD or disk), but
> it will strengthen our data durability guarantee, which should be our
> priority. Flushing the buffer cache might also reduce the datanode's memory
> usage.
> In the future, we should consider enabling hdds.container.chunk.write.sync by
> default.
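The description's point that closing a file is not the same as flushing it can be illustrated with a minimal Java sketch. `DurableBlockFile` and `writeDurably` are hypothetical names, not Ozone code; the sketch only assumes the standard `java.io.RandomAccessFile` and `java.nio.channels.FileChannel` APIs and calls `FileChannel#force(true)` before the file is closed.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical helper, not Ozone code: durable write of one block file. */
public final class DurableBlockFile {

  private DurableBlockFile() {
  }

  /** Replaces the file contents with data and returns once it is on stable storage. */
  public static void writeDurably(File blockFile, byte[] data) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(blockFile, "rw")) {
      raf.setLength(0); // discard any previous contents
      FileChannel channel = raf.getChannel();
      ByteBuffer buf = ByteBuffer.wrap(data);
      while (buf.hasRemaining()) {
        channel.write(buf);
      }
      // close() alone (at the end of this try block) only releases the file
      // descriptor; the written bytes may still sit in the OS cache.
      // force(true) asks the OS to write data and metadata (e.g. the file
      // length) to the storage device before returning.
      channel.force(true);
    } // RandomAccessFile#close also closes the channel here
  }
}
```

Forcing before close, rather than relying on close, is what makes the write survive a sudden power loss once the method has returned, at the cost of waiting for the device.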
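The proposed change, moving the fsync from WriteChunk to PutBlock, could look roughly like the sketch below. It is a hypothetical simplification, not the real `FilePerBlockStrategy`: `BlockLevelSyncWriter`, its methods, and the `syncOnEveryChunk` flag (standing in for `hdds.container.chunk.write.sync`) are illustrative names. Chunks are written with positional writes and no fsync, and `putBlock()` issues a single `FileChannel#force(true)` before the commit would be acknowledged.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/**
 * Hypothetical sketch, not the real FilePerBlockStrategy: one file per block,
 * chunk writes without fsync, and one fsync when the block is committed.
 */
public final class BlockLevelSyncWriter implements AutoCloseable {
  private final RandomAccessFile file;
  private final FileChannel channel;
  // Stands in for hdds.container.chunk.write.sync (chunk-level fsync).
  private final boolean syncOnEveryChunk;
  private int forceCount;

  public BlockLevelSyncWriter(File blockFile, boolean syncOnEveryChunk)
      throws IOException {
    this.file = new RandomAccessFile(blockFile, "rw");
    this.channel = file.getChannel();
    this.syncOnEveryChunk = syncOnEveryChunk;
  }

  /** WriteChunk: positional write; the data may stay in the OS cache. */
  public void writeChunk(ByteBuffer data, long offset) throws IOException {
    long pos = offset;
    while (data.hasRemaining()) {
      pos += channel.write(data, pos);
    }
    if (syncOnEveryChunk) {
      force();
    }
  }

  /** PutBlock: ensure all chunks are on stable storage before the commit. */
  public void putBlock() throws IOException {
    if (!syncOnEveryChunk) {
      force(); // single fsync covering every chunk written so far
    }
    // In this sketch, the block commit would only be acknowledged after here.
  }

  private void force() throws IOException {
    channel.force(true); // flush file data and metadata to the device
    forceCount++;
  }

  /** Number of fsyncs issued, to compare the two strategies. */
  public int forceCount() {
    return forceCount;
  }

  @Override
  public void close() throws IOException {
    file.close(); // also closes the channel; gives no durability guarantee by itself
  }
}
```

With chunk-level sync, a block written as N chunks costs N forces; with block-level sync it costs one. That difference is where the throughput saving over enabling `hdds.container.chunk.write.sync` would come from, while the data is still durable before PutBlock returns.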
--
This message was sent by Atlassian Jira
(v8.20.10#820010)