[ https://issues.apache.org/jira/browse/HDDS-14246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Andika updated HDDS-14246:
-------------------------------
    Summary: Change fsync boundary for FilePerBlockStrategy to block level (was: Force fsync during PutBlock to prevent data loss)

> Change fsync boundary for FilePerBlockStrategy to block level
> -------------------------------------------------------------
>
>                 Key: HDDS-14246
>                 URL: https://issues.apache.org/jira/browse/HDDS-14246
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Ivan Andika
>            Assignee: Ivan Andika
>            Priority: Major
>
> Currently, the datanode has an option to sync writes to disk at the chunk
> boundary (hdds.container.chunk.write.sync), which is disabled by default
> since it might affect DN write throughput and latency. However, disabling it
> means that if the datanode machine goes down suddenly (e.g. a power failure,
> or the process being reaped by the OOM killer), the block file might contain
> incomplete data even though PutBlock (the write commit) succeeded, which
> violates our durability guarantee. Although PutBlock triggers
> FilePerBlockStrategy#finishWriteChunks, which closes the file
> (RandomAccessFile#close), the buffer cache might not have been flushed yet,
> since closing a file does not imply that its buffer cache is flushed (see
> [https://man7.org/linux/man-pages/man2/close.2.html]). So there is a chance
> that the user's key is committed but the data does not exist on the
> datanodes.
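
A minimal sketch of the difference described above, written in plain Java NIO rather than taken from Ozone's code (the class and method names are hypothetical). Closing the file only hands the dirty pages to the kernel, whereas FileChannel#force(true) asks the OS to write the file's content and metadata to the storage device before returning. A power loss cannot be reproduced in a snippet, so this only shows where the force call has to sit relative to close.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class DurableChunkWrite {

    /** Writes the data and closes the file. NOT durable on power loss. */
    public static void writeThenClose(Path file, ByteBuffer data) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw");
             FileChannel ch = raf.getChannel()) {
            while (data.hasRemaining()) {
                ch.write(data);
            }
        } // close() releases the descriptor; dirty pages may still be only in the page cache
    }

    /** Same write, but forces content and metadata to the device before closing. */
    public static void writeForceThenClose(Path file, ByteBuffer data) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw");
             FileChannel ch = raf.getChannel()) {
            while (data.hasRemaining()) {
                ch.write(data);
            }
            // On Linux, OpenJDK implements force(true) with fsync(2); force(false) with fdatasync(2).
            ch.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("chunk", ".block");
        ByteBuffer data = ByteBuffer.wrap("hello ozone".getBytes(StandardCharsets.UTF_8));
        writeForceThenClose(tmp, data);
        System.out.println(Files.size(tmp)); // 11
        Files.delete(tmp);
    }
}
```

Only writeForceThenClose gives a "data is on stable storage" guarantee at the point it returns; writeThenClose returning successfully says nothing about durability.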
> We might consider calling FileChannel#force on PutBlock instead of on
> WriteChunk, since data only becomes visible to users when PutBlock returns
> successfully (i.e. the data is committed). This way we can guarantee that
> once a user has successfully uploaded a key, the data has been persistently
> stored on the leader and at least one follower has promised to flush it
> (MAJORITY_COMMITTED).
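
A rough sketch of moving the sync boundary from WriteChunk to PutBlock, assuming one FileChannel per block file. BlockLevelSyncWriter, writeChunk, commitBlock and the syncEachChunk flag are hypothetical names for illustration, not Ozone's FilePerBlockStrategy API: syncEachChunk=true stands in for enabling hdds.container.chunk.write.sync, and commitBlock plays the role of the PutBlock path (finishWriteChunks). Replication via Ratis is not modelled.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical per-block writer: chunks are written unsynced; the block is synced on commit. */
public class BlockLevelSyncWriter implements AutoCloseable {
    private final FileChannel channel;
    private final boolean syncEachChunk; // models hdds.container.chunk.write.sync=true
    private int forceCalls = 0;

    public BlockLevelSyncWriter(Path blockFile, boolean syncEachChunk) throws IOException {
        this.channel = FileChannel.open(blockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        this.syncEachChunk = syncEachChunk;
    }

    /** Analogue of WriteChunk: positional write, synced only if per-chunk sync is on. */
    public void writeChunk(long offset, ByteBuffer data) throws IOException {
        long pos = offset;
        while (data.hasRemaining()) {
            pos += channel.write(data, pos);
        }
        if (syncEachChunk) {
            force();
        }
    }

    /** Analogue of PutBlock: every chunk written so far must be durable before acknowledging. */
    public void commitBlock() throws IOException {
        if (!syncEachChunk) {
            force(); // one fsync covers all chunks written since the last commit
        }
    }

    private void force() throws IOException {
        channel.force(true);
        forceCalls++;
    }

    public int forceCalls() {
        return forceCalls;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("block", ".data");
        try (BlockLevelSyncWriter w = new BlockLevelSyncWriter(tmp, false)) {
            for (int i = 0; i < 4; i++) {
                w.writeChunk(i * 4L, ByteBuffer.wrap(new byte[4]));
            }
            w.commitBlock();
            System.out.println(w.forceCalls()); // 1
        }
        Files.delete(tmp);
    }
}
```

With syncEachChunk=false, four chunk writes followed by one commit issue a single force call; with syncEachChunk=true the same sequence issues four. The block-level boundary therefore pays one fsync per commit instead of one per chunk, while still making the data durable before the commit is acknowledged.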
> This might still affect write throughput and latency, since PutBlock has to
> wait for the buffer cache to be flushed to persistent storage (SSD or disk),
> but it will strengthen our data durability guarantee, which should be our
> priority. Flushing the buffer cache might also reduce the datanode's memory
> usage.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
