ocadaruma commented on PR #14543:
URL: https://github.com/apache/kafka/pull/14543#issuecomment-1761436502

   @ctrlaltluc 
   As @divijvaidya pointed out, flushing (i.e. calling `fsync`) under
`UnifiedLog#lock` could be a serious performance issue, especially when disk
latency is high (e.g. on HDDs or when the disk is overloaded). Several patches
have been proposed to address this (#13782, #14242).
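   To make the concern concrete, here is a minimal Java sketch contrasting two approaches. In the first, `fsync` runs while the lock is held. In the second, the flush target is read under the lock and `FileChannel.force` is called only after the lock is released. The class `FlushSketch` is hypothetical: it is not Kafka's `UnifiedLog`, it assumes a single append-only segment file, and the actual changes in the linked patches may look different.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified log used only to contrast two flush strategies.
// Not Kafka's UnifiedLog; all names here are illustrative assumptions.
class FlushSketch implements AutoCloseable {
    private final ReentrantLock lock = new ReentrantLock();
    private final FileChannel channel;
    private long appendedBytes = 0; // guarded by lock
    private long flushedBytes = 0;  // guarded by lock

    FlushSketch(Path file) throws IOException {
        this.channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    public void append(byte[] data) throws IOException {
        lock.lock();
        try {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                channel.write(buf); // reaches the OS page cache; no fsync here
            }
            appendedBytes += data.length;
        } finally {
            lock.unlock();
        }
    }

    // Problematic pattern: every thread waiting on the lock also waits on the disk.
    public void flushUnderLock() throws IOException {
        lock.lock();
        try {
            channel.force(true);
            flushedBytes = appendedBytes;
        } finally {
            lock.unlock();
        }
    }

    // Alternative: capture the target under the lock, then fsync without holding it.
    public void flushOutsideLock() throws IOException {
        long target;
        lock.lock();
        try {
            target = appendedBytes;
        } finally {
            lock.unlock();
        }
        // Slow device I/O happens here while other threads can keep appending.
        // force() persists every write completed before this call, so at least 'target' bytes.
        channel.force(true);
        lock.lock();
        try {
            flushedBytes = Math.max(flushedBytes, target); // tolerate out-of-order flushes
        } finally {
            lock.unlock();
        }
    }

    public long flushedBytes() {
        lock.lock();
        try {
            return flushedBytes;
        } finally {
            lock.unlock();
        }
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

The trade-off is that `flushedBytes` may briefly lag behind what is already on disk. In return, appends no longer stall behind a slow `fsync`.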
   
   > if the broker fails until the next flush
   
   To be precise, the condition for data loss is "the broker's server fails at
the OS/hardware level (not just the broker process) before the OS has written
the change to the device". This is considered fairly rare if the Kafka cluster
is deployed properly (i.e. replicas are placed in different failure domains).
   
   Also, even if we flush the directory, data loss could still happen on server
failure unless we also flush the segment on every message append (which is not
common practice in Kafka). So, in my understanding, relying on replication
rather than fsync for data durability is Kafka's design decision, as [Jack
Vanlightly](https://jack-vanlightly.com/blog/2023/4/24/why-apache-kafka-doesnt-need-fsync-to-be-safe)
recently summarized.
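   Here is a rough illustration of why flushing the directory alone is not enough. It uses a hypothetical `DirFlushSketch` helper and a Kafka-style zero-padded segment name, chosen only for illustration. The directory fsync makes the new file's *entry* durable. The appended bytes, however, are only guaranteed to be on the device once the segment file itself is forced, which is what flushing on every append (e.g. the topic config `flush.messages=1`) would require. Opening a directory with `READ` and calling `force` is assumed to work on Linux; it may throw on other platforms such as Windows.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration only; not a Kafka API.
final class DirFlushSketch {
    // fsync a directory so newly created entries in it survive an OS/hardware failure.
    // Assumption: supported on Linux; may throw on other platforms (e.g. Windows).
    static void flushDir(Path dir) throws IOException {
        try (FileChannel ch = FileChannel.open(dir, StandardOpenOption.READ)) {
            ch.force(true);
        }
    }

    // Creates a new segment file and makes its directory entry durable.
    static Path rollSegment(Path dir, long baseOffset) throws IOException {
        Path seg = dir.resolve(String.format("%020d.log", baseOffset));
        Files.createFile(seg);
        flushDir(dir); // only the file name/entry is durable at this point
        return seg;
    }

    // Appends one record; the bytes are durable only if fsyncPerAppend is true.
    static void append(Path seg, byte[] record, boolean fsyncPerAppend) throws IOException {
        try (FileChannel ch = FileChannel.open(seg,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap(record);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            if (fsyncPerAppend) {
                ch.force(true); // per-append fsync, roughly what flush.messages=1 implies
            }
            // Without force(), the bytes may exist only in the page cache. A process crash
            // keeps them, but an OS/hardware failure can lose them, even though the
            // directory entry was already flushed in rollSegment().
        }
    }
}
```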
   
   Given that, I'm not sure we should fsync inside the lock and pay that
performance cost.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.