ctrlaltluc commented on PR #14543: URL: https://github.com/apache/kafka/pull/14543#issuecomment-1761367892
> The fix in this PR has a serious performance impact, since the partition lock is the bottleneck for single-partition throughput in Kafka; hence, this decision is not made lightly.
>
> To understand the problem correctly, in terms of concurrency:
>
> 1. If the rename happens before flushing, then the flush will fail with file not found (because it holds a reference to the old directory). The renamed directory will not be flushed here but will eventually be flushed in the next scheduled flush() call.
>
> 2. If the rename happens after flushing, then we might have a renamed folder which hasn't been flushed yet. It will be flushed in the next flush() call.
>
> @ctrlaltluc Is your primary concern that the "eventual" flush() of the renamed directory will decrease durability, since the messages will be lost if the broker fails?

@divijvaidya your understanding is correct. My primary concern is that, if the directory flush is skipped and the broker fails before the next flush, any new segment is lost.

Flushing the directory is required to synchronize the directory inode, which contains the reference to the new segment inode. Flushing only the segment would sync the new segment's data and inode, but the directory inode would not have any reference to it (so the segment would be inaccessible).

This is my understanding (which sounds correct to me) from the explanation in db3e5e2c0de367ffcfe4078359d6d208ba722581. The edge case described there can still happen if we wait until the next flush.
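
To illustrate the distinction between syncing a segment file and syncing its parent directory, here is a minimal Java sketch. It is not Kafka's code: the class and method names (`DirFsyncSketch`, `writeAndSyncFile`, `syncDir`) are hypothetical, and it assumes a POSIX-like filesystem where a directory can be opened read-only and fsynced (this typically fails on Windows).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration only; not part of Kafka.
final class DirFsyncSketch {

    // Creates a new file, writes the data, and fsyncs the file itself.
    // This persists the file's contents and its own inode, but not the
    // directory entry that makes the file reachable by name.
    static Path writeAndSyncFile(Path dir, String name, byte[] data) throws IOException {
        Path file = dir.resolve(name);
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true); // fsync file data and metadata
        }
        return file;
    }

    // Fsyncs the directory so that the reference to the newly created
    // file is persisted as well. Assumes the platform allows opening a
    // directory for reading (true on Linux, not on Windows).
    static void syncDir(Path dir) throws IOException {
        try (FileChannel ch = FileChannel.open(dir, StandardOpenOption.READ)) {
            ch.force(true);
        }
    }
}
```

In this sketch, calling only `writeAndSyncFile` corresponds to the case where the segment is flushed but the directory is not. Calling `syncDir` afterwards closes the window in which a crash could leave the new file unreachable.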