[
https://issues.apache.org/jira/browse/KAFKA-10526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282971#comment-17282971
]
Sagar Rao commented on KAFKA-10526:
-----------------------------------
[~hachikuji], I have looked through the codebase and KIP-595 and tried to
understand this.
One thing I want to confirm: log replication happens via the Fetch
request/response dance.
So, the leader gets a Fetch request and, if all preconditions are met, finds a
bunch of records and returns a FetchResponse. During that process, it keeps
updating its LocalState and the replicated state for each Fetch request that
comes through. It also checks whether the high watermark can be moved ahead by
finding out whether a majority of followers are at a point > the current HWM
offset.
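To make the high watermark check concrete, here is a rough sketch of the idea as described above. It is only an illustration under assumptions: the class and method names are hypothetical and not taken from the Kafka codebase, and the end offsets of the voters are simply passed in as a list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.OptionalLong;

// Hypothetical sketch; class and method names are illustrative, not Kafka's.
final class HighWatermarkSketch {

    /**
     * Returns the largest offset that a majority of voters (leader included)
     * have reached, or empty if it does not exceed the current high watermark.
     */
    static OptionalLong maybeAdvance(List<Long> voterEndOffsets, long currentHwm) {
        if (voterEndOffsets.isEmpty()) {
            return OptionalLong.empty();
        }
        List<Long> sorted = new ArrayList<>(voterEndOffsets);
        sorted.sort(Collections.reverseOrder());
        // With offsets in descending order, the entry at index n/2 has been
        // reached by at least n/2 + 1 voters, i.e. a strict majority.
        long candidate = sorted.get(sorted.size() / 2);
        return candidate > currentHwm
            ? OptionalLong.of(candidate)
            : OptionalLong.empty();
    }

    public static void main(String[] args) {
        // Three voters: leader at 10, followers at 7 and 4; current HWM is 5.
        System.out.println(maybeAdvance(List.of(10L, 7L, 4L), 5L));
    }
}
```

With three voters at offsets 10, 7 and 4 and a current HWM of 5, the candidate is 7, since two of the three voters have reached it.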
The follower, when it receives the FetchResponse, looks at the messages and sees
if it needs to truncate its log or if the leader has been fenced etc., and then
finally writes the records passed in the FetchResponse to its log.
What I am not able to figure out is how the leader knows that a write
has been committed on the follower side. I could find the code that checks
whether the HWM should be incremented or not based on the ReplicaState.
The FetchResponse handler finally returns whether the fetch was successful or
not, but how is that value propagated back to the leader? There are some
listener contexts; is it through those or via the NetworkChannels? I see a
correlation id which is being used in the Raft inbound messages as well.
In terms of the optimisation that you have suggested: instead of updating
LocalState/ReplicaState every time the leader receives a FetchRequest, it
can wait until a majority has committed the writes and flush only then. Is that
the correct understanding?
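A minimal sketch of that understanding of the deferral, assuming a simulated log where a flush is just a counter. All names here are hypothetical and do not come from the Kafka codebase, and whether the followers have flushed their own copies is outside the sketch.

```java
// Hypothetical sketch; none of these names come from the Kafka codebase.
final class DeferredFsyncSketch {
    private long logEndOffset = 0;   // next offset to be appended
    private long flushedOffset = 0;  // everything below this is "on disk"
    private long highWatermark = 0;
    private int flushCount = 0;      // number of simulated fsync calls

    /** Appends records to the simulated log without forcing them to disk. */
    void append(int numRecords) {
        logEndOffset += numRecords;
    }

    /**
     * Called once a majority of voters is known to have reached newHwm.
     * The leader flushes only if the new high watermark covers unflushed data.
     */
    void advanceHighWatermark(long newHwm) {
        if (newHwm <= highWatermark) {
            return; // nothing to advance
        }
        if (newHwm > logEndOffset) {
            throw new IllegalArgumentException("HWM beyond log end offset");
        }
        if (newHwm > flushedOffset) {
            flush(); // deferred fsync happens here, not on every append
        }
        highWatermark = newHwm;
    }

    private void flush() {
        flushedOffset = logEndOffset;
        flushCount++;
    }

    int flushCount() { return flushCount; }
    long highWatermark() { return highWatermark; }

    public static void main(String[] args) {
        DeferredFsyncSketch leader = new DeferredFsyncSketch();
        leader.append(2);
        leader.append(3);
        leader.append(1);
        leader.advanceHighWatermark(6);
        System.out.println("flushes: " + leader.flushCount());
    }
}
```

In this sketch, three appends followed by one high watermark advance trigger a single flush, where flushing on every append would have triggered three.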
> Explore performance impact of leader fsync deferral
> ---------------------------------------------------
>
> Key: KAFKA-10526
> URL: https://issues.apache.org/jira/browse/KAFKA-10526
> Project: Kafka
> Issue Type: Sub-task
> Reporter: Jason Gustafson
> Assignee: Sagar Rao
> Priority: Major
>
> In order to commit a write, a majority of nodes must call fsync in order to
> ensure the data has been written to disk. An interesting optimization option
> to consider is letting the leader defer fsync until the high watermark is
> ready to be advanced. This potentially allows us to reduce the number of
> flushes on the leader.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)