[jira] [Comment Edited] (KAFKA-10526) Explore performance impact of leader fsync deferral

Sagar Rao (Jira) Sat, 27 Feb 2021 00:16:06 -0800


    [ 
https://issues.apache.org/jira/browse/KAFKA-10526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291485#comment-17291485
 ]


Sagar Rao edited comment on KAFKA-10526 at 2/27/21, 8:15 AM:
-------------------------------------------------------------

[~hachikuji], I looked at the codebase and the KIP further and here's what I 
understood:

1) When the leader receives new records, it immediately updates its local 
state. This happens via the maybeAppendBatches method, which invokes 
flushLeaderLog. In flushLeaderLog, for the batch of records, it updates 
its local state and checks whether the HWM can be advanced. Note that after 
this step, the log is always flushed to disk in flushLeaderLog.

2) The followers send fetch requests to fetch records. Once the leader 
receives such a request, it invokes tryCompleteFetchRequest, which validates 
the request. At this point, it reads a batch of records that can be returned 
to the follower and tries to update the replicaState. It also tries to update 
the HWM and, if it does, the HWM on the log is advanced as well.

3) The follower, when it receives a FetchResponse, appends the records in the 
response to its log and also flushes them to disk. I believe it also updates 
the follower's high watermark here.
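
To make the three steps above concrete, here is a minimal toy model of how I 
read the flow. It is only a sketch under my own assumptions: ToyLeader, append, 
onFetch and the counters are hypothetical names, not the KafkaRaftClient API, 
and the flush is just a counter rather than a real fsync. It assumes the 
follower's fetch offset equals its flushed log end offset.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: all names here are hypothetical, not taken from Kafka.
class ToyLeader {
    private final int localId;
    // voter id -> log end offset known to the leader
    private final Map<Integer, Long> endOffsets = new HashMap<>();
    private long highWatermark = 0L;
    private int flushCount = 0;

    ToyLeader(int localId, List<Integer> voterIds) {
        this.localId = localId;
        for (int id : voterIds) {
            endOffsets.put(id, 0L);
        }
    }

    // Step 1: append locally, flush straight away, then re-check the HWM.
    void append(int numRecords) {
        endOffsets.merge(localId, (long) numRecords, Long::sum);
        flushCount++; // stands in for the fsync on the leader log
        maybeAdvanceHighWatermark();
    }

    // Step 2: a fetch at fetchOffset tells the leader how far the follower got.
    void onFetch(int followerId, long fetchOffset) {
        endOffsets.put(followerId, fetchOffset);
        maybeAdvanceHighWatermark();
    }

    // The HWM candidate is the largest offset reached by a majority of voters.
    private void maybeAdvanceHighWatermark() {
        List<Long> sorted = new ArrayList<>(endOffsets.values());
        sorted.sort(Collections.reverseOrder());
        long candidate = sorted.get(sorted.size() / 2);
        if (candidate > highWatermark) {
            highWatermark = candidate;
        }
    }

    long highWatermark() {
        return highWatermark;
    }

    int flushCount() {
        return flushCount;
    }
}
```

In this model, with three voters, the HWM stays at 0 after the leader appends 
and only moves once one follower fetches from the new end offset.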

 

So, in this flow, a flush happens in two places: first, when the leader 
completes a batch, and second, when a FetchResponse is received by the 
follower. As per the OP in the ticket, fsync is called a number of times on 
the followers, so that would be the latter. A few questions that I have:

 

1) Basic question, but I see all this logic in KafkaRaftClient. Where does the 
instance of this class get instantiated? Is it on the leader?

2) Looking at this flow, I am slightly confused about how the leader knows 
which records have been committed successfully on the followers. It seems to 
maintain a local copy of replicas and their offsets and epochs, but how does 
it know which have been committed? Is it via the fetch requests received from 
the followers?

3) Where does the optimisation that you have talked about need to happen in 
this flow? Is it while handling fetch responses or when appending new 
records in the batch? Or is it some other place?

4) There is a concept of committing records in the appendBatch method when it 
is invoked via maybeAppendBatches. appendBatch first writes to the local 
leader log, and then there's a callback to appendPurgatory which mentions a 
commit. I am assuming this commit is with respect to the majority of nodes, 
but I can't seem to find the place where the records are written to the 
followers and acks are received from the majority.
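
To illustrate what I assume the purgatory callback in question 4 is doing, 
here is a small sketch. It is not the real appendPurgatory: 
ToyAppendPurgatory, await and onHighWatermark are hypothetical names, and the 
sketch assumes the high watermark is exclusive, so an offset counts as 
committed once it is strictly below the HWM.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;

// Hypothetical toy stand-in for the idea of an append purgatory.
class ToyAppendPurgatory {
    // last offset of a batch -> future completed once that offset is committed
    private final TreeMap<Long, CompletableFuture<Long>> pending = new TreeMap<>();

    // Register interest in the commit of the batch ending at lastOffset.
    CompletableFuture<Long> await(long lastOffset) {
        return pending.computeIfAbsent(lastOffset, k -> new CompletableFuture<>());
    }

    // Called when the high watermark advances; completes every waiter whose
    // offset is now strictly below it and returns how many were completed.
    int onHighWatermark(long highWatermark) {
        Map<Long, CompletableFuture<Long>> done = pending.headMap(highWatermark, false);
        int completed = done.size();
        done.forEach((offset, future) -> future.complete(offset));
        done.clear(); // the view is backed by the map, so this removes them
        return completed;
    }
}
```

Under this assumption no explicit ack from the followers would be needed: the 
waiting append completes as a side effect of the HWM moving past its offset.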

 

I know some of these questions are basic and point to my lack of understanding 
of the overall codebase, but I thought I would just ask them here to get full 
clarity.



> Explore performance impact of leader fsync deferral
> ---------------------------------------------------
>
>                 Key: KAFKA-10526
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10526
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Jason Gustafson
>            Assignee: Sagar Rao
>            Priority: Major
>
> In order to commit a write, a majority of nodes must call fsync in order to 
> ensure the data has been written to disk. An interesting optimization option 
> to consider is letting the leader defer fsync until the high watermark is 
> ready to be advanced. This potentially allows us to reduce the number of 
> flushes on the leader.
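
As a rough illustration of the deferral described in the ticket, the sketch 
below lets the leader append without flushing and only flush when the high 
watermark is about to advance. This is a toy model under stated assumptions, 
not a proposed implementation: DeferredFlushLeader and its methods are 
hypothetical names, the flush is a counter instead of a real fsync, and a 
follower's fetch offset is assumed to equal its flushed log end offset.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: all names here are hypothetical, not taken from Kafka.
class DeferredFlushLeader {
    private final int localId;
    private final Map<Integer, Long> endOffsets = new HashMap<>();
    private long flushedOffset = 0L; // leader log is durable up to here
    private long highWatermark = 0L;
    private int flushCount = 0;

    DeferredFlushLeader(int localId, List<Integer> voterIds) {
        this.localId = localId;
        for (int id : voterIds) {
            endOffsets.put(id, 0L);
        }
    }

    // Append without flushing; the flush is deferred.
    void append(int numRecords) {
        endOffsets.merge(localId, (long) numRecords, Long::sum);
        maybeAdvanceHighWatermark();
    }

    void onFetch(int followerId, long fetchOffset) {
        endOffsets.put(followerId, fetchOffset);
        maybeAdvanceHighWatermark();
    }

    private void maybeAdvanceHighWatermark() {
        List<Long> sorted = new ArrayList<>(endOffsets.values());
        sorted.sort(Collections.reverseOrder());
        long candidate = sorted.get(sorted.size() / 2);
        if (candidate <= highWatermark) {
            return; // nothing to commit yet, so no flush either
        }
        if (candidate > flushedOffset) {
            // Flush everything appended so far, once, right before advancing.
            flushedOffset = endOffsets.get(localId);
            flushCount++;
        }
        highWatermark = candidate;
    }

    long highWatermark() {
        return highWatermark;
    }

    int flushCount() {
        return flushCount;
    }
}
```

With three voters, three appends followed by one follower fetch at the new end 
offset cost a single flush here, whereas flushing on every append would cost 
three. The sketch always flushes the leader before advancing, which is a 
simplification.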



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
