[ 
https://issues.apache.org/jira/browse/KAFKA-15402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18036036#comment-18036036
 ] 

Kirk True commented on KAFKA-15402:
-----------------------------------

This appears to be a case of head-of-line blocking for {{FETCH}} request 
processing on the broker.

I added code in the consumer ({{AbstractFetch.createFetchRequest()}}) that 
forces the max wait value to 0 in the outgoing {{FETCH}} request when the fetch 
session epoch is set to {{FINAL_EPOCH}}.
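
As a rough illustration of that change (not the actual patch), the decision can be sketched as a small standalone helper. The class name, the method name, and the use of -1 as the {{FINAL_EPOCH}} sentinel are assumptions made for this sketch; the real code lives inside {{AbstractFetch.createFetchRequest()}} and builds a full {{FETCH}} request.

```java
public final class FetchMaxWaitSketch {
    // Assumption for this sketch: the "final" fetch session epoch is the sentinel -1.
    static final int FINAL_EPOCH = -1;

    /**
     * Hypothetical helper: picks the max wait (in ms) to send in the outgoing FETCH request.
     * A request that only closes the fetch session does not need to wait for new data,
     * so its max wait is forced to 0. All other requests keep the configured value.
     */
    static int effectiveMaxWaitMs(int sessionEpoch, int configuredMaxWaitMs) {
        return sessionEpoch == FINAL_EPOCH ? 0 : configuredMaxWaitMs;
    }

    public static void main(String[] args) {
        // Ordinary fetch in an established session: keeps fetch.max.wait.ms (500 by default).
        System.out.println(effectiveMaxWaitMs(42, 500));
        // Session-closing fetch: max wait forced to 0.
        System.out.println(effectiveMaxWaitMs(FINAL_EPOCH, 500));
    }
}
```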

However, that change doesn't fix the issue if there's already an inflight 
{{FETCH}} request with a higher max wait. The broker doesn't start processing 
the {{FETCH}} request with the shorter max wait until the inflight request with 
the longer max wait has completed.

Here's an example timeline of events for an integration test that uses the 
consumer with a {{fetch.max.wait.ms}} value of 500 (default):
 # Time 123: The test produces N records
 # Time 234: The test reads the records in a {{Consumer.poll()}} loop, sending 
{{FETCH}} requests 1-118
 # Time 379: The test confirms that all N records were consumed and exits the 
loop
 # Time 380: The test invokes {{Consumer.close()}} (this form uses a default 
close timeout of 30 seconds)
 # Time 381: The broker starts processing {{FETCH}} request 118 (with a 500 ms 
wait)
 # Time 437: As part of its closing process, the consumer attempts to close the 
broker's fetch session cache entry by sending {{FETCH}} request 119 with the 
max wait forced to 0 ms (I changed this on my branch)
 # Time 879: About 500 ms after it was sent to the broker, the consumer 
receives the response for {{FETCH}} request 118
 # Time 880: The broker starts processing {{FETCH}} request 119 (with a 0 ms 
wait)
 # Time 902: The consumer receives the {{FETCH}} response for 119
 # Time 915: {{Consumer.close()}} returns to the test, having taken 
approximately 535 ms to execute
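
The blocking effect in this timeline can be approximated with a toy model. It is only a sketch under stated assumptions: requests on one connection are served strictly in order, each request is held for its full max wait (the worst case, when no new data arrives), and network and processing overhead are ignored. The class and method names are hypothetical and nothing here is broker code.

```java
public final class HeadOfLineSketch {
    /**
     * Toy model of in-order FETCH processing on a single connection.
     * arrivalMs[i] is when request i becomes available to the broker;
     * maxWaitMs[i] is how long the broker holds it (worst case: no new data).
     * Returns the time at which each request completes.
     */
    static long[] completionTimes(long[] arrivalMs, long[] maxWaitMs) {
        long[] done = new long[arrivalMs.length];
        long connectionFreeAt = 0;
        for (int i = 0; i < arrivalMs.length; i++) {
            // A request cannot start until the previous one on the connection has completed.
            long start = Math.max(arrivalMs[i], connectionFreeAt);
            done[i] = start + maxWaitMs[i];
            connectionFreeAt = done[i];
        }
        return done;
    }

    public static void main(String[] args) {
        // Request 118: processing starts at 381 with a 500 ms max wait.
        // Request 119: sent at 437 with the max wait forced to 0.
        long[] arrival = {381, 437};
        long[] maxWait = {500, 0};
        long[] done = completionTimes(arrival, maxWait);
        System.out.println("118 completes at " + done[0]);
        System.out.println("119 completes at " + done[1]);
        System.out.println("119 delayed by " + (done[1] - arrival[1]) + " ms");
    }
}
```

With the values from the timeline, the model has both requests completing at 881, which is close to the observed 879 and 880, and it shows request 119 waiting about 444 ms despite its 0 ms max wait.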

> Performance regression on close consumer after upgrading to 3.5.0
> -----------------------------------------------------------------
>
>                 Key: KAFKA-15402
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15402
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer
>    Affects Versions: 3.5.0, 3.5.1, 3.6.0
>            Reporter: Benoit Delbosc
>            Assignee: Kirk True
>            Priority: Major
>             Fix For: 4.2.0
>
>         Attachments: image-2023-08-24-18-51-21-720.png, 
> image-2023-08-24-18-51-57-435.png, image-2023-08-25-10-50-28-079.png
>
>
> Hi,
> After upgrading to Kafka client version 3.5.0, we have observed a significant 
> increase in the duration of our Java unit tests. These unit tests heavily 
> rely on the Kafka Admin, Producer, and Consumer API.
> When using Kafka server version 3.4.1, the duration of the unit tests 
> increased from 8 seconds (with Kafka client 3.4.1) to 18 seconds (with Kafka 
> client 3.5.0).
> Upgrading the Kafka server to 3.5.1 shows similar results.
> I have come across the issue KAFKA-15178, which could be the culprit. I will 
> attempt to test the proposed patch.
> In the meantime, if you have any ideas that could help identify and address 
> the regression, please let me know.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
