[ 
https://issues.apache.org/jira/browse/KAFKA-18018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17903601#comment-17903601
 ] 

Abhinav Dixit edited comment on KAFKA-18018 at 12/6/24 9:46 AM:
----------------------------------------------------------------

Updating the findings: this is indeed a performance issue and not a 
reliability issue. The records do get consumed, but it takes 5-7 minutes. 
The performance bottlenecks are:
1. handleShareAcknowledge in KafkaApis takes 21% of the total consumption 
time (called 16 times in total), and mergeBatches in PersisterStateManager 
takes 23% of the total consumption time (called 17 times).

2. An open question is why mergeBatches is called more times than the number 
of share acknowledge requests received.

Also, the problem doesn't occur with a 10-byte record size instead of the 1KB 
record size used in the above testing. With 10-byte records, consumption 
doesn't take more than 9 seconds.
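
To put the two runs side by side, here is a rough back-of-the-envelope calculation of the consumption rate in each case. It is only a sketch: it assumes the 10-byte run also consumed 1,000,000 records (the comment does not say so explicitly), and the function name is made up for illustration.

```python
# Rough consumption rates for the two runs described above.
# Assumption (not stated in the comment): both runs consume 1,000,000 records.

RECORDS = 1_000_000


def records_per_second(seconds):
    """Average consumption rate if all records are consumed in `seconds`."""
    return RECORDS / seconds


# 1KB records: consumption took between 5 and 7 minutes.
slow_best = records_per_second(5 * 60)
slow_worst = records_per_second(7 * 60)

# 10-byte records: consumption took no more than 9 seconds,
# so this is a lower bound on the rate.
fast = records_per_second(9)

print(f"1KB run:     {slow_worst:,.0f} to {slow_best:,.0f} records/s")
print(f"10-byte run: at least {fast:,.0f} records/s")
print(f"gap:         at least {fast / slow_best:.0f}x in records/s")
```

Under that assumption, the 1KB run averages roughly 2,381 to 3,333 records/s, while the 10-byte run averages at least about 111,111 records/s, a gap of at least 33x.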


> Consumption degrading when using DefaultStatePersister for 1 million records 
> of size 1KB
> ----------------------------------------------------------------------------------------
>
>                 Key: KAFKA-18018
>                 URL: https://issues.apache.org/jira/browse/KAFKA-18018
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Abhinav Dixit
>            Assignee: Abhinav Dixit
>            Priority: Major
>
> I was running some performance tests and I couldn't consume 1,000,000 records 
> using 10 share consumers. I've narrowed down the issue to 
> {{DefaultStatePersister}}. When I run the same tests using 
> {{NoOpShareStatePersister}}, I don't see the reliability issue.
> Steps to reproduce (it can probably be reproduced in other ways as well, but 
> this is how I noticed it consistently):
>  # Create a topic with a single partition.
>  # Produce 1000 records into the topic and consume them with the help of 10 
> share consumers.
>  # Again produce 1000 records into the topic and consume them with the help of 
> 10 share consumers.
>  # Produce 1,000,000 records into the topic and consume them with the help of 
> 10 share consumers.
> You'll see that the records are not consumed even within 5 minutes (it 
> shouldn't take more than 5-6 seconds to consume 1,000,000 records with 10 
> share consumers, based on my past experience).
> PS - I haven't noticed any issues using console share consumers, so this is 
> probably an issue with scale.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
