[ 
https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16778224#comment-16778224
 ] 

Guozhang Wang commented on KAFKA-7652:
--------------------------------------

Oh, that's bad news.

1) When you profiled the latest trunk, did you see the same pattern as 
observed in https://i.imgur.com/IHxC2cZ.png, as well as in the trace logging, 
compared with 0.10.2.x?
2) In practice, lookups in the caching layer are very cheap, so even a large 
increase in their number should not contribute much overhead, whereas fetches 
from the underlying store are much more expensive. Could you confirm whether 
the performance bottleneck comes from the underlying RocksDB store or from 
caching-layer access?

> Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-7652
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7652
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.11.0.0, 0.11.0.1, 0.11.0.2, 0.11.0.3, 1.1.1, 2.0.0, 
> 2.0.1
>            Reporter: Jonathan Gordon
>            Assignee: Guozhang Wang
>            Priority: Major
>              Labels: kip
>             Fix For: 2.2.0
>
>         Attachments: kafka_10_2_1_flushes.txt, kafka_11_0_3_flushes.txt
>
>
> I'm creating this issue in response to [~guozhang]'s request on the mailing 
> list:
> [https://lists.apache.org/thread.html/97d620f4fd76be070ca4e2c70e2fda53cafe051e8fc4505dbcca0321@%3Cusers.kafka.apache.org%3E]
> We are attempting to upgrade our Kafka Streams application from 0.10.2.1 but 
> are experiencing severe performance degradation. Most of the CPU time seems 
> to be spent retrieving from the local cache. Here's an example thread 
> profile with 0.11.0.0:
> [https://i.imgur.com/l5VEsC2.png]
> When things are running smoothly we're gated by retrieving from the state 
> store with acceptable performance. Here's an example thread profile with 
> 0.10.2.1:
> [https://i.imgur.com/IHxC2cZ.png]
> Some investigation reveals that we appear to be performing about 3 orders of 
> magnitude more lookups on the NamedCache over a comparable time period. I've 
> attached the NamedCache flush logs for 0.10.2.1 and 0.11.0.3.
> We're using session windows and have the app configured for 
> commit.interval.ms = 30 * 1000 and cache.max.bytes.buffering = 10485760
> I'm happy to share more details if they would be helpful. Also happy to run 
> tests on our data.
> I also found this issue, which seems like it may be related:
> https://issues.apache.org/jira/browse/KAFKA-4904
>  
> KIP-420: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-420%3A+Add+Single+Value+Fetch+in+Session+Stores]
>  
>  
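
For reference, the two settings quoted in the description could be expressed 
roughly as below. This is only a sketch: the class and method names are 
hypothetical, the keys are given as plain strings so that it compiles without 
the Kafka client on the classpath, and a real application would also need 
application.id and bootstrap.servers, which the report does not give.

```java
import java.util.Properties;

// Hypothetical helper; only the two property keys and their values come from the report.
class SessionStoreCacheConfigSketch {
    // commit.interval.ms = 30 * 1000, as quoted in the issue description
    static final long COMMIT_INTERVAL_MS = 30 * 1000L;
    // cache.max.bytes.buffering = 10485760, i.e. 10 MiB across all stream threads
    static final long CACHE_MAX_BYTES = 10L * 1024 * 1024;

    static Properties build() {
        Properties props = new Properties();
        // How often offsets are committed and the record caches are flushed.
        props.setProperty("commit.interval.ms", Long.toString(COMMIT_INTERVAL_MS));
        // Total memory budget for the record caches in front of the state stores.
        props.setProperty("cache.max.bytes.buffering", Long.toString(CACHE_MAX_BYTES));
        return props;
    }

    public static void main(String[] args) {
        Properties props = build();
        System.out.println("commit.interval.ms = " + props.getProperty("commit.interval.ms"));
        System.out.println("cache.max.bytes.buffering = " + props.getProperty("cache.max.bytes.buffering"));
    }
}
```

The resulting Properties object would then be passed to the Streams 
application alongside the remaining required settings.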



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
