[ https://issues.apache.org/jira/browse/SPARK-26396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725393#comment-16725393 ]
Gabor Somogyi commented on SPARK-26396:
---------------------------------------

It should be. Regarding this JIRA, the described use case is not how the consumer cache is designed to be used. Unless you have further issues, I would like to close it with the information provided.

> Kafka consumer cache overflow since 2.4.x
> -----------------------------------------
>
>                 Key: SPARK-26396
>                 URL: https://issues.apache.org/jira/browse/SPARK-26396
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.4.0
>        Environment: Spark 2.4 standalone client mode
>            Reporter: Kaspar Tint
>            Priority: Major
>
> We are experiencing an issue where the Kafka consumer cache seems to overflow constantly upon starting the application. This issue appeared after upgrading to Spark 2.4.
> We get constant warnings like this:
> {code:java}
> 18/12/18 07:03:29 WARN KafkaDataConsumer: KafkaConsumer cache hitting max capacity of 180, removing consumer for CacheKey(spark-kafka-source-6f66e0d2-beaf-4ff2-ade8-8996611de6ae--1081651087-executor,kafka-topic-76)
> 18/12/18 07:03:32 WARN KafkaDataConsumer: KafkaConsumer cache hitting max capacity of 180, removing consumer for CacheKey(spark-kafka-source-6f66e0d2-beaf-4ff2-ade8-8996611de6ae--1081651087-executor,kafka-topic-30)
> 18/12/18 07:03:32 WARN KafkaDataConsumer: KafkaConsumer cache hitting max capacity of 180, removing consumer for CacheKey(spark-kafka-source-f41d1f9e-1700-4994-9d26-2b9c0ee57881--215746753-executor,kafka-topic-57)
> 18/12/18 07:03:32 WARN KafkaDataConsumer: KafkaConsumer cache hitting max capacity of 180, removing consumer for CacheKey(spark-kafka-source-f41d1f9e-1700-4994-9d26-2b9c0ee57881--215746753-executor,kafka-topic-43)
> {code}
> This application runs 4 different Spark Structured Streaming queries against the same Kafka topic, which has 90 partitions. We used to run it with just the default settings, so the cache size defaulted to 64 on Spark 2.3, but now we have tried setting it to 180 and 360.
> With 360 there is a lot less noise about the overflow, but the resource requirements increase substantially.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
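The arithmetic behind the reporter's numbers is worth spelling out. A minimal sketch follows; the per-query, per-partition cache-key assumption is mine, inferred from the `CacheKey(groupId, topicPartition)` entries in the warnings:

```python
# Back-of-the-envelope sizing for the Kafka consumer cache described above.
# Assumption (inferred from the CacheKey(groupId, topicPartition) entries in
# the log): each query gets its own consumer group, so in the worst case one
# executor caches one consumer per (query, partition) pair.

def worst_case_consumers(num_queries: int, num_partitions: int) -> int:
    """Worst-case number of distinct consumer cache keys on one executor."""
    return num_queries * num_partitions

needed = worst_case_consumers(4, 90)   # 4 queries x 90 partitions
print(needed)                          # 360

# Compare against the capacities mentioned in the report.
for capacity in (64, 180, 360):
    print(f"capacity={capacity}: overflow={needed > capacity}")
```

Under that assumption, the Spark 2.3 default of 64 and the tried value of 180 both overflow, while 360 just fits, which matches the reported behaviour. In Spark 2.4 the cache capacity is controlled by the `spark.sql.kafkaConsumerCache.capacity` setting; as the comment notes, though, running many queries over one wide topic on a shared executor pool is the underlying design concern rather than the cache size itself.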