[ https://issues.apache.org/jira/browse/SPARK-26396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725393#comment-16725393 ]

Gabor Somogyi commented on SPARK-26396:
---------------------------------------

Should be.

Regarding the jira: the described use-case is not how it should be designed.
Unless you have further issues, I would like to close it with the info provided.
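
For reference, the capacity in the warnings quoted below is controlled by the
spark.sql.kafkaConsumerCache.capacity Spark conf (default 64 in 2.4). A minimal
sketch of raising it when building the session; the broker and topic names are
placeholders, not taken from the report:

{code:scala}
// Minimal sketch (Spark 2.4): raise the per-executor Kafka consumer cache
// capacity. spark.sql.kafkaConsumerCache.capacity is read from the Spark conf
// on the executors; the default is 64. Broker/topic names are placeholders.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("kafka-cache-demo")
  .config("spark.sql.kafkaConsumerCache.capacity", "360")
  .getOrCreate()

// Each streaming query caches roughly one consumer per assigned topic
// partition on an executor, so size the capacity to cover
// queries x partitions.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "kafka-topic")
  .load()
{code}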

> Kafka consumer cache overflow since 2.4.x
> -----------------------------------------
>
>                 Key: SPARK-26396
>                 URL: https://issues.apache.org/jira/browse/SPARK-26396
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.4.0
>         Environment: Spark 2.4 standalone client mode
>            Reporter: Kaspar Tint
>            Priority: Major
>
> We are experiencing an issue where the Kafka consumer cache seems to overflow 
> constantly upon starting the application. This issue appeared after upgrading 
> to Spark 2.4.
> We would get constant warnings like this:
> {code:java}
> 18/12/18 07:03:29 WARN KafkaDataConsumer: KafkaConsumer cache hitting max capacity of 180, removing consumer for CacheKey(spark-kafka-source-6f66e0d2-beaf-4ff2-ade8-8996611de6ae--1081651087-executor,kafka-topic-76)
> 18/12/18 07:03:32 WARN KafkaDataConsumer: KafkaConsumer cache hitting max capacity of 180, removing consumer for CacheKey(spark-kafka-source-6f66e0d2-beaf-4ff2-ade8-8996611de6ae--1081651087-executor,kafka-topic-30)
> 18/12/18 07:03:32 WARN KafkaDataConsumer: KafkaConsumer cache hitting max capacity of 180, removing consumer for CacheKey(spark-kafka-source-f41d1f9e-1700-4994-9d26-2b9c0ee57881--215746753-executor,kafka-topic-57)
> 18/12/18 07:03:32 WARN KafkaDataConsumer: KafkaConsumer cache hitting max capacity of 180, removing consumer for CacheKey(spark-kafka-source-f41d1f9e-1700-4994-9d26-2b9c0ee57881--215746753-executor,kafka-topic-43)
> {code}
> This application runs 4 different Spark Structured Streaming queries against 
> the same Kafka topic, which has 90 partitions. We used to run it with the 
> default settings, so on Spark 2.3 the cache size defaulted to 64; we have now 
> tried setting it to 180 or 360. With 360 there is much less noise about the 
> overflow, but the resource need increases substantially.
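
(A quick back-of-the-envelope note on the numbers above, my arithmetic rather
than anything stated in the report: the cache is keyed per group and topic
partition, and each query uses its own group, so 4 queries x 90 partitions
means up to 360 distinct CacheKey entries on an executor. A capacity of 64 or
180 therefore guarantees constant eviction, while 360 can just hold every
consumer, which matches the reduced warning noise observed.)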


