[ https://issues.apache.org/jira/browse/SPARK-26396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725050#comment-16725050 ]

Kaspar Tint edited comment on SPARK-26396 at 12/19/18 2:39 PM:
---------------------------------------------------------------

Is there an exact formula to use for this, considering that the application can 
have many different queries? We don't need that many executors in dev, for 
instance, but in production we do have plenty of them.
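
For what it's worth, a rough back-of-envelope sketch (my own assumption based on 
the numbers in this ticket, not an official formula): each query keeps its own 
consumer group, so a single executor can in the worst case end up caching one 
consumer per (query, topic partition) pair, and the capacity would need to cover 
queries * partitions to avoid constant evictions.

{code:scala}
// Rough worst-case sizing sketch (assumption, not an official Spark formula):
// each running Structured Streaming query uses its own consumer group, so one
// executor can cache one KafkaConsumer per (query, topic partition) pair.
val queries = 4       // concurrent Structured Streaming queries in this app
val partitions = 90   // partitions of the shared Kafka topic
val worstCasePerExecutor = queries * partitions // 360, the setting that quiets the warnings

println(s"Worst-case consumers cached per executor: $worstCasePerExecutor")
{code}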


was (Author: tint):
Is there an exact formula to use for this, considering that the application can 
have many different queries?

> Kafka consumer cache overflow since 2.4.x
> -----------------------------------------
>
>                 Key: SPARK-26396
>                 URL: https://issues.apache.org/jira/browse/SPARK-26396
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.4.0
>         Environment: Spark 2.4 standalone client mode
>            Reporter: Kaspar Tint
>            Priority: Major
>
> We are experiencing an issue where the Kafka consumer cache seems to overflow 
> constantly upon starting the application. This issue appeared after upgrading 
> to Spark 2.4.
> We would get constant warnings like this:
> {code:java}
> 18/12/18 07:03:29 WARN KafkaDataConsumer: KafkaConsumer cache hitting max 
> capacity of 180, removing consumer for 
> CacheKey(spark-kafka-source-6f66e0d2-beaf-4ff2-ade8-8996611de6ae--1081651087-executor,kafka-topic-76)
> 18/12/18 07:03:32 WARN KafkaDataConsumer: KafkaConsumer cache hitting max 
> capacity of 180, removing consumer for 
> CacheKey(spark-kafka-source-6f66e0d2-beaf-4ff2-ade8-8996611de6ae--1081651087-executor,kafka-topic-30)
> 18/12/18 07:03:32 WARN KafkaDataConsumer: KafkaConsumer cache hitting max 
> capacity of 180, removing consumer for 
> CacheKey(spark-kafka-source-f41d1f9e-1700-4994-9d26-2b9c0ee57881--215746753-executor,kafka-topic-57)
> 18/12/18 07:03:32 WARN KafkaDataConsumer: KafkaConsumer cache hitting max 
> capacity of 180, removing consumer for 
> CacheKey(spark-kafka-source-f41d1f9e-1700-4994-9d26-2b9c0ee57881--215746753-executor,kafka-topic-43)
> {code}
> This application runs 4 different Spark Structured Streaming queries against 
> the same Kafka topic, which has 90 partitions. We used to run it with the 
> default settings, so the cache size defaulted to 64 on Spark 2.3, but now we 
> have tried setting it to 180 or 360. With 360 there is a lot less noise about 
> the overflow, but the resource need increases substantially.
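
A minimal sketch of how the capacity described above would be raised, assuming it 
is controlled by the spark.sql.kafkaConsumerCache.capacity setting (whose default 
of 64 matches the Spark 2.3 behaviour mentioned in the description):

{code:scala}
// Sketch only: raising the per-executor KafkaConsumer cache capacity.
// Assumes spark.sql.kafkaConsumerCache.capacity is the setting the 180/360
// values above refer to; it defaults to 64.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("structured-streaming-kafka") // hypothetical app name
  .config("spark.sql.kafkaConsumerCache.capacity", "360") // ~ queries * partitions
  .getOrCreate()
{code}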


