[ https://issues.apache.org/jira/browse/SPARK-25106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591278#comment-16591278 ]
Jungtaek Lim commented on SPARK-25106:
--------------------------------------

I played with the project, and it looks like it is affected by [SPARK-24987|https://github.com/apache/spark/commit/b7fdf8eb2011ae76f0161caa9da91e29f52f05e4] (the fix will be available in 2.3.2).

I ran Consumer with the file leak detector ([http://file-leak-detector.kohsuke.org/]) attached, for both 2.3.1 and 2.4.0-SNAPSHOT (which includes SPARK-24987). 2.4.0-SNAPSHOT doesn't have this issue, while I can reproduce it on 2.3.1.

[~aseigneurin] Would you mind testing again with either Spark 2.3.2 RC5 (see the Spark dev mailing list) or a build of the latest branch-2.3 of the Spark source code, to see whether the issue is resolved?

> A new Kafka consumer gets created for every batch
> -------------------------------------------------
>
>                 Key: SPARK-25106
>                 URL: https://issues.apache.org/jira/browse/SPARK-25106
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.3.1
>            Reporter: Alexis Seigneurin
>            Priority: Major
>         Attachments: console.txt
>
>
> I have a fairly simple piece of code that reads from Kafka, applies some
> transformations - including applying a UDF - and writes the result to the
> console. Every time a batch is created, a new consumer is created (and not
> closed), eventually leading to a "too many open files" error.
> I created a test case, with the code available here:
> [https://github.com/aseigneurin/spark-kafka-issue]
> To reproduce:
> # Start Kafka and create a topic called "persons"
> # Run "Producer" to generate data
> # Run "Consumer"
> I am attaching the log where you can see a new consumer being initialized
> between every batch.
> Please note this issue does *not* appear with Spark 2.2.2, and it does not
> appear either when I don't apply the UDF.
> I am suspecting - although I did go far enough to confirm - that this issue
> is related to the improvement made in SPARK-23623.