[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905716#comment-15905716 ]
Shixiong Zhu edited comment on SPARK-18057 at 3/10/17 9:21 PM:
---------------------------------------------------------------

I did some investigation yesterday and found one issue in 0.10.2.0:

https://issues.apache.org/jira/browse/KAFKA-4879 : KafkaConsumer.position may hang forever when deleting a topic

Our current tests hang forever because of KAFKA-4879, which prevents us from upgrading to 0.10.2.0.

I also went through the Kafka tickets between 0.10.0.1 and 0.10.2.0. Here is a summary of the current situation.

The benefits of upgrading the Kafka client to 0.10.2.0:
- Forward compatibility
- Reading topics from a timestamp
- The following bug fixes:

Issues that we already have workarounds for:
- https://issues.apache.org/jira/browse/KAFKA-4375 : Kafka consumer may swallow some interrupts meant for the calling thread
- https://issues.apache.org/jira/browse/KAFKA-4387 : KafkaConsumer will enter an infinite loop if the polling thread is interrupted, and either commitSync or committed is called
- https://issues.apache.org/jira/browse/KAFKA-4536 : Kafka clients throw NullPointerException on poll when delete the relative topic

Issues related to Kafka record compression:
- https://issues.apache.org/jira/browse/KAFKA-3937 : Kafka Clients Leak Native Memory For Longer Than Needed With Compressed Messages
- https://issues.apache.org/jira/browse/KAFKA-4549 : KafkaLZ4OutputStream does not write EndMark if flush() is not called before close()

Others:
- https://issues.apache.org/jira/browse/KAFKA-2948 : Kafka producer does not cope well with topic deletions

As for 0.10.1.x, KAFKA-4547 prevents us from upgrading to it.

Finally, IMO, "Reading topics from a timestamp" is quite useful and is the most important reason to upgrade Kafka. However, since the Spark 2.2 code freeze is coming and we won't have enough time to deliver this feature to users, it's fine to just wait for the Kafka project to fix KAFKA-4879 in the next Kafka release.
I don't think the next Kafka release will be later than Spark 2.3.
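For context on the "Reading topics from a timestamp" benefit: since Kafka 0.10.1.0 the Java consumer exposes KafkaConsumer.offsetsForTimes, which, per its Javadoc, returns for each requested partition the earliest offset whose timestamp is greater than or equal to the given timestamp, or null if no such record exists. The sketch below is a hypothetical pure-Python model of that per-partition contract over in-memory (offset, timestamp) lists. It does not talk to a broker, and offsets_for_times and the partition names are illustrative assumptions, not the Kafka API.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for one partition's log: (offset, timestamp_ms) pairs
# in offset order. This is NOT a Kafka type.
Record = Tuple[int, int]


def offsets_for_times(
    partitions: Dict[str, List[Record]], target_ms: int
) -> Dict[str, Optional[Tuple[int, int]]]:
    """Model of the offsetsForTimes contract: for each partition, the
    (offset, timestamp) of the earliest record whose timestamp is >= target_ms,
    or None (standing in for Kafka's null entry) if there is no such record."""
    result: Dict[str, Optional[Tuple[int, int]]] = {}
    for name, records in partitions.items():
        match = None
        for offset, ts in records:  # scan in offset order
            if ts >= target_ms:
                match = (offset, ts)
                break
        result[name] = match
    return result


if __name__ == "__main__":
    log = {
        "events-0": [(0, 1000), (1, 2000), (2, 3000)],
        "events-1": [(0, 500), (1, 900)],
    }
    # events-0 starts at offset 1; events-1 has no record at or after 1500 ms.
    print(offsets_for_times(log, 1500))  # {'events-0': (1, 2000), 'events-1': None}
```

Record timestamps within a partition are not guaranteed to be monotonic, so the first match in offset order is not necessarily the record with the smallest qualifying timestamp; the sketch simply follows the "earliest offset" wording.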
> Update structured streaming kafka from 10.0.1 to 10.2.0
> -------------------------------------------------------
>
>                 Key: SPARK-18057
>                 URL: https://issues.apache.org/jira/browse/SPARK-18057
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>            Reporter: Cody Koeninger
>
> There are a couple of relevant KIPs here,
> https://archive.apache.org/dist/kafka/0.10.1.0/RELEASE_NOTES.html

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)