[
https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905716#comment-15905716
]
Shixiong Zhu commented on SPARK-18057:
--------------------------------------
I did some investigation yesterday, and found one issue in 0.10.2.0:
https://issues.apache.org/jira/browse/KAFKA-4879 : KafkaConsumer.position may
hang forever when deleting a topic
Our current tests will just hang forever due to KAFKA-4879. This prevents us
from upgrading to 0.10.2.0.
I also went through the Kafka tickets between 0.10.0.1 and 0.10.2.0. Let me try
to summarize the current situation:
The benefits of upgrading Kafka client to 0.10.2.0:
- Forward compatibility
- Reading topics from a timestamp
- The following bug fixes:
Issues for which we already have workarounds:
https://issues.apache.org/jira/browse/KAFKA-4375 : Kafka consumer may swallow
some interrupts meant for the calling thread
https://issues.apache.org/jira/browse/KAFKA-4387 : KafkaConsumer will enter an
infinite loop if the polling thread is interrupted, and either commitSync or
committed is called
https://issues.apache.org/jira/browse/KAFKA-4536 : Kafka clients throw
NullPointerException on poll when delete the relative topic
Issues related to Kafka record compression:
https://issues.apache.org/jira/browse/KAFKA-3937 : Kafka Clients Leak Native
Memory For Longer Than Needed With Compressed Messages
https://issues.apache.org/jira/browse/KAFKA-4549 : KafkaLZ4OutputStream does
not write EndMark if flush() is not called before close()
Others:
https://issues.apache.org/jira/browse/KAFKA-2948 : Kafka producer does not cope
well with topic deletions
As for 0.10.1.*, KAFKA-4547 prevents us from upgrading to it.
Finally, IMO, "Reading topics from a timestamp" is pretty useful and is the
most important reason to upgrade Kafka. However, since the Spark 2.2 code
freeze is coming, we won't have enough time to deliver this feature to users,
so it's fine to just wait for KAFKA-4879 to be fixed in the next Kafka
release. I don't think the next Kafka release will be later than Spark 2.3.
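To make "Reading topics from a timestamp" concrete, here is a toy sketch of
the lookup rule as I understand it: for a given timestamp, start from the
earliest offset whose record timestamp is greater than or equal to it. This is
plain Python over an in-memory list, not the Kafka client API; the names
offset_for_time and log are made up for illustration.

```python
from typing import Optional, Sequence, Tuple

# Toy model of one partition: (offset, timestamp_ms) pairs in offset order.
# Hypothetical data, not read from a real broker.
Record = Tuple[int, int]


def offset_for_time(log: Sequence[Record], target_ms: int) -> Optional[int]:
    """Return the earliest offset whose timestamp is >= target_ms.

    Returns None when no record is that recent. A linear scan is used so the
    sketch does not assume timestamps are sorted.
    """
    for offset, timestamp_ms in log:
        if timestamp_ms >= target_ms:
            return offset
    return None


log = [(0, 1000), (1, 1500), (2, 1500), (3, 2200)]
start = offset_for_time(log, 1500)  # a reader would begin consuming here
```

A streaming source built on this would seek each partition to the returned
offset, and would need a policy for the None case (for example, falling back
to the latest offset).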
> Update structured streaming kafka from 10.0.1 to 10.2.0
> -------------------------------------------------------
>
> Key: SPARK-18057
> URL: https://issues.apache.org/jira/browse/SPARK-18057
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Reporter: Cody Koeninger
>
> There are a couple of relevant KIPs here,
> https://archive.apache.org/dist/kafka/0.10.1.0/RELEASE_NOTES.html