[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905716#comment-15905716 ]

Shixiong Zhu edited comment on SPARK-18057 at 3/10/17 9:21 PM:
---------------------------------------------------------------

I did some investigation yesterday and found one issue in 0.10.2.0:
https://issues.apache.org/jira/browse/KAFKA-4879 : KafkaConsumer.position may hang forever when deleting a topic

Our current tests just hang forever due to KAFKA-4879. This prevents us from
upgrading to 0.10.2.0.

I also went through the Kafka tickets between 0.10.0.1 and 0.10.2.0. Let me try
to summarize the current situation:

The benefits of upgrading the Kafka client to 0.10.2.0:
- Forward compatibility
- Reading topics from a timestamp
- The following bug fixes:

Issues that we already have workarounds for:
- https://issues.apache.org/jira/browse/KAFKA-4375 : Kafka consumer may swallow some interrupts meant for the calling thread
- https://issues.apache.org/jira/browse/KAFKA-4387 : KafkaConsumer will enter an infinite loop if the polling thread is interrupted, and either commitSync or committed is called
- https://issues.apache.org/jira/browse/KAFKA-4536 : Kafka clients throw NullPointerException on poll when delete the relative topic

Issues related to Kafka record compression:
- https://issues.apache.org/jira/browse/KAFKA-3937 : Kafka Clients Leak Native Memory For Longer Than Needed With Compressed Messages
- https://issues.apache.org/jira/browse/KAFKA-4549 : KafkaLZ4OutputStream does not write EndMark if flush() is not called before close()

Others:
- https://issues.apache.org/jira/browse/KAFKA-2948 : Kafka producer does not cope well with topic deletions

For 0.10.1.x, KAFKA-4547 prevents us from upgrading to 0.10.1.x.

Finally, IMO, "Reading topics from a timestamp" is pretty useful and is the
most important reason for us to upgrade Kafka. However, since the Spark 2.2
code freeze is coming and we won't have enough time to deliver this feature to
users, I think it's fine to just wait for the Kafka team to fix KAFKA-4879 in
the next Kafka release. I don't think the next Kafka release will come later
than Spark 2.3.
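To make "reading topics from a timestamp" concrete: in the 0.10.1+ Java client this is exposed as KafkaConsumer.offsetsForTimes(Map<TopicPartition, Long>), which returns, for each partition, the earliest offset whose timestamp is greater than or equal to the requested timestamp (or null if there is no such record); the consumer can then seek() to that offset. The snippet below is only a self-contained Python model of that lookup semantics, not Kafka or Spark code. The partition names, the in-memory log, and the offsets_for_times helper are hypothetical, and it assumes timestamps are non-decreasing within a partition, which real Kafka logs do not guarantee for CreateTime.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory model of a partition log: (offset, timestamp_ms) pairs.
# Assumption: timestamps are non-decreasing within a partition (not guaranteed
# by real Kafka when producers set CreateTime).
PartitionLog = List[Tuple[int, int]]


def offsets_for_times(
    logs: Dict[str, PartitionLog], targets: Dict[str, int]
) -> Dict[str, Optional[Tuple[int, int]]]:
    """For each requested partition, return (offset, timestamp) of the earliest
    record whose timestamp is >= the target, or None if no such record exists."""
    result: Dict[str, Optional[Tuple[int, int]]] = {}
    for partition, target_ts in targets.items():
        log = logs.get(partition, [])
        timestamps = [ts for _, ts in log]
        # First index whose timestamp is >= target_ts (valid because timestamps are sorted).
        i = bisect_left(timestamps, target_ts)
        result[partition] = log[i] if i < len(log) else None
    return result


if __name__ == "__main__":
    logs = {
        "events-0": [(0, 1000), (1, 2000), (2, 3000)],
        "events-1": [(0, 1500), (1, 2500)],
    }
    # Start reading events-0 from t=2500; events-1 has nothing at or after t=9999.
    print(offsets_for_times(logs, {"events-0": 2500, "events-1": 9999}))
    # {'events-0': (2, 3000), 'events-1': None}
```

A None result here plays the role of the null the real client returns; a reader built on this would have to decide whether to fall back to the latest offset or wait for new data.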



> Update structured streaming kafka from 10.0.1 to 10.2.0
> -------------------------------------------------------
>
>                 Key: SPARK-18057
>                 URL: https://issues.apache.org/jira/browse/SPARK-18057
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>            Reporter: Cody Koeninger
>
> There are a couple of relevant KIPs here, 
> https://archive.apache.org/dist/kafka/0.10.1.0/RELEASE_NOTES.html



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
