[ 
https://issues.apache.org/jira/browse/SPARK-25005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637532#comment-16637532
 ] 

Quentin Ambard commented on SPARK-25005:
----------------------------------------

How do you tell the difference between data loss and data that is simply missing when .poll() 
doesn't return any value [~zsxwing]? Correct me if I'm wrong, but couldn't you 
lose data in this situation?

I think there is a third case here 
[https://github.com/zsxwing/spark/blob/ea804cfe840196519cc9444be9bedf03d10aa11a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala#L474]
 which is: something went wrong, the data is available in Kafka but the consumer 
failed to get it.
I've seen this happen when the max.poll size is large with big messages and the 
heap is getting full. The messages exist, but the JVM lags and the consumer times 
out before fetching them.
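To illustrate the ambiguity (a minimal sketch, not Spark's actual code: the function name, parameters, and case names below are all hypothetical): when poll() comes back empty while the consumer's position is still behind the latest offset, the caller cannot distinguish an offset gap (aborted-transaction markers, compaction) from a fetch that simply timed out under load, unless it tracks whether its retries are exhausted.

```scala
// Hypothetical classification of an empty poll() result.
// In a real Kafka consumer, `position` would come from consumer.position()
// and `latestOffset` from consumer.endOffsets().
sealed trait EmptyPollCause
case object NoNewData     extends EmptyPollCause // position caught up to latest offset
case object GapOrMarkers  extends EmptyPollCause // offsets skipped: aborted txn markers, compaction
case object FetchTimedOut extends EmptyPollCause // the "third case": data exists, consumer too slow

def classifyEmptyPoll(position: Long,
                      latestOffset: Long,
                      retriesExhausted: Boolean): EmptyPollCause =
  if (position >= latestOffset) NoNewData
  // Behind the latest offset but retries exhausted: the data is likely
  // present and the fetch timed out (e.g. JVM lagging under heap pressure).
  else if (retriesExhausted) FetchTimedOut
  // Behind the latest offset with retries remaining: could still be a
  // legitimate gap left by transaction markers -- retry before deciding.
  else GapOrMarkers
```

The point is that treating every empty poll() behind the latest offset as "data loss" would misreport the timeout case, while treating it as a benign gap could silently skip real records.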

> Structured streaming doesn't support kafka transaction (creating empty offset 
> with abort & markers)
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-25005
>                 URL: https://issues.apache.org/jira/browse/SPARK-25005
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.3.1
>            Reporter: Quentin Ambard
>            Assignee: Shixiong Zhu
>            Priority: Major
>             Fix For: 2.4.0
>
>
> Structured streaming can't consume kafka transaction. 
> We could try to apply SPARK-24720 (DStream) logic to Structured Streaming 
> source



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

