[ 
https://issues.apache.org/jira/browse/SPARK-25005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637541#comment-16637541
 ] 

Shixiong Zhu commented on SPARK-25005:
--------------------------------------

[~qambard] If `poll` returns and the offset has changed, the Kafka consumer 
fetched something, but all of the messages are invisible (such as aborted 
messages or transaction markers), so the consumer returns an empty result.

If `poll` returns but the offset doesn't change, the Kafka consumer fetched 
nothing before the timeout. In this case, we just throw a "TimeoutException". 
Spark will retry the task or fail the job. A large GC pause can cause the 
timeout, and the user should tune the configs to avoid it. We cannot do much 
in Spark.
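
To make the two cases concrete, here is a minimal sketch of the decision in 
Python. It is not the actual Spark code: `interpret_poll`, `FetchTimeout`, and 
the returned labels are hypothetical names used only for illustration, and the 
offsets stand for the consumer position before and after a single `poll`.

```python
from typing import Sequence


class FetchTimeout(Exception):
    """Hypothetical stand-in for the "TimeoutException" mentioned above."""


def interpret_poll(offset_before: int, offset_after: int,
                   records: Sequence[object]) -> str:
    """Classify the outcome of one poll call (illustrative only)."""
    if records:
        # Normal case: visible messages were returned.
        return "records"
    if offset_after > offset_before:
        # The position moved but nothing is visible: the fetched range
        # held only invisible messages, so it can be skipped.
        return "skipped-invisible"
    # The position did not move: nothing was fetched before the timeout.
    raise FetchTimeout(
        f"no data fetched at offset {offset_before} before the poll timeout"
    )
```

With this sketch, an empty result with a moved position is treated as progress, 
while an empty result with an unchanged position is surfaced as a timeout for 
the caller to retry or fail on.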

> Structured streaming doesn't support kafka transaction (creating empty offset 
> with abort & markers)
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-25005
>                 URL: https://issues.apache.org/jira/browse/SPARK-25005
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.3.1
>            Reporter: Quentin Ambard
>            Assignee: Shixiong Zhu
>            Priority: Major
>             Fix For: 2.4.0
>
>
> Structured Streaming can't consume Kafka transactions.
> We could try to apply the SPARK-24720 (DStream) logic to the Structured 
> Streaming source.


