[ 
https://issues.apache.org/jira/browse/SPARK-25756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658567#comment-16658567
 ] 

Hyukjin Kwon commented on SPARK-25756:
--------------------------------------

1. Can you add a self-contained reproducer, please? For instance, one using the 
{{rate}} source or the {{socket}} source.
2. Does this only happen with pandas_udf (and not with a normal Python UDF)?
3. Do you mind showing the expected results and the current results?
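
A reproducer along the lines of point 1 might look roughly like the sketch below. It is only an illustration, not the reporter's actual code: the {{rate}} source, the derived {{bucket}} column, the 10 second watermark, the console sink, and the helper names {{count_rows}} and {{start_query}} are all assumptions made for the example. The UDF reports how many rows it received per call, which should make it possible to see whether the group size shrinks between invocations.

```python
import pandas as pd


def count_rows(pdf):
    """Grouped-map body: emit one row per call with the size of the group it was given."""
    return pd.DataFrame({
        "bucket": [int(pdf["bucket"].iloc[0])],
        "rows_seen": [len(pdf)],
    })


def start_query(spark):
    """Build and start the streaming query on an existing SparkSession (hypothetical helper)."""
    # Imported lazily so that count_rows can be checked with pandas alone.
    from pyspark.sql import functions as F
    from pyspark.sql.functions import pandas_udf, PandasUDFType

    events = (
        spark.readStream.format("rate")      # built-in test source: timestamp, value
        .option("rowsPerSecond", 5)
        .load()
        .withColumn("bucket", F.col("value") % 3)   # assumed grouping key
        .withWatermark("timestamp", "10 seconds")   # assumed watermark delay
    )

    # Wrap the plain function as a grouped map pandas UDF with an explicit output schema.
    grouped_udf = pandas_udf(
        count_rows, "bucket long, rows_seen long", PandasUDFType.GROUPED_MAP
    )

    return (
        events.groupBy("bucket")
        .apply(grouped_udf)
        .writeStream.outputMode("append")
        .format("console")
        .start()
    )
```

Calling {{start_query(spark)}} and watching the console output across several micro-batches would give the "current results" asked for in point 3. Swapping {{count_rows}} for a normal Python UDF in a comparable query would help answer point 2.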

> pyspark pandas_udf does not respect append outputMode in structured streaming
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-25756
>                 URL: https://issues.apache.org/jira/browse/SPARK-25756
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Structured Streaming
>    Affects Versions: 2.3.2
>            Reporter: Jan Bols
>            Priority: Major
>
> When using the following setup:
>  * structured streaming
>  * a watermark and groupBy followed by an apply using a pandas grouped map udf
>  * a sink using an append outputMode
> I would expect the following:
>  * udf to be called for each group --> OK
>  * when new data arrives, the udf will be called again --> OK
>  * when new data arrives for the same group, the udf will be called with the 
> complete pandas dataframe of all received data for that group (up till the 
> watermark) --> NOK: within the same group, the size of the pandas dataframe 
> can decrease between invocations
>  * the results are only written to the sink once the processing time has 
> passed the watermark --> NOK: every time the udf is called, new results are 
> being sent to the output
> It looks like pandas udf is unusable for structured streaming this way.



