[ 
https://issues.apache.org/jira/browse/SPARK-44464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated SPARK-44464:
--------------------------------
    Description: The current implementation of 
{{ApplyInPandasWithStatePythonRunner}} cannot handle output rows whose first 
column is {{null}}. It cannot distinguish a row whose first column is genuinely 
null from a row that was filled in because the number of data records is 
smaller than the number of state records. Rows of the former kind therefore 
produce incorrect results.  
(was: The current implementation of {{ApplyInPandasWithStatePythonRunner}} 
cannot deal with outputs where the first column of the row is {{{}null{}}}, as 
it cannot distinguish the case where the column is null, or the field is filled 
as the number of data records are smaller than state records.)

> Fix applyInPandasWithStatePythonRunner to output rows that have Null as first 
> column value
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-44464
>                 URL: https://issues.apache.org/jira/browse/SPARK-44464
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 3.3.3
>            Reporter: Siying Dong
>            Priority: Major
>
> The current implementation of {{ApplyInPandasWithStatePythonRunner}} cannot 
> handle output rows whose first column is {{null}}. It cannot distinguish a 
> row whose first column is genuinely null from a row that was filled in 
> because the number of data records is smaller than the number of state 
> records. Rows of the former kind therefore produce incorrect results.
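
The ambiguity described above can be illustrated with a minimal, self-contained sketch. It does not use Spark and is not the runner's actual code: the helpers pack, unpack_by_first_column, and unpack_by_count are hypothetical names, and the padding layout (all-null filler rows) is an assumption made only for illustration. It shows how treating "first column is null" as the padding marker drops a genuine row, and how carrying an explicit data-row count is one possible way to disambiguate.

```python
def pack(data_rows, state_rows, num_cols):
    """Pad the shorter side so data and state rows can travel in one batch.

    Assumption for illustration: missing data rows are filled with
    all-None tuples, and missing state rows are filled with None.
    """
    padding = (None,) * num_cols
    n = max(len(data_rows), len(state_rows))
    data = list(data_rows) + [padding] * (n - len(data_rows))
    state = list(state_rows) + [None] * (n - len(state_rows))
    return list(zip(data, state))


def unpack_by_first_column(packed):
    # Ambiguous: any row whose first column is None is assumed to be padding,
    # so a genuine row with a null first column is silently dropped.
    return [d for d, _ in packed if d[0] is not None]


def unpack_by_count(packed, num_data_rows):
    # Unambiguous: the number of real data rows is carried explicitly.
    return [d for d, _ in packed[:num_data_rows]]


# Two data rows (the first has a null first column) and three state rows.
data_rows = [(None, "a"), (1, "b")]
state_rows = ["s1", "s2", "s3"]

packed = pack(data_rows, state_rows, num_cols=2)
lossy = unpack_by_first_column(packed)
exact = unpack_by_count(packed, len(data_rows))

print(lossy)  # the genuine (None, "a") row is missing
print(exact)  # both original rows are recovered
```

Whether the eventual fix carries a count or uses some other marker is not stated in this issue; the count here is only one way to make the two cases distinguishable.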



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
