[ 
https://issues.apache.org/jira/browse/SPARK-53918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-53918.
----------------------------------
    Resolution: Won't Do

> asyncCheckpoint.enabled incompatible with transformWithStateInPandas 
> (StatefulProcessor)
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-53918
>                 URL: https://issues.apache.org/jira/browse/SPARK-53918
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Structured Streaming
>    Affects Versions: 4.0.0, 4.0.1
>         Environment: Databricks standard workspace
> Databricks runtime: 17.0 (Scala 2.13, Spark 4.0.0)
> Node type: Standard_D4ads_v5
>            Reporter: Elvinas Pilypas
>            Priority: Minor
>         Attachments: Async checkpoint test notebook.ipynb
>
>
> In a Structured Streaming job, we use transformWithStateInPandas for stateful
> processing, with
> com.databricks.sql.streaming.state.RocksDBStateStoreProvider enabled.
> Everything works as expected.
> But when we enable:
> spark.conf.set("spark.databricks.streaming.statefulOperator.asyncCheckpoint.enabled", "true")
> we get an error: org.apache.spark.SparkUnsupportedOperationException:
> Synchronous commit is not supported. Use asynchronous commit.
>  
> We reproduced this in a notebook, and also tried the same setup with
> applyInPandasWithState:
>  * transformWithStateInPandas - throws the "Synchronous commit" error
>  * applyInPandasWithState - works as expected
> We followed this guide:
> [https://learn.microsoft.com/en-us/azure/databricks/structured-streaming/async-checkpointing]
>  and also used ChatGPT for reference.
> Looking through the stack trace, it seems the issue is that
> {{TransformWithStateInPandasExec}} uses the sync {{commit()}} instead of the
> async {{commitAsync()}}.
>  
> Please see the attached notebook.
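
For readers without access to the attachment, the two settings named in the report can be sketched as below. This is only an illustration, not the contents of the attached notebook. The async checkpoint key and the provider class name are copied from the description; the `spark.sql.streaming.stateStore.providerClass` key is the standard Spark setting for choosing a state store provider and is assumed here to be how the provider was enabled. The names `REPORTED_CONF` and `apply_conf` are hypothetical helpers introduced for this sketch.

```python
from typing import Callable, Dict

# Key quoted verbatim in the report.
ASYNC_CHECKPOINT_KEY = (
    "spark.databricks.streaming.statefulOperator.asyncCheckpoint.enabled"
)
# Assumption: the provider class was enabled through the standard Spark key.
STATE_STORE_PROVIDER_KEY = "spark.sql.streaming.stateStore.providerClass"

# Hypothetical name: the combination of settings described in the report.
REPORTED_CONF: Dict[str, str] = {
    STATE_STORE_PROVIDER_KEY: (
        "com.databricks.sql.streaming.state.RocksDBStateStoreProvider"
    ),
    ASYNC_CHECKPOINT_KEY: "true",
}


def apply_conf(set_conf: Callable[[str, str], None], conf: Dict[str, str]) -> None:
    """Pass every key/value pair to a setter such as spark.conf.set."""
    for key, value in conf.items():
        set_conf(key, value)


if __name__ == "__main__":
    # Dry run without a Spark session: record the settings in a plain dict.
    recorded: Dict[str, str] = {}
    apply_conf(recorded.__setitem__, REPORTED_CONF)
    for key, value in recorded.items():
        print(f"{key}={value}")
```

On a live session the same helper would be called as apply_conf(spark.conf.set, REPORTED_CONF) before starting the query. According to the report, the error then appears with transformWithStateInPandas but not with applyInPandasWithState.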



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

