[
https://issues.apache.org/jira/browse/SPARK-53918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jungtaek Lim resolved SPARK-53918.
----------------------------------
Resolution: Won't Do
> asyncCheckpoint.enabled incompatible with transformWithStateInPandas
> (StatefulProcessor)
> ----------------------------------------------------------------------------------------
>
> Key: SPARK-53918
> URL: https://issues.apache.org/jira/browse/SPARK-53918
> Project: Spark
> Issue Type: Bug
> Components: PySpark, Structured Streaming
> Affects Versions: 4.0.0, 4.0.1
> Environment: Databricks standard workspace
> Databricks runtime: 17.0 (Scala 2.13, Spark 4.0.0)
> Node type: Standard_D4ads_v5
> Reporter: Elvinas Pilypas
> Priority: Minor
> Attachments: Async checkpoint test notebook.ipynb
>
>
> In a Structured Streaming job, we use transformWithStateInPandas for stateful
> processing, with
> com.databricks.sql.streaming.state.RocksDBStateStoreProvider enabled.
> Everything works as expected.
> However, when we enable:
> spark.conf.set("spark.databricks.streaming.statefulOperator.asyncCheckpoint.enabled",
> "true")
> we get an error: org.apache.spark.SparkUnsupportedOperationException:
> Synchronous commit is not supported. Use asynchronous commit.
>
> We reproduced this in a notebook and also tried the same thing with
> applyInPandasWithState:
> * transformWithStateInPandas - throws the synchronous commit error
> * applyInPandasWithState - works as expected
> We followed this guide:
> [https://learn.microsoft.com/en-us/azure/databricks/structured-streaming/async-checkpointing]
> and also used ChatGPT for reference.
> Looking through the stack trace, it seems the issue is that
> {{TransformWithStateInPandasExec}} uses the sync {{commit()}} instead of the
> async {{commitAsync()}}.
>
> Please see the attached notebook.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)