Elvinas Pilypas created SPARK-53918:
---------------------------------------
Summary: asyncCheckpoint.enabled incompatible with transformWithStateInPandas (StatefulProcessor)
Key: SPARK-53918
URL: https://issues.apache.org/jira/browse/SPARK-53918
Project: Spark
Issue Type: Bug
Components: PySpark, Structured Streaming
Affects Versions: 4.0.1, 4.0.0
Environment: Databricks standard workspace
Databricks runtime: 17.0 (Scala 2.13, Spark 4.0.0)
Node type: Standard_D4ads_v5
Reporter: Elvinas Pilypas
In a Structured Streaming job, we use transformWithStateInPandas for stateful
processing, with
com.databricks.sql.streaming.state.RocksDBStateStoreProvider enabled as the
state store provider. Everything works as expected.
But when we enable asynchronous checkpointing:
    spark.conf.set("spark.databricks.streaming.statefulOperator.asyncCheckpoint.enabled", "true")
the query fails with:
    org.apache.spark.SparkUnsupportedOperationException: Synchronous commit is not supported. Use asynchronous commit.
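For anyone reproducing this, the two settings involved can be collected in one place so the failing and working configurations are easy to toggle. This is a minimal sketch, assuming a Databricks notebook where `spark` is the active SparkSession; `REPRO_CONFS` and `apply_repro_confs` are hypothetical names, and the `spark.sql.streaming.stateStore.providerClass` key is the usual way to select the RocksDB provider rather than something stated in this report.

```python
# Sketch only: the session settings from this report, gathered in one place.
# Assumption: run in a Databricks notebook where `spark` is the SparkSession.
ASYNC_KEY = "spark.databricks.streaming.statefulOperator.asyncCheckpoint.enabled"

REPRO_CONFS = {
    # Assumed key for choosing the state store provider; the class name is
    # the Databricks RocksDB provider named in the report.
    "spark.sql.streaming.stateStore.providerClass":
        "com.databricks.sql.streaming.state.RocksDBStateStoreProvider",
    # The flag that triggers the failure with transformWithStateInPandas.
    ASYNC_KEY: "true",
}


def apply_repro_confs(spark, async_checkpoint: bool = True) -> dict:
    """Hypothetical helper: set the report's confs on `spark` and return them.

    async_checkpoint=False yields the configuration that works.
    """
    confs = dict(REPRO_CONFS)  # copy so the module-level dict is not mutated
    confs[ASYNC_KEY] = "true" if async_checkpoint else "false"
    for key, value in confs.items():
        spark.conf.set(key, value)
    return confs
```

Call `apply_repro_confs(spark)` before starting the transformWithStateInPandas query to reproduce the failure, or `apply_repro_confs(spark, async_checkpoint=False)` for the baseline that works; using a fresh checkpoint location for each run is probably the safest way to keep the runs comparable.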
We reproduced this in a notebook and also tried the same job with
applyInPandasWithState:
* transformWithStateInPandas - throws the "Synchronous commit is not supported" error
* applyInPandasWithState - works as expected
We followed this guide:
[https://learn.microsoft.com/en-us/azure/databricks/structured-streaming/async-checkpointing]
and also used ChatGPT as a reference.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)