Elvinas Pilypas created SPARK-53918:
---------------------------------------

             Summary: asyncCheckpoint.enabled incompatible with transformWithStateInPandas (StatefulProcessor)
                 Key: SPARK-53918
                 URL: https://issues.apache.org/jira/browse/SPARK-53918
             Project: Spark
          Issue Type: Bug
          Components: PySpark, Structured Streaming
    Affects Versions: 4.0.1, 4.0.0
         Environment: Databricks standard workspace
                      Databricks runtime: 17.0 (Scala 2.13, Spark 4.0.0)
                      Node type: Standard_D4ads_v5
            Reporter: Elvinas Pilypas


In a Structured Streaming job, we use transformWithStateInPandas for stateful 
processing, with 
com.databricks.sql.streaming.state.RocksDBStateStoreProvider enabled as the 
state store provider. In this configuration everything works as expected.

However, the job fails once we enable asynchronous checkpointing:
spark.conf.set("spark.databricks.streaming.statefulOperator.asyncCheckpoint.enabled", "true")

The error is:
org.apache.spark.SparkUnsupportedOperationException: Synchronous commit is not supported. Use asynchronous commit.
 
We reproduced this in a notebook and ran the same test with 
applyInPandasWithState for comparison:
 * transformWithStateInPandas - throws the "Synchronous commit is not 
   supported" error
 * applyInPandasWithState - works as expected
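
A minimal reproduction along these lines might look like the sketch below. It 
is not the exact job from this report: the rate source, the noop sink, the 
CountProcessor class, and the helper names repro_confs and start_repro are all 
assumptions made for illustration. Only the two configuration values and the 
use of transformWithStateInPandas come from the description above.

```python
# Hypothetical reproduction sketch; helper and class names are illustrative only.
ASYNC_KEY = "spark.databricks.streaming.statefulOperator.asyncCheckpoint.enabled"
PROVIDER_KEY = "spark.sql.streaming.stateStore.providerClass"
PROVIDER = "com.databricks.sql.streaming.state.RocksDBStateStoreProvider"


def repro_confs(async_checkpoint: bool) -> dict:
    """Return the settings from the report, with async checkpointing toggled."""
    return {PROVIDER_KEY: PROVIDER, ASYNC_KEY: str(async_checkpoint).lower()}


def start_repro(spark, checkpoint_dir: str, async_checkpoint: bool = True):
    """Start a small stateful stream (assumes Spark 4.0+ on Databricks)."""
    # Imports are deferred so the config helper works without a Spark install.
    import pandas as pd
    from pyspark.sql.streaming import StatefulProcessor
    from pyspark.sql.types import LongType, StructField, StructType

    for key, value in repro_confs(async_checkpoint).items():
        spark.conf.set(key, value)

    class CountProcessor(StatefulProcessor):
        """Keeps a running row count per grouping key in a value state."""

        def init(self, handle):
            schema = StructType([StructField("count", LongType(), True)])
            self.count = handle.getValueState("count", schema)

        def handleInputRows(self, key, rows, timerValues):
            total = self.count.get()[0] if self.count.exists() else 0
            for pdf in rows:  # rows is an iterator of pandas DataFrames
                total += len(pdf)
            self.count.update((total,))
            yield pd.DataFrame({"id": [key[0]], "count": [total]})

        def close(self):
            pass

    out_schema = StructType([
        StructField("id", LongType(), True),
        StructField("count", LongType(), True),
    ])
    # Assumed input: the built-in rate source, bucketed into ten keys.
    events = spark.readStream.format("rate").load().selectExpr("value % 10 AS id")
    return (
        events.groupBy("id")
        .transformWithStateInPandas(
            statefulProcessor=CountProcessor(),
            outputStructType=out_schema,
            outputMode="Update",
            timeMode="None",
        )
        .writeStream.format("noop")
        .outputMode("update")
        .option("checkpointLocation", checkpoint_dir)
        .start()
    )
```

Calling start_repro(spark, "/tmp/some-checkpoint", async_checkpoint=False) and 
then again with async_checkpoint=True and a fresh checkpoint directory should 
make it possible to compare the two behaviours side by side; the checkpoint 
path here is a placeholder.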

We followed this guide: 
[https://learn.microsoft.com/en-us/azure/databricks/structured-streaming/async-checkpointing]
 and also used ChatGPT for reference.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
