Anish Mahto created SPARK-57670:
-----------------------------------
Summary: Spark Pipelines table properties are not dropped on
subsequent run
Key: SPARK-57670
URL: https://issues.apache.org/jira/browse/SPARK-57670
Project: Spark
Issue Type: Bug
Components: Declarative Pipelines
Affects Versions: 4.1.2, 4.1.1, 4.1.0
Reporter: Anish Mahto
Consider a user creating a pipeline with one table that has the following
properties:
```
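from pyspark import pipelines as dp
from pyspark.sql import SparkSession

# Session used by the query function below.
spark = SparkSession.active()


@dp.materialized_view(
    table_properties={
        "myproperty": "value",
        "myotherproperty": "othervalue",
    },
)
def my_table():
    return spark.range(10)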
```
Before the next pipeline run, they might change their pipeline definition as
follows:
```
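from pyspark import pipelines as dp
from pyspark.sql import SparkSession

# Session used by the query function below.
spark = SparkSession.active()


# `myotherproperty` is no longer part of the definition.
@dp.materialized_view(
    table_properties={
        "myproperty": "value",
    },
)
def my_table():
    return spark.range(10)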
```
The expectation is that on the second pipeline run, `myotherproperty` is
dropped from the table in the configured catalog during table materialization.
In reality, however, table property evolution in pipelines is currently
additive only: we do not detect properties that were dropped between runs, and
so never produce a remove-property table change for them. Code pointer:
https://github.com/apache/spark/blob/master/sql/pipelines/src/main/scala/org/apache/spark/sql/pipelines/graph/DatasetManager.scala#L331
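
To illustrate the missing step, here is a minimal sketch in plain Python of a property diff that emits removals as well as additions. It is not the logic in `DatasetManager.scala`; the names `SetProperty`, `RemoveProperty`, and `diff_table_properties` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SetProperty:
    """Hypothetical change: set `key` to `value` on the table."""
    key: str
    value: str


@dataclass(frozen=True)
class RemoveProperty:
    """Hypothetical change: drop `key` from the table."""
    key: str


def diff_table_properties(existing, desired):
    """Return the changes that turn `existing` properties into `desired`."""
    changes = []
    # Additive part: keys that are new or whose value changed.
    for key, value in sorted(desired.items()):
        if existing.get(key) != value:
            changes.append(SetProperty(key, value))
    # Part described as missing in this report: keys dropped from the definition.
    for key in sorted(existing.keys() - desired.keys()):
        changes.append(RemoveProperty(key))
    return changes


first_run = {"myproperty": "value", "myotherproperty": "othervalue"}
second_run = {"myproperty": "value"}

print(diff_table_properties(first_run, second_run))
# prints [RemoveProperty(key='myotherproperty')]
```

In this sketch `existing` stands for the properties declared in the previous run. A fix that compared against everything stored in the catalog might also remove properties the pipeline never set, so an actual change would probably need to scope removals to pipeline-managed keys.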