Anish Mahto created SPARK-57670:
-----------------------------------

             Summary: Spark Pipelines table properties are not dropped on 
subsequent run
                 Key: SPARK-57670
                 URL: https://issues.apache.org/jira/browse/SPARK-57670
             Project: Spark
          Issue Type: Bug
          Components: Declarative Pipelines
    Affects Versions: 4.1.2, 4.1.1, 4.1.0
            Reporter: Anish Mahto


Consider a user creating a pipeline with one table that has the following 
properties:
```
from pyspark import pipelines as dp

@dp.materialized_view(
    table_properties={
        "myproperty": "value",
        "myotherproperty": "othervalue",
    },
)
def my_table():
    return spark.range(10)
```

Before the next pipeline run, the user might change the pipeline definition to 
remove `myotherproperty`:
```
from pyspark import pipelines as dp
 
@dp.materialized_view(
    table_properties={
        "myproperty":"value",
    },
)
def my_table():
    return spark.range(10)
```
 
The expected behavior is that on the second pipeline run, `myotherproperty` is 
dropped from the table in the configured catalog during table materialization.
 
In reality, however, table property evolution in pipelines is currently additive 
only: we do not detect properties that were dropped between runs, so no 
remove-property table change is produced for them. Code pointer: 
https://github.com/apache/spark/blob/master/sql/pipelines/src/main/scala/org/apache/spark/sql/pipelines/graph/DatasetManager.scala#L331
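
To illustrate the missing step, here is a minimal sketch of the diff that would 
have to be computed between the properties already on the table and the ones 
declared in the current pipeline definition. It is plain Python over 
dictionaries, not the actual `DatasetManager` logic; the function name 
`diff_table_properties` and the `protected` parameter (keys that must never be 
removed, for example properties owned by the catalog) are hypothetical and only 
used for illustration.

```python
def diff_table_properties(existing, declared, protected=frozenset()):
    """Return (to_set, to_remove) needed to make `existing` match `declared`.

    Hypothetical helper for illustration only; not part of Spark.
    """
    # Properties that are new, or whose value differs from what is on the table.
    to_set = {k: v for k, v in declared.items() if existing.get(k) != v}
    # Properties on the table that are no longer declared. Today's additive-only
    # behavior corresponds to always leaving this list empty.
    to_remove = sorted(
        k for k in existing if k not in declared and k not in protected
    )
    return to_set, to_remove


# Properties left on the table by the first run.
existing = {"myproperty": "value", "myotherproperty": "othervalue"}
# Properties declared in the updated pipeline definition.
declared = {"myproperty": "value"}

to_set, to_remove = diff_table_properties(existing, declared)
print(to_set, to_remove)  # {} ['myotherproperty']
```

Each entry in `to_remove` would then need to be turned into a remove-property 
table change, alongside the set-property changes that are already produced. 
Whether every undeclared key should be removed, or only keys that a previous 
pipeline run set, is a design question this sketch leaves open.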



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
