[
https://issues.apache.org/jira/browse/SPARK-52576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dongjoon Hyun updated SPARK-52576:
----------------------------------
Parent: SPARK-51727
Issue Type: Sub-task (was: Improvement)
> In Declarative Pipelines, drop/recreate on full refresh and MV update
> ---------------------------------------------------------------------
>
> Key: SPARK-52576
> URL: https://issues.apache.org/jira/browse/SPARK-52576
> Project: Spark
> Issue Type: Sub-task
> Components: Declarative Pipelines
> Affects Versions: 4.1.0
> Reporter: Sandy Ryza
> Assignee: Sandy Ryza
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.1.0
>
>
> Some pipeline runs result in wiping out and replacing all the data for a
> table:
> * Every run of a materialized view
> * Runs of streaming tables that have the "full refresh" flag
> In the current implementation, this "wipe out and replace" is done by the
> following (see the sketch after this list):
> * Truncating the table
> * Altering the table to drop/update/add columns that don't match the columns
> in the DataFrame for the current run
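>
> Roughly, that truncate + alter path looks like the sketch below. This is a
> minimal illustration against the DataSource V2 TableCatalog API, not the
> actual pipeline code: the helper name truncateAndAlign and the schema-diff
> logic are assumptions, and column type updates are elided.
> {code:scala}
> import org.apache.spark.sql.connector.catalog.{Identifier, TableCatalog, TableChange, TruncatableTable}
> import org.apache.spark.sql.types.StructType
>
> // Illustrative helper, not the real pipeline code.
> def truncateAndAlign(
>     catalog: TableCatalog,
>     ident: Identifier,
>     target: StructType): Unit = {
>   // Wipe the rows but keep the table object (and thus ACLs, readers) intact.
>   catalog.loadTable(ident) match {
>     case t: TruncatableTable => t.truncateTable()
>     case _ => throw new UnsupportedOperationException(s"$ident is not truncatable")
>   }
>   // Diff the existing schema against the current run's DataFrame schema.
>   // (Type updates for columns present in both are elided for brevity.)
>   val existing = catalog.loadTable(ident).schema()
>   val toDrop = existing.fields.filterNot(f => target.fieldNames.contains(f.name))
>   val toAdd = target.fields.filterNot(f => existing.fieldNames.contains(f.name))
>   val changes: Array[TableChange] =
>     toDrop.map(f => TableChange.deleteColumn(Array(f.name), false)) ++
>       toAdd.map(f => TableChange.addColumn(Array(f.name), f.dataType))
>   // This is the step that fails on catalogs that cannot drop columns.
>   catalog.alterTable(ident, changes: _*)
> }
> {code}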
> The reason we originally wanted to truncate + alter instead of drop/recreate
> is that dropping has undesirable side effects: e.g. it interrupts readers of
> the table and wipes away things like ACLs.
> However, we discovered that not all catalogs support dropping columns (e.g.
> Hive does not), and there's no way to tell whether a given catalog supports
> it. So this change switches the implementation to drop/recreate the table
> instead of truncate/alter, as sketched below.
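>
> With this change, the replacement becomes a drop followed by a create, as in
> the minimal sketch below. Again this is illustrative, not the actual pipeline
> code; fullyReplaceTable is an assumed name, and partitioning and table
> properties are elided.
> {code:scala}
> import org.apache.spark.sql.connector.catalog.{Column, Identifier, TableCatalog}
> import org.apache.spark.sql.connector.expressions.Transform
> import org.apache.spark.sql.types.StructType
>
> // Illustrative helper, not the real pipeline code.
> def fullyReplaceTable(
>     catalog: TableCatalog,
>     ident: Identifier,
>     schema: StructType): Unit = {
>   // dropTable returns false if the table does not exist, which is fine here.
>   catalog.dropTable(ident)
>   // Recreate with the columns of the current run's DataFrame. Note this also
>   // discards catalog-managed state such as ACLs, the trade-off noted above.
>   val columns = schema.fields.map(f => Column.create(f.name, f.dataType))
>   catalog.createTable(ident, columns, Array.empty[Transform],
>     new java.util.HashMap[String, String]())
> }
> {code}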