Tigran Manasyan created SPARK-54129:
---------------------------------------
Summary: Extract dropping table from catalog from DataFrameWriter
in case of overwriting table during creation
Key: SPARK-54129
URL: https://issues.apache.org/jira/browse/SPARK-54129
Project: Spark
Issue Type: Improvement
Components: Spark Core, SQL
Affects Versions: 4.0.1, 3.5.6
Reporter: Tigran Manasyan
Currently, if we call DataFrameWriter.saveAsTable(tableName) with SaveMode.Overwrite
and a V1 data source, the table is dropped from the catalog directly in
DataFrameWriter, and only then is the CreateTable command created.
If we want to change this default behavior through Spark extensions (for
instance, to postpone the table drop and shrink the window during which the old
table has already been dropped but the new one has not yet been created), this
hardcoded logic in DataFrameWriter makes that almost impossible as long as we
use the original saveAsTable method.
For instance, if we call df.write.mode("overwrite").insertInto(table_name), it
is possible to control the table removal logic by providing our own
implementation via spark.sql.sources.commitProtocolClass. With saveAsTable this
has no effect, because the table is unconditionally dropped before the
CreateTable command is even handled.
Therefore, would it be possible to extract the table-dropping logic from
DataFrameWriter into the corresponding LogicalPlan commands (as was done for
V2 data sources), in order to give Spark extensions more flexibility?
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]