Tigran Manasyan created SPARK-54129:
---------------------------------------

             Summary: Extract table dropping from the catalog out of DataFrameWriter 
when overwriting a table during creation
                 Key: SPARK-54129
                 URL: https://issues.apache.org/jira/browse/SPARK-54129
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core, SQL
    Affects Versions: 4.0.1, 3.5.6
            Reporter: Tigran Manasyan


Currently, if we call DataFrameWriter.saveAsTable(tableName) with SaveMode.Overwrite 
and a V1 data source, the table is dropped from the catalog directly in 
DataFrameWriter, and only then is the CreateTable command created. 
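
The behavior can be reproduced with a sketch like the following (assumes a running 
SparkSession `spark` and an existing V1 table; the table name "db.target" is a 
hypothetical example):

```scala
import org.apache.spark.sql.SaveMode

val df = spark.range(10).toDF("id")

// With a V1 source, DataFrameWriter itself drops "db.target" from the
// catalog before the CreateTable command is planned and executed, so
// there is a window in which neither the old nor the new table exists.
df.write
  .mode(SaveMode.Overwrite)
  .format("parquet")
  .saveAsTable("db.target")
```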

If we want to change this default behavior via Spark extensions, for instance, 
to postpone the table drop in order to shrink the window during which the old 
table is already dropped but the new one is not yet created, this hardcoded 
behavior in DataFrameWriter makes it almost impossible while still using the 
original saveAsTable method. 

For instance, if we call df.write.mode("overwrite").insertInto(table_name), it is 
possible to control the table removal logic by providing our own implementation of 
spark.sql.sources.commitProtocolClass. With saveAsTable this does not help, 
because the table is unconditionally dropped before the CreateTable command is 
handled.
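
To illustrate the asymmetry, here is a sketch of both paths (the extension class 
name "com.example.DelayedDropCommitProtocol" is hypothetical; it stands for any 
custom FileCommitProtocol implementation):

```scala
// A custom commit protocol can be plugged in via configuration:
spark.conf.set(
  "spark.sql.sources.commitProtocolClass",
  "com.example.DelayedDropCommitProtocol") // hypothetical extension class

// insertInto with overwrite goes through the commit protocol, so the
// extension can control when the old data is removed:
df.write.mode("overwrite").insertInto("db.target")

// saveAsTable, however, drops the table in DataFrameWriter before the
// CreateTable command runs, so the custom protocol never sees the old table:
df.write.mode("overwrite").saveAsTable("db.target")
```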

Therefore, would it be possible to extract the table dropping logic from 
DataFrameWriter into the corresponding LogicalPlan commands (as was done for 
V2 data sources) in order to give Spark extensions more flexibility?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
