Anil Dasari created SPARK-44224:
-----------------------------------
Summary: Table drop with Purge statement is not deleting the
_temporary folder
Key: SPARK-44224
URL: https://issues.apache.org/jira/browse/SPARK-44224
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.3.1
Reporter: Anil Dasari
When saveAsTable fails for some reason, the `_temporary` folder it created is
not deleted by a subsequent drop table with purge statement.
At a high level, our data process looks like this:
# Read data from external store using load
# Drop the table with purge just in case the table already exists
# Write the DataFrame to Hive using
dataframe.write.mode(SaveMode.Overwrite).saveAsTable("my_table")
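The three steps above can be sketched in PySpark roughly as follows. This is only an illustration of the flow described in the report, not the reporter's actual job: the function name `rebuild_table` and the parameters `source_format` and `source_path` are hypothetical, and `spark` is assumed to be an existing SparkSession with Hive support.

```python
def rebuild_table(spark, source_format, source_path, table_name):
    """Run the three steps from the report against a SparkSession.

    All names here are illustrative assumptions; only the Spark calls
    (read.format().load(), sql(), write.mode().saveAsTable()) are real APIs.
    """
    # Step 1: read data from the external store.
    df = spark.read.format(source_format).load(source_path)

    # Step 2: drop the table with PURGE in case it already exists.
    spark.sql(f"DROP TABLE IF EXISTS {table_name} PURGE")

    # Step 3: write the DataFrame as a managed table in overwrite mode.
    # A failure here is what can leave a `_temporary` folder behind.
    df.write.mode("overwrite").saveAsTable(table_name)
    return df
```

If Step 3 fails and the application is retried, the whole function runs again from Step 1, which is where the leftover folder becomes a problem.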
When Step 3 fails for some reason and the job is restarted because of YARN
maxAttempts, Step 2 does not delete the _temporary folder created by Step 3,
which causes the job to fail with the exception below:
org.apache.spark.sql.AnalysisException: Can not create the managed
table("my_table"). The associated location('<hdfs path>') already exists.
This was never the case in Spark 2 because of the
"spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation" config, which
was removed in Spark 3.
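As a possible stopgap until the purge behaviour is addressed, a job could clean up the leftover location itself before retrying the write. The sketch below is an assumption on my part, not something Spark provides: the helper `remove_stale_table_dir` is hypothetical, and it works on a local filesystem path using only the Python standard library. On HDFS the same check would need to go through the Hadoop FileSystem API instead.

```python
import shutil
from pathlib import Path


def remove_stale_table_dir(table_dir):
    """Delete table_dir if it holds nothing except a `_temporary` folder.

    Returns True if the directory was removed, False if it does not exist
    or contains anything other than `_temporary` (treated as real data).
    An empty directory is also removed.
    """
    path = Path(table_dir)
    if not path.is_dir():
        return False

    # Refuse to touch the location if it holds anything besides `_temporary`.
    if any(entry.name != "_temporary" for entry in path.iterdir()):
        return False

    shutil.rmtree(path)
    return True
```

Calling this between the drop and the write would let a retried attempt start from an empty location, while leaving any directory that contains committed files untouched.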