Anil Dasari created SPARK-44224:
-----------------------------------

             Summary: Table drop with Purge statement is not deleting the 
_temporary folder
                 Key: SPARK-44224
                 URL: https://issues.apache.org/jira/browse/SPARK-44224
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.3.1
            Reporter: Anil Dasari


When saveAsTable fails for some reason, the `_temporary` folder it created is 
not deleted by a subsequent DROP TABLE with the PURGE option. 

At a high level, our data process looks like this: 
 # Read data from an external store using load 
 # Drop the table with purge, in case the table already exists
 # Write the DataFrame to Hive using 
dataframe.write.mode("overwrite").saveAsTable("my_table")
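
The three steps above can be sketched roughly as follows. This is only an illustration in PySpark style, not the reporter's actual job: the function name `reload_table`, its parameters, and the use of `IF EXISTS` in the drop statement are assumptions made for the example.

```python
def reload_table(spark, source_format, source_path, table_name="my_table"):
    """Hypothetical sketch of the read / drop / write cycle described above.

    `spark` is expected to be a SparkSession; the other parameters are
    illustrative and do not come from the original report.
    """
    # Step 1: read data from the external store using load.
    df = spark.read.format(source_format).load(source_path)

    # Step 2: drop the table with PURGE in case it already exists.
    # IF EXISTS is an assumption so that the first run does not fail.
    spark.sql(f"DROP TABLE IF EXISTS {table_name} PURGE")

    # Step 3: write the DataFrame to Hive as a managed table.
    df.write.mode("overwrite").saveAsTable(table_name)
    return df
```

If Step 3 fails partway through, the table is never registered in the catalog, so the drop in Step 2 of the next attempt has nothing to purge.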

When Step 3 fails for some reason and the job is restarted because of YARN 
maxAttempts, Step 2 does not delete the _temporary folder created by Step 3, 
which causes the job to fail with the exception below:

org.apache.spark.sql.AnalysisException: Can not create the managed 
table("my_table"). The associated location('<hdfs path>') already exists.

This was never a problem in Spark 2 because of the 
"spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation" config, which 
was removed in Spark 3.
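
One possible workaround, shown here only as a sketch and not as a fix confirmed in this report, is to remove the leftover location before writing when the table is not registered in the catalog. The helper name `cleanup_stale_table_location` and the `table_location` parameter are hypothetical, and the code relies on PySpark's private `_jvm` and `_jsc` handles to reach the Hadoop FileSystem API, which is an assumption about the environment. The delete is recursive and destructive, so the path must be checked carefully.

```python
def cleanup_stale_table_location(spark, table_name, table_location):
    """Delete `table_location` if it exists but `table_name` is not registered.

    Hypothetical helper: returns True when a stale directory was deleted,
    False otherwise. `table_location` must be supplied by the caller.
    """
    # If the table is registered, leave its data alone.
    if spark.catalog.tableExists(table_name):
        return False

    # Assumption: private PySpark handles to the JVM and Hadoop configuration.
    path = spark._jvm.org.apache.hadoop.fs.Path(table_location)
    fs = path.getFileSystem(spark._jsc.hadoopConfiguration())

    if fs.exists(path):
        # Recursive delete; this also removes a leftover _temporary folder.
        fs.delete(path, True)
        return True
    return False
```

Such a helper would be called between the drop and the write, so that a restarted attempt starts from an empty location.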

 



