[ 
https://issues.apache.org/jira/browse/SPARK-39554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GuangkunJiang updated SPARK-39554:
----------------------------------
    External issue ID: https://issues.apache.org/jira/browse/SPARK-22642  (was: spark-22642)

> insertIntoHive ExternalTmpPath won't be clear when the app being killed
> -----------------------------------------------------------------------
>
>                 Key: SPARK-39554
>                 URL: https://issues.apache.org/jira/browse/SPARK-39554
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.1, 3.2.1
>         Environment: ubuntu16.04
> hadoop3.1.1
> hive 3.1.2
>            Reporter: GuangkunJiang
>            Priority: Critical
>         Attachments: feat_tmpfix.patch
>
>
> Some Spark SQL commands (e.g. InsertIntoHiveDirCommand and
> InsertIntoTableDirCommand) leave their .hive-staging directory behind when
> the application exits abnormally, for example when it is killed by YARN.
> The directory being written remains and is never deleted.
> The relevant code is in
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable#run:
> ```scala
>     val tmpLocation = getExternalTmpPath(sparkSession, hadoopConf, tableLocation)
>     try {
>       processInsert(sparkSession, externalCatalog, hadoopConf, tableDesc, tmpLocation, child)
>     } finally {
>       // Attempt to delete the staging directory and the inclusive files. If failed, the files are
>       // expected to be dropped at the normal termination of VM since deleteOnExit is used.
>       deleteExternalTmpPath(hadoopConf)
>     }
> ```
> From the Spark driver log, I can see that Spark only runs its shutdown hooks
> when the application is killed.
> I have two concerns:
> 1. The deleteExternalTmpPath call in the finally block has no effect when the
> process is killed.
> 2. According to the code comment, fs.deleteOnExit(dir) should clean up the
> data when the JVM exits.
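
The two points above can be illustrated with a minimal sketch. This is not the attached patch and not Spark's implementation. It uses only the JDK, with a local directory standing in for the .hive-staging path, and the names StagingCleanup, deleteRecursively and withStagingDir are hypothetical. The idea is to register the cleanup as a JVM shutdown hook in addition to the finally block, so that a kill that still lets the JVM run its shutdown hooks (as the driver log suggests) also removes the directory. Neither mechanism can run if the process receives SIGKILL.

```scala
import java.io.File

// Hypothetical helper, for illustration only.
object StagingCleanup {

  /** Delete `f` and everything below it. Calling it again on a missing path is a no-op. */
  def deleteRecursively(f: File): Unit = {
    // listFiles() returns null for non-directories and missing paths.
    Option(f.listFiles()).foreach(_.foreach(deleteRecursively))
    f.delete()
    ()
  }

  /** Run `body` against `dir` and remove `dir` on normal exit, on exception, and on JVM shutdown. */
  def withStagingDir[T](dir: File)(body: File => T): T = {
    // The hook covers the case where the JVM shuts down before the finally block runs.
    val hook = new Thread {
      override def run(): Unit = deleteRecursively(dir)
    }
    Runtime.getRuntime.addShutdownHook(hook)
    try {
      body(dir)
    } finally {
      // Normal path: clean up right away and unregister the hook.
      deleteRecursively(dir)
      try {
        Runtime.getRuntime.removeShutdownHook(hook)
      } catch {
        // Thrown if shutdown has already started; the hook will do the cleanup.
        case _: IllegalStateException => ()
      }
    }
  }
}
```

A caller would wrap the write in `StagingCleanup.withStagingDir(dir) { d => ... }`. In the real code path the deletion would go through the Hadoop FileSystem for the staging path rather than java.io.File, which is an assumption this sketch does not cover.
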
> My temporary fix looks like this:



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
