[ https://issues.apache.org/jira/browse/SPARK-39554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
GuangkunJiang updated SPARK-39554:
----------------------------------
    External issue ID: https://issues.apache.org/jira/browse/SPARK-22642  (was: spark-22642)

> insertIntoHive ExternalTmpPath won't be cleared when the app is killed
> ----------------------------------------------------------------------
>
>                 Key: SPARK-39554
>                 URL: https://issues.apache.org/jira/browse/SPARK-39554
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.1, 3.2.1
>        Environment: ubuntu 16.04
>                     hadoop 3.1.1
>                     hive 3.1.2
>            Reporter: GuangkunJiang
>           Priority: Critical
>        Attachments: feat_tmpfix.patch
>
>
> When some types of Spark SQL commands (e.g. InsertIntoHiveDirCommand and
> InsertIntoTableDirCommand) exit abnormally, for example when killed by
> YARN, the .hive-staging directory being written is left behind and never
> deleted.
> The relevant code is in
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable#run:
> ```scala
> val tmpLocation = getExternalTmpPath(sparkSession, hadoopConf, tableLocation)
> try {
>   processInsert(sparkSession, externalCatalog, hadoopConf, tableDesc,
>     tmpLocation, child)
> } finally {
>   // Attempt to delete the staging directory and the inclusive files. If failed, the files are
>   // expected to be dropped at the normal termination of VM since deleteOnExit is used.
>   deleteExternalTmpPath(hadoopConf)
> }
> ```
> From the Spark driver log, Spark only runs its shutdown hooks when the
> application is killed.
> Two problems follow:
> 1. The deleteExternalTmpPath call in the finally block has no effect when
>    the process is killed.
> 2. fs.deleteOnExit(dir): according to its comment, the data is only
>    cleaned up when the JVM is destroyed normally, so it does not help
>    here either.
> The tmp fix is like this: (see the attached feat_tmpfix.patch)

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
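The attached feat_tmpfix.patch is not reproduced in this message. For illustration only, a minimal JVM-level sketch of the shutdown-hook approach the report points toward: unlike `deleteOnExit`, which relies on normal VM termination, a registered shutdown hook also fires on SIGTERM (e.g. a YARN kill), though nothing can run on SIGKILL. The class name `StagingDirCleanup` is hypothetical and uses only the JDK, not Spark's or Hadoop's actual APIs:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class StagingDirCleanup {

    // Recursively delete a directory tree, children before parents.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder())
                .forEach(p -> p.toFile().delete());
        }
    }

    // Register a JVM shutdown hook that removes the staging directory.
    // Shutdown hooks run on normal exit and on SIGTERM, so a YARN kill
    // still triggers the cleanup; only SIGKILL bypasses it entirely.
    static void cleanupOnShutdown(Path stagingDir) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                deleteRecursively(stagingDir);
            } catch (IOException e) {
                // Best effort: the directory may already be gone.
            }
        }));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical staging directory standing in for .hive-staging.
        Path staging = Files.createTempDirectory("hive-staging-demo");
        Files.createFile(staging.resolve("part-00000"));
        cleanupOnShutdown(staging);
        System.out.println("registered cleanup for " + staging);
    }
}
```

In Spark itself the equivalent would hook into the existing shutdown-hook machinery rather than `Runtime` directly; this sketch only shows why a hook survives a kill where the `finally` block and `deleteOnExit` do not.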