Hiroyuki Nagaya created TEZ-4604:
------------------------------------
Summary: Hive compaction in Tez does not delete files under
staging directory
Key: TEZ-4604
URL: https://issues.apache.org/jira/browse/TEZ-4604
Project: Apache Tez
Issue Type: Improvement
Reporter: Hiroyuki Nagaya
I am using Hadoop, Hive and Tez together.
When I run a major compaction with Hive using Tez, the files under the
staging directory are not deleted.
With MapReduce, the files are deleted from the staging directory and history
files are created in the history directory.
Versions:
Hadoop 3.3.6
Hive 4.0.1
Tez 0.10.4
1. When using MapReduce
The following files are deleted:
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1705466455536_3620
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1705466455536_3620/job.jar
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1705466455536_3620/job.split
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1705466455536_3620/job.splitmetainfo
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1705466455536_3620/job.xml
History data is created in the following directory:
/tmp/hadoop-yarn/staging/history/done
2. When using Tez
The following files are not deleted:
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/.tez
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/job.jar
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/job.split
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/job.splitmetainfo
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002/job.xml
No history data is created.
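To make the difference between the two listings explicit, here is a small
Python sketch. It is only an illustration: the helper names entries_by_job
and only_in_second are hypothetical and are not part of Hadoop, Hive or Tez,
and the paths are simply the ones listed above. It groups the entries by job
directory and reports which entry names appear only in the Tez case.

```python
from pathlib import PurePosixPath

# Staging root taken from the paths in this report.
STAGING_ROOT = "/tmp/hadoop-yarn/staging/hadoop/.staging"

# Listings copied from the two cases above.
MAPREDUCE_LISTING = [
    STAGING_ROOT + "/job_1705466455536_3620",
    STAGING_ROOT + "/job_1705466455536_3620/job.jar",
    STAGING_ROOT + "/job_1705466455536_3620/job.split",
    STAGING_ROOT + "/job_1705466455536_3620/job.splitmetainfo",
    STAGING_ROOT + "/job_1705466455536_3620/job.xml",
]
TEZ_LISTING = [
    STAGING_ROOT + "/job_1740026697751_0002",
    STAGING_ROOT + "/job_1740026697751_0002/.tez",
    STAGING_ROOT + "/job_1740026697751_0002/job.jar",
    STAGING_ROOT + "/job_1740026697751_0002/job.split",
    STAGING_ROOT + "/job_1740026697751_0002/job.splitmetainfo",
    STAGING_ROOT + "/job_1740026697751_0002/job.xml",
]


def entries_by_job(paths, staging_root=STAGING_ROOT):
    """Group staging entries by job directory.

    Returns a dict mapping each job directory name to the set of entry
    names found directly under it. (Hypothetical helper for illustration.)
    """
    root = PurePosixPath(staging_root)
    jobs = {}
    for raw in paths:
        parts = PurePosixPath(raw).relative_to(root).parts
        entries = jobs.setdefault(parts[0], set())
        if len(parts) > 1:
            entries.add(parts[1])
    return jobs


def only_in_second(first, second):
    """Return the sorted entry names present in `second` but not in `first`."""
    names_first = set().union(*entries_by_job(first).values())
    names_second = set().union(*entries_by_job(second).values())
    return sorted(names_second - names_first)


if __name__ == "__main__":
    print(sorted(entries_by_job(TEZ_LISTING)))
    print(only_in_second(MAPREDUCE_LISTING, TEZ_LISTING))
```

With the listings above, the only entry name that appears in the Tez case but
not in the MapReduce case is .tez; the other four file names are the same in
both cases.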
Is it a bug that the following directory is not deleted, or is it a Tez
configuration problem?
/tmp/hadoop-yarn/staging/hadoop/.staging/job_1740026697751_0002
I would like it to be deleted, because the process completed successfully
and the directory is about 80 MB in size.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)