[jira] [Work logged] (HIVE-26904) QueryCompactor failed in commitCompaction if the tmp table dir is already removed

ASF GitHub Bot (Jira) Tue, 17 Jan 2023 19:11:13 -0800


     [ 
https://issues.apache.org/jira/browse/HIVE-26904?focusedWorklogId=839811&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-839811
 ]


ASF GitHub Bot logged work on HIVE-26904:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Jan/23 03:10
            Start Date: 18/Jan/23 03:10
    Worklog Time Spent: 10m 
      Work Description: stiga-huang commented on code in PR #3910:
URL: https://github.com/apache/hive/pull/3910#discussion_r1073054442


##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/QueryCompactor.java:
##########
@@ -245,16 +243,32 @@ private static void disableLlapCaching(HiveConf conf) {
      * @throws IOException the directory cannot be deleted
      * @throws HiveException the table is not found
      */
-    static void cleanupEmptyDir(HiveConf conf, String tmpTableName) throws IOException, HiveException {
+    static void cleanupEmptyTableDir(HiveConf conf, String tmpTableName)
+        throws IOException, HiveException {
       org.apache.hadoop.hive.ql.metadata.Table tmpTable = Hive.get().getTable(tmpTableName);
       if (tmpTable != null) {
-        Path path = new Path(tmpTable.getSd().getLocation());
-        FileSystem fs = path.getFileSystem(conf);
+        cleanupEmptyDir(conf, new Path(tmpTable.getSd().getLocation()));
+      }
+    }
+
+    /**
+     * Remove the directory if it's empty.
+     * @param conf the Hive configuration
+     * @param path path of the directory
+     * @throws IOException if any IO error occurs
+     */
+    static void cleanupEmptyDir(HiveConf conf, Path path) throws IOException {
+      FileSystem fs = path.getFileSystem(conf);
+      try {
         if (!fs.listFiles(path, false).hasNext()) {
           fs.delete(path, true);
         }
+      } catch (FileNotFoundException e) {
+        // Ignore the case when the dir was already removed
+        LOG.warn("Ignored exception during cleanup {}", path, e);

Review Comment:
   FWIW, the following log shows the stacktrace of where the 
`FileNotFoundException` is thrown:
   ```
   2023-01-02T02:12:55,849 ERROR [impala-ec2-centos79-m6i-4xlarge-ondemand-1428.vpc.cloudera.com-48_executor] compactor.Worker: Caught exception while trying to compact id:15,dbname:partial_catalog_info_test,tableName:insert_only_partitioned,partName:part=1,state:^@,type:MINOR,enqueueTime:0,start:0,properties:null,runAs:jenkins,tooManyAborts:false,hasOldAbort:false,highestWriteId:3,errorMessage:null,workerId: null,initiatorId: null,retryRetention0. Marking failed to avoid repeated failures
   java.io.FileNotFoundException: File hdfs://localhost:20500/tmp/hive/jenkins/092b533a-81c8-4b95-88e4-9472cf6f365d/_tmp_space.db/62ec04fb-e2d2-4a99-a454-ae709a3cccfe does not exist.
           at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1275) ~[hadoop-hdfs-client-3.1.1.7.2.15.4-6.jar:?]
           at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1249) ~[hadoop-hdfs-client-3.1.1.7.2.15.4-6.jar:?]
           at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1194) ~[hadoop-hdfs-client-3.1.1.7.2.15.4-6.jar:?]
           at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1190) ~[hadoop-hdfs-client-3.1.1.7.2.15.4-6.jar:?]
           at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-3.1.1.7.2.15.4-6.jar:?]
           at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1208) ~[hadoop-hdfs-client-3.1.1.7.2.15.4-6.jar:?]
           at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2144) ~[hadoop-common-3.1.1.7.2.15.4-6.jar:?]
           at org.apache.hadoop.fs.FileSystem$5.<init>(FileSystem.java:2302) ~[hadoop-common-3.1.1.7.2.15.4-6.jar:?]
           at org.apache.hadoop.fs.FileSystem.listFiles(FileSystem.java:2299) ~[hadoop-common-3.1.1.7.2.15.4-6.jar:?]
           at org.apache.hadoop.hive.ql.txn.compactor.QueryCompactor$Util.cleanupEmptyDir(QueryCompactor.java:261) ~[hive-exec-3.1.3000.2022.0.13.0-60.jar:3.1.3000.2022.0.13.0-60]
           at org.apache.hadoop.hive.ql.txn.compactor.MmMinorQueryCompactor.commitCompaction(MmMinorQueryCompactor.java:72) ~[hive-exec-3.1.3000.2022.0.13.0-60.jar:3.1.3000.2022.0.13.0-60]
           at org.apache.hadoop.hive.ql.txn.compactor.QueryCompactor.runCompactionQueries(QueryCompactor.java:146) ~[hive-exec-3.1.3000.2022.0.13.0-60.jar:3.1.3000.2022.0.13.0-60]
           at org.apache.hadoop.hive.ql.txn.compactor.MmMinorQueryCompactor.runCompaction(MmMinorQueryCompactor.java:63) ~[hive-exec-3.1.3000.2022.0.13.0-60.jar:3.1.3000.2022.0.13.0-60]
           at org.apache.hadoop.hive.ql.txn.compactor.Worker.findNextCompactionAndExecute(Worker.java:435) ~[hive-exec-3.1.3000.2022.0.13.0-60.jar:3.1.3000.2022.0.13.0-60]
           at org.apache.hadoop.hive.ql.txn.compactor.Worker.lambda$run$0(Worker.java:115) ~[hive-exec-3.1.3000.2022.0.13.0-60.jar:3.1.3000.2022.0.13.0-60]
           at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_261]
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_261]
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_261]
           at java.lang.Thread.run(Thread.java:748) [?:1.8.0_261]
   ```
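
The trace above shows the exception coming from the directory listing itself, before any delete is attempted. The following is a minimal, self-contained sketch of the same "remove if empty, tolerate already removed" pattern. It is an analogue only: it uses `java.nio.file` instead of the Hadoop `FileSystem` API so that it runs without a cluster, and the class name `EmptyDirCleanup` and the boolean return value are illustrative assumptions, not part of the PR. The emptiness check here counts any directory entry, which may not match Hadoop's `listFiles` exactly.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical helper, not Hive code: mirrors the shape of the patched
// cleanupEmptyDir using the JDK's local file API.
final class EmptyDirCleanup {
  private EmptyDirCleanup() {}

  /**
   * Deletes the directory if it exists and has no entries.
   *
   * @param dir directory to clean up
   * @return true if this call deleted the directory, false if it was
   *         non-empty or already gone
   * @throws IOException on any IO error other than a missing directory
   */
  static boolean cleanupEmptyDir(Path dir) throws IOException {
    try {
      // Listing a missing directory throws, just like the listing
      // call in the stack trace; that is why the try block must
      // cover the listing and not only the delete.
      try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
        if (entries.iterator().hasNext()) {
          return false; // not empty, leave it alone
        }
      }
      Files.delete(dir);
      return true;
    } catch (NoSuchFileException e) {
      // The directory was removed by someone else: nothing left to do.
      return false;
    }
  }
}
```

Calling `EmptyDirCleanup.cleanupEmptyDir(path)` twice on the same empty directory deletes it on the first call and returns `false` on the second instead of throwing, which is the behavior the patch aims for in `commitCompaction`.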





Issue Time Tracking
-------------------

    Worklog Id:     (was: 839811)
    Time Spent: 50m  (was: 40m)

> QueryCompactor failed in commitCompaction if the tmp table dir is already 
> removed 
> ----------------------------------------------------------------------------------
>
>                 Key: HIVE-26904
>                 URL: https://issues.apache.org/jira/browse/HIVE-26904
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> commitCompaction() of query-based compactions just removes the dirs of tmp 
> tables. It should not fail the compaction if the dirs are already removed.
> We've seen such a failure in Impala's test (IMPALA-11756):
> {noformat}
> 2023-01-02T02:09:26,306  INFO [HiveServer2-Background-Pool: Thread-695] 
> ql.Driver: Executing 
> command(queryId=jenkins_20230102020926_69112755-b783-4214-89e5-1c7111dfe15f): 
> alter table partial_catalog_info_test.insert_only_partitioned partition 
> (part=1) compact 'minor' and wait
> 2023-01-02T02:09:26,306  INFO [HiveServer2-Background-Pool: Thread-695] 
> ql.Driver: Starting task [Stage-0:DDL] in serial mode
> 2023-01-02T02:09:26,317  INFO [HiveServer2-Background-Pool: Thread-695] 
> exec.Task: Compaction enqueued with id 15
> ...
> 2023-01-02T02:12:55,849 ERROR 
> [impala-ec2-centos79-m6i-4xlarge-ondemand-1428.vpc.cloudera.com-48_executor] 
> compactor.Worker: Caught exception while trying to compact 
> id:15,dbname:partial_catalog_info_test,tableName:insert_only_partitioned,partName:part=1,state:^@,type:MINOR,enqueueTime:0,start:0,properties:null,runAs:jenkins,tooManyAborts:false,hasOldAbort:false,highestWriteId:3,errorMessage:null,workerId:
>  null,initiatorId: null,retryRetention0. Marking failed to avoid repeated 
> failures
> java.io.FileNotFoundException: File 
> hdfs://localhost:20500/tmp/hive/jenkins/092b533a-81c8-4b95-88e4-9472cf6f365d/_tmp_space.db/62ec04fb-e2d2-4a99-a454-ae709a3cccfe
>  does not exist.
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1275)
>  ~[hadoop-hdfs-client-3.1.1.7.2.15.4-6.jar:?]
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1249)
>  ~[hadoop-hdfs-client-3.1.1.7.2.15.4-6.jar:?]
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1194)
>  ~[hadoop-hdfs-client-3.1.1.7.2.15.4-6.jar:?]
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1190)
>  ~[hadoop-hdfs-client-3.1.1.7.2.15.4-6.jar:?]
>         at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>  ~[hadoop-common-3.1.1.7.2.15.4-6.jar:?]
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1208)
>  ~[hadoop-hdfs-client-3.1.1.7.2.15.4-6.jar:?]
>         at 
> org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2144) 
> ~[hadoop-common-3.1.1.7.2.15.4-6.jar:?]
>         at org.apache.hadoop.fs.FileSystem$5.<init>(FileSystem.java:2302) 
> ~[hadoop-common-3.1.1.7.2.15.4-6.jar:?]
>         at org.apache.hadoop.fs.FileSystem.listFiles(FileSystem.java:2299) 
> ~[hadoop-common-3.1.1.7.2.15.4-6.jar:?]
>         at 
> org.apache.hadoop.hive.ql.txn.compactor.QueryCompactor$Util.cleanupEmptyDir(QueryCompactor.java:261)
>  ~[hive-exec-3.1.3000.2022.0.13.0-60.jar:3.1.3000.2022.0.13.0-60]
>         at 
> org.apache.hadoop.hive.ql.txn.compactor.MmMinorQueryCompactor.commitCompaction(MmMinorQueryCompactor.java:72)
>  ~[hive-exec-3.1.3000.2022.0.13.0-60.jar:3.1.3000.2022.0.13.0-60]
>         at 
> org.apache.hadoop.hive.ql.txn.compactor.QueryCompactor.runCompactionQueries(QueryCompactor.java:146)
>  ~[hive-exec-3.1.3000.2022.0.13.0-60.jar:3.1.3000.2022.0.13.0-60]
>         at 
> org.apache.hadoop.hive.ql.txn.compactor.MmMinorQueryCompactor.runCompaction(MmMinorQueryCompactor.java:63)
>  ~[hive-exec-3.1.3000.2022.0.13.0-60.jar:3.1.3000.2022.0.13.0-60]
>         at 
> org.apache.hadoop.hive.ql.txn.compactor.Worker.findNextCompactionAndExecute(Worker.java:435)
>  ~[hive-exec-3.1.3000.2022.0.13.0-60.jar:3.1.3000.2022.0.13.0-60]
>         at 
> org.apache.hadoop.hive.ql.txn.compactor.Worker.lambda$run$0(Worker.java:115) 
> ~[hive-exec-3.1.3000.2022.0.13.0-60.jar:3.1.3000.2022.0.13.0-60]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[?:1.8.0_261]
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  ~[?:1.8.0_261]
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  ~[?:1.8.0_261]
>         at java.lang.Thread.run(Thread.java:748) [?:1.8.0_261]
> 2023-01-02T02:12:55,858  INFO 
> [impala-ec2-centos79-m6i-4xlarge-ondemand-1428.vpc.cloudera.com-48_executor] 
> compactor.Worker: Deleting result directories created by the compactor:
> 2023-01-02T02:12:55,858  INFO 
> [impala-ec2-centos79-m6i-4xlarge-ondemand-1428.vpc.cloudera.com-48_executor] 
> compactor.Worker: 
> hdfs://localhost:20500/test-warehouse/managed/partial_catalog_info_test.db/insert_only_partitioned/part=1/delta_0000001_0000003_v0001827
> 2023-01-02T02:12:55,859  INFO 
> [impala-ec2-centos79-m6i-4xlarge-ondemand-1428.vpc.cloudera.com-48_executor] 
> compactor.Worker: 
> hdfs://localhost:20500/test-warehouse/managed/partial_catalog_info_test.db/insert_only_partitioned/part=1/delete_delta_0000001_0000003_v0001827
> 2023-01-02T02:12:55,859  INFO 
> [impala-ec2-centos79-m6i-4xlarge-ondemand-1428.vpc.cloudera.com-48_executor] 
> compactor.CompactionHeartbeatService: Stopping heartbeat task for TXN 1827
> 2023-01-02T02:12:55,859  INFO 
> [impala-ec2-centos79-m6i-4xlarge-ondemand-1428.vpc.cloudera.com-48_executor] 
> compactor.CompactionHeartbeatService$CompactionHeartbeater: Shutting down 
> compaction txn heartbeater instance.
> 2023-01-02T02:12:55,859  INFO 
> [impala-ec2-centos79-m6i-4xlarge-ondemand-1428.vpc.cloudera.com-48_executor] 
> compactor.CompactionHeartbeatService$CompactionHeartbeater: Compaction txn 
> heartbeater instance is successfully stopped.
> 2023-01-02T02:12:55,872  INFO 
> [impala-ec2-centos79-m6i-4xlarge-ondemand-1428.vpc.cloudera.com-48] 
> compactor.Worker: Worker thread finished one loop.
> 2023-01-02T02:12:55,872  INFO 
> [impala-ec2-centos79-m6i-4xlarge-ondemand-1428.vpc.cloudera.com-48_executor] 
> compactor.Worker: Processing compaction request null
> 2023-01-02T02:12:56,400  INFO [HiveServer2-Background-Pool: Thread-695] 
> exec.Task: Compaction with id 15 finished with status: failed  {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
