ASF GitHub Bot logged work on HIVE-23791:

                Author: ASF GitHub Bot
            Created on: 01/Jul/20 16:36
            Start Date: 01/Jul/20 16:36
    Worklog Time Spent: 10m 
      Work Description: kgyrtkirk commented on a change in pull request #1196:
URL: https://github.com/apache/hive/pull/1196#discussion_r448482057

File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
@@ -2614,28 +2633,25 @@ public static Path getVersionFilePath(Path deltaOrBase) 
           + " from " + jc.get(ValidTxnWriteIdList.VALID_TABLES_WRITEIDS_KEY));
       return null;
-    Directory acidInfo = AcidUtils.getAcidState(fs, dir, jc, idList, null, 
+    if (fs == null) {
+      fs = dir.getFileSystem(jc);
+    }
+    // Collect the all of the files/dirs
+    Map<Path, HdfsDirSnapshot> hdfsDirSnapshots = AcidUtils.getHdfsDirSnapshots(fs, dir);

Review comment:
       This might be out of scope for this change, but this *static* method in 
`AcidUtils` tries to do all the work upfront,
   which might lead to:
   * doing work that is not even needed
   * not scanning some location, in which case the map just returns null, so 
the omission might go unnoticed
   I think it would be better if this method returned something (it could 
still be a map) that fills in data from HDFS when it is not cached already.
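
   A minimal sketch of that idea, not taken from the pull request: a 
map-like view that loads a directory's snapshot on first access and caches 
the result, including the "nothing there" case, so an unscanned location is 
never confused with an empty one. `LazyDirSnapshots`, its `loader` parameter, 
and the plain `String` paths are hypothetical stand-ins for illustration; a 
real version would presumably work with `Path` and `HdfsDirSnapshot`.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical lazy view over per-directory snapshots; not part of Hive. */
class LazyDirSnapshots<T> {
  // Caches both found snapshots and confirmed-absent ones (Optional.empty()).
  private final Map<String, Optional<T>> cache = new ConcurrentHashMap<>();
  // Assumed to perform the actual (expensive) file system listing.
  private final Function<String, Optional<T>> loader;

  LazyDirSnapshots(Function<String, Optional<T>> loader) {
    this.loader = loader;
  }

  /** Returns the snapshot for dir, invoking the loader only on first access. */
  Optional<T> get(String dir) {
    return cache.computeIfAbsent(dir, loader);
  }

  /** Number of directories that have been looked up so far. */
  int cachedCount() {
    return cache.size();
  }

  public static void main(String[] args) {
    LazyDirSnapshots<String> snapshots = new LazyDirSnapshots<>(dir -> {
      System.out.println("listing " + dir); // stands in for an HDFS call
      if (dir.startsWith("/warehouse/")) {
        return Optional.of("snapshot of " + dir);
      }
      return Optional.<String>empty();
    });
    snapshots.get("/warehouse/t1/delta_1_1"); // triggers a listing
    snapshots.get("/warehouse/t1/delta_1_1"); // served from the cache
    System.out.println("cached entries: " + snapshots.cachedCount());
  }
}
```

   With this shape, only the directories a caller actually asks about are 
listed, and a lookup never silently returns null for a location that was 
simply not scanned.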

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

Issue Time Tracking

    Worklog Id:     (was: 453516)
    Time Spent: 0.5h  (was: 20m)

> Optimize ACID stats generation
> ------------------------------
>                 Key: HIVE-23791
>                 URL: https://issues.apache.org/jira/browse/HIVE-23791
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics, Transactions
>            Reporter: Peter Vary
>            Assignee: Peter Vary
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
> Currently basic stats generation uses file listing for getting statistics, 
> and also uses a file listing for getting the acid state. We should optimize 
> this.

This message was sent by Atlassian Jira
