[ 
https://issues.apache.org/jira/browse/HIVE-23791?focusedWorklogId=453516&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453516
 ]

ASF GitHub Bot logged work on HIVE-23791:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 01/Jul/20 16:36
            Start Date: 01/Jul/20 16:36
    Worklog Time Spent: 10m 
      Work Description: kgyrtkirk commented on a change in pull request #1196:
URL: https://github.com/apache/hive/pull/1196#discussion_r448482057



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##########
@@ -2614,28 +2633,25 @@ public static Path getVersionFilePath(Path deltaOrBase) {
           + " from " + jc.get(ValidTxnWriteIdList.VALID_TABLES_WRITEIDS_KEY));
       return null;
     }
-    Directory acidInfo = AcidUtils.getAcidState(fs, dir, jc, idList, null, false);
+    if (fs == null) {
+      fs = dir.getFileSystem(jc);
+    }
+    // Collect the all of the files/dirs
+    Map<Path, HdfsDirSnapshot> hdfsDirSnapshots = AcidUtils.getHdfsDirSnapshots(fs, dir);

Review comment:
       This might be out of scope for this change, but this *static* method in `AcidUtils` tries to do all the work upfront,
   which might lead to:
   * work being done that is not even needed
   * some location not being scanned, in which case the map just returns null, so the problem might not be noticeable
   
   I think it would be better if this method returned something (it could still be a map) that fills in data from HDFS if it is not cached already...
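
   A minimal sketch of the kind of lazily filled structure suggested above. `LazyDirSnapshots` and its `loader` function are hypothetical names used only for illustration; they are not part of Hive or `AcidUtils`. The loader stands in for whatever would list a directory on HDFS, and the sketch is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical lazily filled cache (not Hive code).
 * A lookup for a key that has not been scanned yet triggers the loader
 * instead of silently returning null.
 */
final class LazyDirSnapshots<K, V> {
  private final Map<K, V> cache = new HashMap<>();
  private final Function<K, V> loader; // assumed to perform the actual listing
  private int loads = 0;

  LazyDirSnapshots(Function<K, V> loader) {
    this.loader = loader;
  }

  /** Returns the cached value, loading it on first access. */
  V get(K dir) {
    V snapshot = cache.get(dir);
    if (snapshot == null) {
      snapshot = loader.apply(dir);
      if (snapshot == null) {
        // Fail loudly so a missing scan cannot go unnoticed.
        throw new IllegalStateException("No snapshot could be loaded for " + dir);
      }
      cache.put(dir, snapshot);
      loads++;
    }
    return snapshot;
  }

  /** Number of times the loader produced a new cache entry. */
  int loadCount() {
    return loads;
  }
}
```

   With this shape no listing happens until a caller asks for a directory, repeated lookups reuse the cached entry, and a location that was never scanned is loaded on demand rather than showing up as a null.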




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 453516)
    Time Spent: 0.5h  (was: 20m)

> Optimize ACID stats generation
> ------------------------------
>
>                 Key: HIVE-23791
>                 URL: https://issues.apache.org/jira/browse/HIVE-23791
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics, Transactions
>            Reporter: Peter Vary
>            Assignee: Peter Vary
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently basic stats generation uses file listing for getting statistics, 
> and also uses a file listing for getting the acid state. We should optimize 
> this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
