[ https://issues.apache.org/jira/browse/HIVE-24928?focusedWorklogId=581849&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581849 ]

ASF GitHub Bot logged work on HIVE-24928:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 13/Apr/21 15:26
            Start Date: 13/Apr/21 15:26
    Worklog Time Spent: 10m 
      Work Description: lcspinter commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r612551277



##########
File path: 
iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##########
@@ -153,6 +156,37 @@ public DecomposedPredicate decomposePredicate(JobConf jobConf, Deserializer dese
     return predicate;
   }
 
+  @Override
+  public boolean canProvideBasicStatistics() {
+    return true;
+  }
+
+  @Override
+  public Map<String, String> getBasicStatistics(TableDesc tableDesc) {
+    Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+    Map<String, String> stats = new HashMap<>();
+    if (table.currentSnapshot() != null) {
+      Map<String, String> summary = table.currentSnapshot().summary();
+      if (summary != null) {
+        if (summary.containsKey(SnapshotSummary.TOTAL_DATA_FILES_PROP)) {
+          stats.put(StatsSetupConst.NUM_FILES, summary.get(SnapshotSummary.TOTAL_DATA_FILES_PROP));
+        }
+        if (summary.containsKey(SnapshotSummary.TOTAL_RECORDS_PROP)) {
+          stats.put(StatsSetupConst.ROW_COUNT, summary.get(SnapshotSummary.TOTAL_RECORDS_PROP));
+        }
+        // TODO: add TOTAL_SIZE when iceberg 0.12 is released
+        if (summary.containsKey("total-files-size")) {
+          stats.put(StatsSetupConst.TOTAL_SIZE, summary.get("total-files-size"));
+        }
+      }
+    } else {
+      stats.put(StatsSetupConst.NUM_FILES, "0");

Review comment:
   In the case of an empty table, the current snapshot is null. I thought
   setting all the basic stats to 0 was the right approach, since there is
   no data.
   When the snapshot summary is not available, I return an empty statistics
   map.
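A minimal sketch of the behavior described in the review comment, written in plain Java so it runs without Iceberg or Hive on the classpath. The class `BasicStatsSketch` and its `basicStats` helper are hypothetical; the string keys are assumed to mirror Hive's `StatsSetupConst` values and the Iceberg snapshot summary keys used in the diff, and reading "all the basic stats" as file count, row count and total size is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the code from pull request #2111.
class BasicStatsSketch {
  // Assumed to mirror Hive's StatsSetupConst.NUM_FILES, ROW_COUNT and TOTAL_SIZE.
  static final String NUM_FILES = "numFiles";
  static final String ROW_COUNT = "numRows";
  static final String TOTAL_SIZE = "totalSize";

  // Assumed to mirror the Iceberg snapshot summary keys used in the diff.
  static final String TOTAL_DATA_FILES = "total-data-files";
  static final String TOTAL_RECORDS = "total-records";
  static final String TOTAL_FILES_SIZE = "total-files-size";

  /**
   * @param hasSnapshot false models an empty table, whose current snapshot is null
   * @param summary     the current snapshot's summary, or null when it is not available
   */
  static Map<String, String> basicStats(boolean hasSnapshot, Map<String, String> summary) {
    Map<String, String> stats = new HashMap<>();
    if (!hasSnapshot) {
      // Empty table: there is no data, so every basic stat is zero.
      stats.put(NUM_FILES, "0");
      stats.put(ROW_COUNT, "0");
      stats.put(TOTAL_SIZE, "0");
      return stats;
    }
    if (summary == null) {
      // Summary unavailable: return an empty map instead of guessing.
      return stats;
    }
    // Copy only the keys the summary actually carries.
    copyIfPresent(summary, TOTAL_DATA_FILES, stats, NUM_FILES);
    copyIfPresent(summary, TOTAL_RECORDS, stats, ROW_COUNT);
    copyIfPresent(summary, TOTAL_FILES_SIZE, stats, TOTAL_SIZE);
    return stats;
  }

  private static void copyIfPresent(Map<String, String> from, String fromKey,
                                    Map<String, String> to, String toKey) {
    String value = from.get(fromKey);
    if (value != null) {
      to.put(toKey, value);
    }
  }
}
```

For example, `basicStats(false, null)` yields "0" for all three keys, while `basicStats(true, null)` yields an empty map. Returning an empty map rather than zeros when the summary is missing keeps "unknown" distinct from "known to be empty".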




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 581849)
    Time Spent: 6h  (was: 5h 50m)

> In case of non-native tables use basic statistics from HiveStorageHandler
> -------------------------------------------------------------------------
>
>                 Key: HIVE-24928
>                 URL: https://issues.apache.org/jira/browse/HIVE-24928
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 4.0.0
>            Reporter: László Pintér
>            Assignee: László Pintér
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 6h
>  Remaining Estimate: 0h
>
> When running `ANALYZE TABLE ... COMPUTE STATISTICS` or
> `ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS`, all basic statistics are
> collected by the BasicStatsTask class, which estimates them by scanning the
> table directory.
> For non-native tables (Iceberg, HBase), the table directory may also contain
> metadata files, which BasicStatsTask would count when calculating basic
> stats.
> Instead of relying on this logic, the HiveStorageHandler implementation
> should provide the basic statistics.
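The proposed change can be pictured as a dispatch in the stats-collection path: ask the storage handler first, and fall back to the directory scan only when no handler can supply the numbers. This is a hypothetical, simplified sketch; `StatsProvider`, `BasicStatsDispatch` and `DirectoryScanner` are illustrative names, and unlike the real `getBasicStatistics(TableDesc)` in the diff above, the provider method here takes no arguments.

```java
import java.util.Map;

// Hypothetical, simplified stand-in for the relevant part of HiveStorageHandler.
interface StatsProvider {
  boolean canProvideBasicStatistics();

  Map<String, String> getBasicStatistics();
}

class BasicStatsDispatch {
  // Hypothetical stand-in for the directory scan, which counts every file it finds,
  // including any metadata files stored under the table location.
  interface DirectoryScanner {
    Map<String, String> scan();
  }

  static Map<String, String> collect(StatsProvider handler, DirectoryScanner scanner) {
    // Prefer the handler's statistics when it explicitly opts in.
    if (handler != null && handler.canProvideBasicStatistics()) {
      return handler.getBasicStatistics();
    }
    // Native tables (no handler) and handlers that opt out keep the directory scan.
    return scanner.scan();
  }
}
```

The opt-in flag lets each handler decide whether its metadata is authoritative, so existing non-native handlers that do not override it keep today's behavior.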



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
