[ 
https://issues.apache.org/jira/browse/HIVE-25276?focusedWorklogId=625124&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-625124
 ]

ASF GitHub Bot logged work on HIVE-25276:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 20/Jul/21 10:09
            Start Date: 20/Jul/21 10:09
    Worklog Time Spent: 10m 
      Work Description: pvary commented on a change in pull request #2419:
URL: https://github.com/apache/hive/pull/2419#discussion_r672435549



##########
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##########
@@ -233,15 +237,21 @@ public void preAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, E
       preAlterTableProperties.tableLocation = sd.getLocation();
       preAlterTableProperties.format = sd.getInputFormat();
       preAlterTableProperties.schema = schema(catalogProperties, hmsTable);
-      preAlterTableProperties.spec = spec(conf, preAlterTableProperties.schema, catalogProperties, hmsTable);
       preAlterTableProperties.partitionKeys = hmsTable.getPartitionKeys();
 
       context.getProperties().put(HiveMetaHook.ALLOW_PARTITION_KEY_CHANGE, "true");
       // If there are partition keys specified remove them from the HMS table and add them to the column list
-      if (hmsTable.isSetPartitionKeys()) {
+      if (hmsTable.isSetPartitionKeys() && !hmsTable.getPartitionKeys().isEmpty()) {
+        List<PartitionTransformSpec> spec = PartitionTransform.getPartitionTransformSpec(hmsTable.getPartitionKeys());
+        if (!SessionStateUtil.addResource(conf, hive_metastoreConstants.PARTITION_TRANSFORM_SPEC, spec)) {
+          throw new MetaException("Query state attached to Session state must be not null. " +
+              "Partition transform metadata cannot be saved.");
+        }
         hmsTable.getSd().getCols().addAll(hmsTable.getPartitionKeys());
         hmsTable.setPartitionKeysIsSet(false);
       }
+      preAlterTableProperties.spec = spec(conf, preAlterTableProperties.schema, hmsTable);

Review comment:
       This is moved from line 236. We need it to be set, but we have to do it after we have obtained the correct spec.
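
       To make the ordering constraint concrete, here is a minimal, self-contained Java sketch. The types and the `spec(...)` helper are plain stand-ins, not the actual `HiveIcebergMetaHook` code: if the partition spec is derived before the partition keys are folded into the column list, the derived spec misses those columns.

```java
// Stand-in sketch (hypothetical names, not the real Hive/Iceberg classes) showing why
// the spec must be computed only after partition keys are merged into the column list.
import java.util.ArrayList;
import java.util.List;

public class SpecOrderingSketch {

  // Hypothetical stand-in for deriving a partition spec from the full column list.
  static List<String> spec(List<String> allColumns, List<String> partitionKeys) {
    List<String> result = new ArrayList<>();
    for (String key : partitionKeys) {
      if (allColumns.contains(key)) {   // a key only counts once it is a regular column
        result.add("identity(" + key + ")");
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> columns = new ArrayList<>(List.of("id", "value"));
    List<String> partitionKeys = List.of("dt");

    // Wrong order: spec computed before the partition keys are merged -> empty spec.
    System.out.println("before merge: " + spec(columns, partitionKeys));

    // Correct order (what the patch does): merge first, then compute the spec.
    columns.addAll(partitionKeys);
    System.out.println("after merge:  " + spec(columns, partitionKeys));
  }
}
```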

##########
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##########
@@ -233,15 +237,21 @@ public void preAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, E
       preAlterTableProperties.tableLocation = sd.getLocation();
       preAlterTableProperties.format = sd.getInputFormat();
       preAlterTableProperties.schema = schema(catalogProperties, hmsTable);
-      preAlterTableProperties.spec = spec(conf, preAlterTableProperties.schema, catalogProperties, hmsTable);
       preAlterTableProperties.partitionKeys = hmsTable.getPartitionKeys();
 
       context.getProperties().put(HiveMetaHook.ALLOW_PARTITION_KEY_CHANGE, "true");
       // If there are partition keys specified remove them from the HMS table and add them to the column list
-      if (hmsTable.isSetPartitionKeys()) {
+      if (hmsTable.isSetPartitionKeys() && !hmsTable.getPartitionKeys().isEmpty()) {
+        List<PartitionTransformSpec> spec = PartitionTransform.getPartitionTransformSpec(hmsTable.getPartitionKeys());
+        if (!SessionStateUtil.addResource(conf, hive_metastoreConstants.PARTITION_TRANSFORM_SPEC, spec)) {

Review comment:
       This is for migrating non-Iceberg tables to Iceberg tables. Previously we just depended on the partition columns; from now on we need to have the data in the `SessionState` instead, so we put it there.
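
       As an illustration of that pattern, a tiny self-contained sketch follows. The map-backed `addResource` stand-in is hypothetical, not the real `SessionStateUtil`: the spec is stashed under a well-known key for a later phase of the ALTER TABLE to read back, and the caller fails fast when there is no query state to attach it to.

```java
// Sketch of the "stash in session/query state" pattern (hypothetical stand-ins).
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class SessionResourceSketch {

  // Stand-in for the per-query resource map normally held by the session state.
  private static Map<String, Object> queryResources;   // null when no query state exists

  static boolean addResource(String key, Object value) {
    if (queryResources == null) {
      return false;                 // mirrors addResource(...) returning false
    }
    queryResources.put(key, value);
    return true;
  }

  public static void main(String[] args) throws Exception {
    queryResources = new ConcurrentHashMap<>();
    List<String> transformSpec = List.of("identity(dt)");

    if (!addResource("partition_transform_spec", transformSpec)) {
      throw new Exception("Query state attached to Session state must not be null. "
          + "Partition transform metadata cannot be saved.");
    }
    // Later, the code that builds the partition spec reads it back under the same key.
    Optional<Object> saved = Optional.ofNullable(queryResources.get("partition_transform_spec"));
    System.out.println("saved spec: " + saved.orElse("<none>"));
  }
}
```

       In the actual patch the shared key is `hive_metastoreConstants.PARTITION_TRANSFORM_SPEC`, as shown in the diff above, so the producer and the consumer agree on where to find the spec.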

##########
File path: iceberg/iceberg-handler/src/test/results/positive/vectorized_iceberg_read.q.out
##########
@@ -129,17 +129,17 @@ Stage-0
     Stage-1
       Reducer 2 vectorized
       File Output Operator [FS_11]
-        Select Operator [SEL_10] (rows=1 width=564)
+        Select Operator [SEL_10] (rows=1 width=372)

Review comment:
       TBH I am not sure, but I expect it has something to do with the new statistics.

##########
File path: ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezOutputCommitter.java
##########
@@ -122,6 +122,7 @@ private IDriver getDriverWithCommitter(String committerClass) {
     conf.setVar(HiveConf.ConfVars.HIVE_AUTHORIZATION_MANAGER,
         "org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory");
     conf.setBoolVar(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY, false);
+    conf.setBoolVar(HiveConf.ConfVars.HIVESTATSCOLAUTOGATHER, false);

Review comment:
       Otherwise the tests fail, because with stats turned on we generate 2 tasks instead of 1 (the execution plans change to contain an extra stage).
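
       For illustration only, a tiny self-contained model (not the real Hive planner or `TestTezOutputCommitter`) of why the flag matters to the assertions: with column-stat autogathering on, an extra stats stage shows up in the plan, so a test expecting exactly one task per query breaks.

```java
// Toy model (hypothetical) of the plan-size change caused by column-stat autogathering.
import java.util.ArrayList;
import java.util.List;

public class StatsStageSketch {

  static List<String> plan(boolean colStatsAutogather) {
    List<String> stages = new ArrayList<>();
    stages.add("INSERT");                       // the stage the test actually cares about
    if (colStatsAutogather) {
      stages.add("COLUMN STATS GATHER");        // the extra stage added when stats are on
    }
    return stages;
  }

  public static void main(String[] args) {
    System.out.println("autogather=true:  " + plan(true));   // 2 tasks
    System.out.println("autogather=false: " + plan(false));  // 1 task, what the test expects
  }
}
```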






Issue Time Tracking
-------------------

    Worklog Id:     (was: 625124)
    Time Spent: 4h 20m  (was: 4h 10m)

> Enable automatic statistics generation for Iceberg tables
> ---------------------------------------------------------
>
>                 Key: HIVE-25276
>                 URL: https://issues.apache.org/jira/browse/HIVE-25276
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Peter Vary
>            Assignee: Peter Vary
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> During inserts we should calculate the column statistics


