amansinha100 commented on code in PR #4079:
URL: https://github.com/apache/hive/pull/4079#discussion_r1131321197
##########
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java:
##########
@@ -2101,6 +2102,55 @@ public boolean validateMaterializedViewsFromRegistry(List<Table> cachedMateriali
}
}
+  private Materialization getMaterializationInvalidationInfo(MaterializedViewMetadata metadata)
+      throws TException, HiveException {
+    Optional<SourceTable> first = metadata.getSourceTables().stream().findFirst();
+    if (!first.isPresent()) {
+      // This is unexpected: every MV must have at least one source table
+      Materialization materialization = new Materialization();
+      materialization.setSourceTablesCompacted(true);
+      materialization.setSourceTablesUpdateDeleteModified(true);
+      return materialization;
+    } else {
+      Table table = getTable(first.get().getTable().getDbName(), first.get().getTable().getTableName());
+      if (!(table.isNonNative() && table.getStorageHandler().areSnapshotsSupported())) {
+        // Mixing native ACID and non-native source tables is not supported. If the first source is native ACID,
+        // the rest are expected to be native ACID as well.
+        return getMSC().getMaterializationInvalidationInfo(
+            metadata.creationMetadata, conf.get(ValidTxnList.VALID_TXNS_KEY));
+      }
+    }
+
+    MaterializationSnapshot mvSnapshot = MaterializationSnapshot.fromJson(metadata.creationMetadata.getValidTxnList());
+
+    boolean hasDelete = false;
+    for (SourceTable sourceTable : metadata.getSourceTables()) {
+      Table table = getTable(sourceTable.getTable().getDbName(), sourceTable.getTable().getTableName());
+      HiveStorageHandler storageHandler = table.getStorageHandler();
+      if (storageHandler == null) {
+        Materialization materialization = new Materialization();
+        materialization.setSourceTablesCompacted(true);
+        return materialization;
+      }
+      Boolean b = storageHandler.hasDeletes(
Review Comment:
Does the hasDeletes() API cover all types of CRUD operations that are not inserts, e.g. updates, truncate table, drop partition, etc.?
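To make the question concrete, the following is a purely hypothetical sketch against the Iceberg snapshot API of roughly the check I would expect such a flag to imply: any snapshot since the MV's snapshot whose operation is not a plain append should disqualify incremental rebuild. The class and method names are made up for illustration and are not part of the handler API.

```java
// Illustrative sketch only (not the actual HiveStorageHandler implementation):
// walk the Iceberg snapshot ancestry from the current snapshot back to the
// snapshot the MV was built from and report whether anything other than a
// plain append happened in between. UPDATE/DELETE/TRUNCATE/DROP PARTITION
// would typically surface as delete/overwrite/replace operations.
import org.apache.iceberg.DataOperations;
import org.apache.iceberg.Snapshot;
import org.apache.iceberg.Table;

public final class NonAppendChangeCheck {
  private NonAppendChangeCheck() {
  }

  public static boolean hasNonAppendChangesSince(Table icebergTable, long fromSnapshotId) {
    Snapshot current = icebergTable.currentSnapshot();
    while (current != null && current.snapshotId() != fromSnapshotId) {
      if (!DataOperations.APPEND.equals(current.operation())) {
        // something other than a pure append happened after the MV snapshot
        return true;
      }
      Long parentId = current.parentId();
      current = parentId == null ? null : icebergTable.snapshot(parentId);
    }
    return false;
  }
}
```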
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HivePushdownSnapshotFilterRule.java:
##########
@@ -0,0 +1,156 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules.views;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptTable;
+import org.apache.calcite.plan.RelRule;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.metadata.RelMetadataQuery;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.rex.RexShuttle;
+import org.apache.calcite.rex.RexTableInputRef;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.type.SqlTypeFamily;
+import org.apache.hadoop.hive.ql.metadata.VirtualColumn;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories;
+import org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveFilter;
+
+import java.util.Set;
+
+/**
+ * Calcite rule to push down predicates containing a {@link VirtualColumn#SNAPSHOT_ID} reference to the TableScan.
+ * <p>
+ * This rule traverses the logical expression in {@link HiveFilter} operators and searches for
+ * predicates like
+ * <p>
+ * <code>
+ * snapshotId <= 12345677899
+ * </code>
+ * <p>
+ * The literal is set in the {@link RelOptHiveTable#getHiveTableMD()} object wrapped by
+ * {@link org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableScan}
+ * and the original predicate in the {@link HiveFilter} is replaced with literal true.
Review Comment:
nit: At another place I saw a mention that the HiveFilter would be dropped.
If that is done by a different rule, you may want to mention that here for
clarification.
##########
iceberg/iceberg-handler/src/test/queries/positive/mv_iceberg_orc4.q:
##########
@@ -0,0 +1,42 @@
+-- MV source tables are iceberg and MV has aggregate
+-- SORT_QUERY_RESULTS
+--! qt:replace:/(.*fromVersion=\[)\S+(\].*)/$1#Masked#$2/
+
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
+
+drop table if exists tbl_ice;
+
+create external table tbl_ice(a int, b string, c int) stored by iceberg stored as orc tblproperties ('format-version'='1');
+create external table tbl_ice_v2(d int, e string, f int) stored by iceberg stored as orc tblproperties ('format-version'='2');
+
+insert into tbl_ice values (1, 'one', 50), (4, 'four', 53), (5, 'five', 54);
+insert into tbl_ice_v2 values (1, 'one v2', 50), (4, 'four v2', 53), (5, 'five v2', 54);
+
+create materialized view mat1 as
+select tbl_ice.b, tbl_ice.c, sum(tbl_ice_v2.f)
+from tbl_ice
+join tbl_ice_v2 on tbl_ice.a=tbl_ice_v2.d where tbl_ice.c > 52
+group by tbl_ice.b, tbl_ice.c;
+
+create materialized view mat2 as
+select tbl_ice.b, tbl_ice.c, sum(tbl_ice_v2.f), count(tbl_ice_v2.f), avg(tbl_ice_v2.f)
+from tbl_ice
+join tbl_ice_v2 on tbl_ice.a=tbl_ice_v2.d where tbl_ice.c > 52
+group by tbl_ice.b, tbl_ice.c;
+
+-- insert some new values to one of the source tables
+insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), (4, 'four', 53), (5, 'five', 54);
+insert into tbl_ice_v2 values (1, 'one v2', 50), (4, 'four v2', 53), (5, 'five v2', 54);
+
Review Comment:
Suggest adding a sanity test for DELETE; in that case the new logic should skip MV incremental maintenance.
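A rough sketch of what I have in mind for the q file (the exact statements and expected output would of course need to follow the existing conventions of these tests):

```sql
-- sanity check: a DELETE on a source table should disable incremental rebuild
delete from tbl_ice_v2 where d = 4;

-- the rebuild plan should now be a full rebuild rather than an incremental one
explain cbo
alter materialized view mat1 rebuild;

alter materialized view mat1 rebuild;
select * from mat1;
```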
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HivePushdownSnapshotFilterRule.java:
##########
@@ -0,0 +1,156 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules.views;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptTable;
+import org.apache.calcite.plan.RelRule;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.metadata.RelMetadataQuery;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.rex.RexShuttle;
+import org.apache.calcite.rex.RexTableInputRef;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.type.SqlTypeFamily;
+import org.apache.hadoop.hive.ql.metadata.VirtualColumn;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories;
+import org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveFilter;
+
+import java.util.Set;
+
+/**
+ * Calcite rule to push down predicates containing a {@link VirtualColumn#SNAPSHOT_ID} reference to the TableScan.
+ * <p>
+ * This rule traverses the logical expression in {@link HiveFilter} operators and searches for
+ * predicates like
+ * <p>
+ * <code>
+ * snapshotId <= 12345677899
+ * </code>
+ * <p>
+ * The literal is set in the {@link RelOptHiveTable#getHiveTableMD()} object wrapped by
+ * {@link org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableScan}
+ * and the original predicate in the {@link HiveFilter} is replaced with literal true.
+ */
+public class HivePushdownSnapshotFilterRule extends RelRule<HivePushdownSnapshotFilterRule.Config> {
+
+  public static final RelOptRule INSTANCE =
+      RelRule.Config.EMPTY.as(HivePushdownSnapshotFilterRule.Config.class)
+          .withRelBuilderFactory(HiveRelFactories.HIVE_BUILDER)
+          .withOperandSupplier(operandBuilder -> operandBuilder.operand(HiveFilter.class).anyInputs())
+          .withDescription("HivePushdownSnapshotFilterRule")
+          .toRule();
+
+  public interface Config extends RelRule.Config {
+    @Override
+    default HivePushdownSnapshotFilterRule toRule() {
+      return new HivePushdownSnapshotFilterRule(this);
+    }
+  }
+
+  private HivePushdownSnapshotFilterRule(Config config) {
+    super(config);
+  }
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+    HiveFilter filter = call.rel(0);
Review Comment:
Should this rule do an early exit if the filter does not contain a predicate on snapshotId?
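For example, something along these lines at the top of onMatch (a sketch only; I have not checked how the snapshotId virtual column shows up in the filter's input row type, and the helper name is made up):

```java
// Additional imports this sketch would need:
// java.util.List, org.apache.calcite.rex.RexVisitorImpl, org.apache.calcite.util.Util
private boolean referencesSnapshotId(HiveFilter filter) {
  List<RelDataTypeField> fields = filter.getInput().getRowType().getFieldList();
  try {
    filter.getCondition().accept(new RexVisitorImpl<Void>(true) {
      @Override
      public Void visitInputRef(RexInputRef inputRef) {
        // abort the traversal as soon as a snapshotId reference is found
        if (VirtualColumn.SNAPSHOT_ID.getName().equalsIgnoreCase(
            fields.get(inputRef.getIndex()).getName())) {
          throw Util.FoundOne.NULL;
        }
        return super.visitInputRef(inputRef);
      }
    });
    return false;
  } catch (Util.FoundOne e) {
    return true;
  }
}
```

Then onMatch could simply return early when referencesSnapshotId(filter) is false, so the shuttle is not created for unrelated filters.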
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveAugmentSnapshotMaterializationRule.java:
##########
@@ -0,0 +1,152 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules.views;
+
+import com.google.common.collect.ImmutableList;
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelRule;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.TableScan;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.fun.SqlStdOperatorTable;
+import org.apache.calcite.tools.RelBuilder;
+import org.apache.calcite.util.ImmutableBeans;
+import org.apache.hadoop.hive.common.type.SnapshotContext;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.metadata.VirtualColumn;
+import org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelFactories;
+import org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable;
+import org.apache.hadoop.hive.ql.optimizer.calcite.translator.TypeConverter;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * This rule will rewrite the materialized view with information about
+ * its invalidation data. In particular, if any of the tables used by the
+ * materialization has been updated since the materialization was created,
+ * it will introduce a filter operator on top of that table in the materialization
+ * definition, making explicit the data contained in it so the rewriting
+ * algorithm can use this information to rewrite the query as a combination of the
+ * outdated materialization data and the new original data in the source tables.
+ * If the data in the source table matches the current data in the snapshot,
+ * no filter is created.
+ * In case of tables that support snapshots, the filtering should be performed in the
+ * TableScan operator to read records only from the relevant snapshots.
+ * However, the union rewrite algorithm needs a so-called compensation predicate in
+ * a Filter operator to build the union branch that produces the delta records.
+ * After the union rewrite algorithm is executed, the predicates on SnapshotIds
+ * are pushed down to the corresponding TableScan operator and removed from the Filter
+ * operator. So the reference to the {@link VirtualColumn#SNAPSHOT_ID} is temporal in the
Review Comment:
nit: temporary instead of temporal
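Also, to double-check my own reading of the overall flow (please correct me if this is off), the rewrite shape being described is roughly the following, where S is the snapshot the MV was built from:

```sql
-- rough mental model only, not actual generated plan output
select ... from mat1          -- pre-materialized data, covering source data up to snapshot S
union all
select ... from tbl_ice ...   -- delta branch produced via the compensation predicate
where tbl_ice.snapshotId > S; -- later pushed into the TableScan and removed from the Filter
```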
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveMaterializedViewUtils.java:
##########
@@ -277,6 +276,20 @@ public void visit(RelNode node, int ordinal, RelNode parent) {
materialization.getAst());
}
+  private static HiveRelOptMaterialization augmentMaterializationWithTimeInformation(
Review Comment:
nit: pls add a brief comment for this important method.
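For instance, something along these lines (my guess at the intent from the rest of the PR; please adjust the wording to the actual behavior):

```java
/**
 * Augments the materialization's plan with snapshot/transaction time information
 * so that view-based rewriting only considers data that already existed when the
 * materialized view was created (or last rebuilt).
 */
```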
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]