Re: [PR] [CALCITE-6044] Make single-row rels unique [calcite]

via GitHub Thu, 16 Nov 2023 15:15:49 -0800


julianhyde commented on code in PR #3495:
URL: https://github.com/apache/calcite/pull/3495#discussion_r1396265880



##########
core/src/main/java/org/apache/calcite/rel/metadata/RelMdColumnUniqueness.java:
##########
@@ -378,12 +402,36 @@ public Boolean areColumnsUnique(Intersect rel, 
RelMetadataQuery mq,
     return null;
   }
 
+  /**
+   * If an input's unique column value is returned (passed through) by an 
aggregation function, then

Review Comment:
   Can you prefix a statement of what this function returns? "Returns whether 
... This is the case if ...".



##########
core/src/main/java/org/apache/calcite/rel/metadata/RelMdColumnUniqueness.java:
##########
@@ -378,12 +402,36 @@ public Boolean areColumnsUnique(Intersect rel, 
RelMetadataQuery mq,
     return null;
   }
 
+  /**
+   * If an input's unique column value is returned (passed through) by an 
aggregation function, then
+   * the result of the function is also unique.
+   *
+   * @param columns         the columns to test for uniqueness
+   * @param rel             the Aggregate {@link RelNode}
+   * @param inputUniqueKeys the set of unique key sets of the Aggregate input 
(child RelNode)
+   *
+   * @return whether the input columns are unique by containing a unique 
aggregated column
+   */
+  private boolean columnsContainUniqueAgg(ImmutableBitSet columns, Aggregate 
rel,
+      Set<ImmutableBitSet> inputUniqueKeys) {
+    List<AggregateCall> aggCallList = rel.getAggCallList();
+    for (int aggIndex = 0; aggIndex < aggCallList.size(); aggIndex++) {
+      AggregateCall call = aggCallList.get(aggIndex);
+      if (PASSTHROUGH_AGGREGATIONS.contains(call.getAggregation().getKind())
+          && 
inputUniqueKeys.contains(ImmutableBitSet.of(call.getArgList().get(0)))

Review Comment:
   You assume that there is exactly one argument. Also you pass in the set of 
unique keys.
   
   Instead, you could ask whether the argument(s) of the method are unique - by 
calling `areColumnsUnique(args)` on the input.
   
   It will yield the same result, but makes fewer assumptions. 



##########
core/src/main/java/org/apache/calcite/plan/RelOptPredicateList.java:
##########
@@ -245,4 +248,37 @@ public boolean isEffectivelyNotNull(RexNode e) {
     }
     return false;
   }
+
+  /** Return the set of columns that are set to a constant or scalar. */
+  public ImmutableBitSet getInvariantColumnSet() {
+    ImmutableBitSet.Builder builder = ImmutableBitSet.builder();
+    constantMap.keySet()
+        .stream()
+        .filter(RexInputRef.class::isInstance)
+        .map(RexInputRef.class::cast)
+        .map(RexSlot::getIndex)
+        .forEach(builder::set);
+
+    pulledUpPredicates.forEach(rex -> {
+      if (rex.getKind() == SqlKind.EQUALS
+          || rex.getKind() == SqlKind.IS_NOT_DISTINCT_FROM) {
+        List<RexNode> ops = ((RexCall) rex).getOperands();
+        RexNode op0 = ops.get(0);
+        RexNode op1 = ops.get(1);
+        addInputRefIfOtherInvariant(builder, op0, op1);
+        addInputRefIfOtherInvariant(builder, op1, op0);
+      }
+    });
+
+    return builder.build();
+  }
+
+  private void addInputRefIfOtherInvariant(ImmutableBitSet.Builder builder, 
RexNode inputRef,
+      RexNode invariant) {
+    if (inputRef instanceof RexInputRef
+        && (invariant.getKind() == SqlKind.LITERAL
+        || invariant.getKind() == SqlKind.SCALAR_QUERY)) {
+      builder.set(((RexInputRef) inputRef).getIndex());

Review Comment:
   A scalar subquery does not always return the same value if there are 
correlating variables. Can you add javadoc explaining what the `invariant` 
parameter means. Has it already been proven invariant?



##########
core/src/main/java/org/apache/calcite/rel/metadata/RelMdPredicates.java:
##########
@@ -533,6 +535,36 @@ public RelOptPredicateList getPredicates(Exchange exchange,
     return mq.getPulledUpPredicates(input);
   }
 
+  /**
+   * Infers predicates for a Values.
+   */
+  public RelOptPredicateList getPredicates(Values values, RelMetadataQuery mq) 
{
+    ImmutableList<ImmutableList<RexLiteral>> tuples = values.tuples;
+    if (tuples.size() > 0) {
+      List<RexLiteral> constants = new ArrayList<>(tuples.get(0));
+      for (ImmutableList<RexLiteral> tuple : tuples) {
+        for (int i = 0; i < tuple.size(); i++) {
+          if (!Objects.equals(tuple.get(i), constants.get(i))) {
+            constants.set(i, null);
+          }
+        }
+      }
+      RexBuilder rexBuilder = values.getCluster().getRexBuilder();
+      List<RexNode> predicates = new ArrayList<>();
+      for (int i = 0; i < constants.size(); i++) {
+        RexLiteral literal = constants.get(i);
+        if (literal != null) {
+          predicates.add(
+              rexBuilder.makeCall(SqlStdOperatorTable.EQUALS,
+              rexBuilder.makeInputRef(literal.getType(), i),

Review Comment:
   fix indentation



##########
core/src/main/java/org/apache/calcite/rel/metadata/RelMdColumnUniqueness.java:
##########
@@ -67,6 +70,9 @@ public class RelMdColumnUniqueness
   public static final RelMetadataProvider SOURCE =
       ReflectiveRelMetadataProvider.reflectiveSource(
           new RelMdColumnUniqueness(), 
BuiltInMetadata.ColumnUniqueness.Handler.class);
+  static final Set<SqlKind> PASSTHROUGH_AGGREGATIONS =

Review Comment:
   This is a clever idea. It should be a code comment, for the benefit of 
future maintainers.
   
   You should not apply this 'passthrough' logic to aggregate functions with a 
filter. `MIN(x) FILTER (WHERE y > 10)` is not necessarily unique even if `x` is 
unique. (Because `y > 10` may be false for all rows, and therefore it could 
return NULL.)
   
   



##########
core/src/main/java/org/apache/calcite/rel/metadata/RelMdColumnUniqueness.java:
##########
@@ -67,6 +70,9 @@ public class RelMdColumnUniqueness
   public static final RelMetadataProvider SOURCE =
       ReflectiveRelMetadataProvider.reflectiveSource(
           new RelMdColumnUniqueness(), 
BuiltInMetadata.ColumnUniqueness.Handler.class);
+  static final Set<SqlKind> PASSTHROUGH_AGGREGATIONS =

Review Comment:
   Also consider making this an interface, similar to `SqlStaticAggFunction`, 
`SqlSingletonAggFunction`. That way, user-defined aggregate functions can have 
this property.



##########
core/src/main/java/org/apache/calcite/rel/metadata/RelMdUniqueKeys.java:
##########
@@ -74,11 +84,30 @@ private RelMdUniqueKeys() {}
 
   public @Nullable Set<ImmutableBitSet> getUniqueKeys(Filter rel, 
RelMetadataQuery mq,
       boolean ignoreNulls) {
-    return mq.getUniqueKeys(rel.getInput(), ignoreNulls);
+    Set<ImmutableBitSet> uniqueKeys = mq.getUniqueKeys(rel.getInput(), 
ignoreNulls);
+    if (uniqueKeys == null) {
+      return null;
+    }
+    // Remove constant columns from each key
+    RelOptPredicateList predicates = mq.getPulledUpPredicates(rel);
+    if (RelOptPredicateList.isEmpty(predicates)) {
+      return uniqueKeys;
+    } else {
+      ImmutableBitSet invariantColumns = predicates.getInvariantColumnSet();
+      return uniqueKeys.stream()
+          .map(key -> key.rebuild()

Review Comment:
   Use `ImmutableBitSet.except`



##########
core/src/main/java/org/apache/calcite/rel/metadata/RelMdColumnUniqueness.java:
##########
@@ -378,12 +402,36 @@ public Boolean areColumnsUnique(Intersect rel, 
RelMetadataQuery mq,
     return null;
   }
 
+  /**
+   * If an input's unique column value is returned (passed through) by an 
aggregation function, then
+   * the result of the function is also unique.
+   *
+   * @param columns         the columns to test for uniqueness
+   * @param rel             the Aggregate {@link RelNode}
+   * @param inputUniqueKeys the set of unique key sets of the Aggregate input 
(child RelNode)
+   *
+   * @return whether the input columns are unique by containing a unique 
aggregated column
+   */
+  private boolean columnsContainUniqueAgg(ImmutableBitSet columns, Aggregate 
rel,
+      Set<ImmutableBitSet> inputUniqueKeys) {
+    List<AggregateCall> aggCallList = rel.getAggCallList();
+    for (int aggIndex = 0; aggIndex < aggCallList.size(); aggIndex++) {

Review Comment:
   This `for` loop seems backwards to me. It would make more sense to iterate 
over `columns`, and ask whether each is a pass-through aggregate function. 
Whereas you loop over all aggregate functions.



##########
core/src/main/java/org/apache/calcite/plan/RelOptPredicateList.java:
##########
@@ -245,4 +248,36 @@ public boolean isEffectivelyNotNull(RexNode e) {
     }
     return false;
   }
+
+  /** Return the set of columns that are set to a constant or scalar. */
+  public ImmutableBitSet getInvariantColumnSet() {

Review Comment:
   I prefer 'constant'. (In programming languages, 'invariant' is commonly used 
for a condition that is always true, e.g. 'x > 0', and in the SQL world we call 
that a 'constraint' or 'predicate'.)
   
   We already use the term 'literal' widely, in a precise sense. People also 
know that 'constant' is wider than literal, and therefore you need to define it 
in each context. Examples:
    * `2` is a literal and therefore a constant;
    * `2 + 3` because it is a deterministic function applied to constants (and 
therefore it can be constant-reduced);
    * `CURRENT_DATE` is constant within a query execution;
    * In the output of 'select x from t limit 1', the column `x` will be a 
constant in the sense that all rows in the set have the same value.
    * In the output of `select x from t where false`, `x` will be constant in 
the sense that there are no rows and therefore, trivially, all rows have the 
same value
   
   In the javadoc, you should give examples that you consider constants and not 
constants.
   
   Consider making the method not-`public`. If it is public you must test it 
directly, and its semantics cannot change in future releases.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [CALCITE-6044] Make single-row rels unique [calcite]

Reply via email to