julianhyde commented on code in PR #3495:
URL: https://github.com/apache/calcite/pull/3495#discussion_r1396265880
##########
core/src/main/java/org/apache/calcite/rel/metadata/RelMdColumnUniqueness.java:
##########
@@ -378,12 +402,36 @@ public Boolean areColumnsUnique(Intersect rel,
RelMetadataQuery mq,
return null;
}
+ /**
+ * If an input's unique column value is returned (passed through) by an
aggregation function, then
Review Comment:
Can you prefix a statement of what this function returns? "Returns whether
... This is the case if ...".
##########
core/src/main/java/org/apache/calcite/rel/metadata/RelMdColumnUniqueness.java:
##########
@@ -378,12 +402,36 @@ public Boolean areColumnsUnique(Intersect rel,
RelMetadataQuery mq,
return null;
}
+ /**
+ * If an input's unique column value is returned (passed through) by an
aggregation function, then
+ * the result of the function is also unique.
+ *
+ * @param columns the columns to test for uniqueness
+ * @param rel the Aggregate {@link RelNode}
+ * @param inputUniqueKeys the set of unique key sets of the Aggregate input
(child RelNode)
+ *
+ * @return whether the input columns are unique by containing a unique
aggregated column
+ */
+ private boolean columnsContainUniqueAgg(ImmutableBitSet columns, Aggregate
rel,
+ Set<ImmutableBitSet> inputUniqueKeys) {
+ List<AggregateCall> aggCallList = rel.getAggCallList();
+ for (int aggIndex = 0; aggIndex < aggCallList.size(); aggIndex++) {
+ AggregateCall call = aggCallList.get(aggIndex);
+ if (PASSTHROUGH_AGGREGATIONS.contains(call.getAggregation().getKind())
+ &&
inputUniqueKeys.contains(ImmutableBitSet.of(call.getArgList().get(0)))
Review Comment:
You assume that there is exactly one argument. Also you pass in the set of
unique keys.
Instead, you could ask whether the argument(s) of the method are unique - by
calling `areColumnsUnique(args)` on the input.
It will yield the same result, but makes fewer assumptions.
##########
core/src/main/java/org/apache/calcite/plan/RelOptPredicateList.java:
##########
@@ -245,4 +248,37 @@ public boolean isEffectivelyNotNull(RexNode e) {
}
return false;
}
+
+ /** Return the set of columns that are set to a constant or scalar. */
+ public ImmutableBitSet getInvariantColumnSet() {
+ ImmutableBitSet.Builder builder = ImmutableBitSet.builder();
+ constantMap.keySet()
+ .stream()
+ .filter(RexInputRef.class::isInstance)
+ .map(RexInputRef.class::cast)
+ .map(RexSlot::getIndex)
+ .forEach(builder::set);
+
+ pulledUpPredicates.forEach(rex -> {
+ if (rex.getKind() == SqlKind.EQUALS
+ || rex.getKind() == SqlKind.IS_NOT_DISTINCT_FROM) {
+ List<RexNode> ops = ((RexCall) rex).getOperands();
+ RexNode op0 = ops.get(0);
+ RexNode op1 = ops.get(1);
+ addInputRefIfOtherInvariant(builder, op0, op1);
+ addInputRefIfOtherInvariant(builder, op1, op0);
+ }
+ });
+
+ return builder.build();
+ }
+
+ private void addInputRefIfOtherInvariant(ImmutableBitSet.Builder builder,
RexNode inputRef,
+ RexNode invariant) {
+ if (inputRef instanceof RexInputRef
+ && (invariant.getKind() == SqlKind.LITERAL
+ || invariant.getKind() == SqlKind.SCALAR_QUERY)) {
+ builder.set(((RexInputRef) inputRef).getIndex());
Review Comment:
A scalar subquery does not always return the same value if there are
correlating variables. Can you add javadoc explaining what the `invariant`
parameter means. Has it already been proven invariant?
##########
core/src/main/java/org/apache/calcite/rel/metadata/RelMdPredicates.java:
##########
@@ -533,6 +535,36 @@ public RelOptPredicateList getPredicates(Exchange exchange,
return mq.getPulledUpPredicates(input);
}
+ /**
+ * Infers predicates for a Values.
+ */
+ public RelOptPredicateList getPredicates(Values values, RelMetadataQuery mq)
{
+ ImmutableList<ImmutableList<RexLiteral>> tuples = values.tuples;
+ if (tuples.size() > 0) {
+ List<RexLiteral> constants = new ArrayList<>(tuples.get(0));
+ for (ImmutableList<RexLiteral> tuple : tuples) {
+ for (int i = 0; i < tuple.size(); i++) {
+ if (!Objects.equals(tuple.get(i), constants.get(i))) {
+ constants.set(i, null);
+ }
+ }
+ }
+ RexBuilder rexBuilder = values.getCluster().getRexBuilder();
+ List<RexNode> predicates = new ArrayList<>();
+ for (int i = 0; i < constants.size(); i++) {
+ RexLiteral literal = constants.get(i);
+ if (literal != null) {
+ predicates.add(
+ rexBuilder.makeCall(SqlStdOperatorTable.EQUALS,
+ rexBuilder.makeInputRef(literal.getType(), i),
Review Comment:
fix indentation
##########
core/src/main/java/org/apache/calcite/rel/metadata/RelMdColumnUniqueness.java:
##########
@@ -67,6 +70,9 @@ public class RelMdColumnUniqueness
public static final RelMetadataProvider SOURCE =
ReflectiveRelMetadataProvider.reflectiveSource(
new RelMdColumnUniqueness(),
BuiltInMetadata.ColumnUniqueness.Handler.class);
+ static final Set<SqlKind> PASSTHROUGH_AGGREGATIONS =
Review Comment:
This is a clever idea. It should be a code comment, for the benefit of
future maintainers.
You should not apply this 'passthrough' logic to aggregate functions with a
filter. `MIN(x) FILTER (WHERE y > 10)` is not necessarily unique even if `x` is
unique. (Because `y > 10` may be false for all rows, and therefore it could
return NULL.)
##########
core/src/main/java/org/apache/calcite/rel/metadata/RelMdColumnUniqueness.java:
##########
@@ -67,6 +70,9 @@ public class RelMdColumnUniqueness
public static final RelMetadataProvider SOURCE =
ReflectiveRelMetadataProvider.reflectiveSource(
new RelMdColumnUniqueness(),
BuiltInMetadata.ColumnUniqueness.Handler.class);
+ static final Set<SqlKind> PASSTHROUGH_AGGREGATIONS =
Review Comment:
Also consider making this an interface, similar to `SqlStaticAggFunction`,
`SqlSingletonAggFunction`. That way, user-defined aggregate functions can have
this property.
##########
core/src/main/java/org/apache/calcite/rel/metadata/RelMdUniqueKeys.java:
##########
@@ -74,11 +84,30 @@ private RelMdUniqueKeys() {}
public @Nullable Set<ImmutableBitSet> getUniqueKeys(Filter rel,
RelMetadataQuery mq,
boolean ignoreNulls) {
- return mq.getUniqueKeys(rel.getInput(), ignoreNulls);
+ Set<ImmutableBitSet> uniqueKeys = mq.getUniqueKeys(rel.getInput(),
ignoreNulls);
+ if (uniqueKeys == null) {
+ return null;
+ }
+ // Remove constant columns from each key
+ RelOptPredicateList predicates = mq.getPulledUpPredicates(rel);
+ if (RelOptPredicateList.isEmpty(predicates)) {
+ return uniqueKeys;
+ } else {
+ ImmutableBitSet invariantColumns = predicates.getInvariantColumnSet();
+ return uniqueKeys.stream()
+ .map(key -> key.rebuild()
Review Comment:
Use `ImmutableBitSet.except`
##########
core/src/main/java/org/apache/calcite/rel/metadata/RelMdColumnUniqueness.java:
##########
@@ -378,12 +402,36 @@ public Boolean areColumnsUnique(Intersect rel,
RelMetadataQuery mq,
return null;
}
+ /**
+ * If an input's unique column value is returned (passed through) by an
aggregation function, then
+ * the result of the function is also unique.
+ *
+ * @param columns the columns to test for uniqueness
+ * @param rel the Aggregate {@link RelNode}
+ * @param inputUniqueKeys the set of unique key sets of the Aggregate input
(child RelNode)
+ *
+ * @return whether the input columns are unique by containing a unique
aggregated column
+ */
+ private boolean columnsContainUniqueAgg(ImmutableBitSet columns, Aggregate
rel,
+ Set<ImmutableBitSet> inputUniqueKeys) {
+ List<AggregateCall> aggCallList = rel.getAggCallList();
+ for (int aggIndex = 0; aggIndex < aggCallList.size(); aggIndex++) {
Review Comment:
This `for` loop seems backwards to me. It would make more sense to iterate
over `columns`, and ask whether each is a pass-through aggregate function.
Whereas you loop over all aggregate functions.
##########
core/src/main/java/org/apache/calcite/plan/RelOptPredicateList.java:
##########
@@ -245,4 +248,36 @@ public boolean isEffectivelyNotNull(RexNode e) {
}
return false;
}
+
+ /** Return the set of columns that are set to a constant or scalar. */
+ public ImmutableBitSet getInvariantColumnSet() {
Review Comment:
I prefer 'constant'. (In programming languages, 'invariant' is commonly used
for a condition that is always true, e.g. 'x > 0', and in the SQL world we call
that a 'constraint' or 'predicate'.)
We already use the term 'literal' widely, in a precise sense. People also
know that 'constant' is wider than literal, and therefore you need to define it
in each context. Examples:
* `2` is a literal and therefore a constant;
* `2 + 3` because it is a deterministic function applied to constants (and
therefore it can be constant-reduced);
* `CURRENT_DATE` is constant within a query execution;
* In the output of 'select x from t limit 1', the column `x` will be a
constant in the sense that all rows in the set have the same value.
* In the output of `select x from t where false`, `x` will be constant in
the sense that there are no rows and therefore, trivially, all rows have the
same value
In the javadoc, you should give examples that you consider constants and not
constants.
Consider making the method not-`public`. If it is public you must test it
directly, and its semantics cannot change in future releases.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]