Hey Victor,

In general it is best to avoid stateful rules but in some cases it may
be unavoidable. A common way to check state for deciding whether to
apply a transformation (or not) is via the metadata mechanism
(RelMetadataQuery).

It seems that when the union is already subject to a limit then the
new rule should not trigger. To achieve that one way would be to
introduce a condition that relies on RelMetadataQuery#getMaxRowCount
of the union (or its inputs) and bail out when the optimization is not
gonna bring some notable benefit.

```
LogicalSort[fetch = 10]
  LogicalUnion[all=true]
    LogicalSort[fetch = 10]
      <Input 1>
    LogicalSort[fetch = 10]
      <Input 2>
```
In the plan above the max row count of the union should be 20 and
pushing a limit 10 is not gonna change that so the rule should detect
that (e.g., via getMaxRowCount) and not trigger again.

If you start working with metadata classes you may find that some
cases are not fully handled so don't hesitate to raise JIRAs/PRs to
improve this area.

Best,
Stamatis

On Thu, Aug 29, 2024 at 11:31 PM Victor Barua
<victor.ba...@datadoghq.com.invalid> wrote:
>
> Hello!
>
> We've been attempting to implement a simple optimization rule in which we
> duplicate a limit through a UNION ALL to reduce the amount of data we need
> to fetch for a query.
>
> Starting from something like
> ```
> LogicalSort[fetch = 10]
>   LogicalUnion[all=true]
>     <Input 1>
>     <Input 2>
> ```
>
> We're trying to turn it into
> ```
> LogicalSort[fetch = 10]
>   LogicalUnion[all=true]
>     LogicalSort[fetch = 10]
>       <Input 1>
>     LogicalSort[fetch = 10]
>       <Input 2>
> ```
>
> This, somewhat expectedly, causes issues with the VolcanoPlanner because
> the newly generated relation is also a candidate for our rule so we end up
> with an infinite planning loop. We tried to take inspiration from the
> JoinCommuteRule, which uses the ensureRegistered method to prevent this (or
> at least that's what we think it's doing). Unfortunately, in our case this
> appears to be insufficient. I would appreciate any pointers and or
> suggestions around this. I've included the code for the raw rule below.
>
>
> ```
>
> @Value.Enclosing
> public class LimitThroughUnionRule extends 
> RelRule<LimitThroughUnionRule.Config>
>     implements SubstitutionRule {
>
>   public static final LimitThroughUnionRule INSTANCE =
>       LimitThroughUnionRule.Config.DEFAULT.toRule();
>
>   protected LimitThroughUnionRule(Config config) {
>     super(config);
>   }
>
>   private RelNode pushLimitThrough(RelBuilder relBuilder, Sort sort,
> Union union) {
>     for (RelNode unionInput : union.getInputs()) {
>       relBuilder.push(unionInput).sortLimit(sort.offset, sort.fetch,
> Collections.emptyList());
>     }
>     relBuilder.union(true);
>     relBuilder.sortLimit(sort.offset, sort.fetch, sort.getSortExps());
>     return relBuilder.build();
>   }
>
>   @Override
>   public void onMatch(RelOptRuleCall call) {
>     Sort sort = call.rel(0);
>     Union union = call.rel(1);
>
>     RelNode newNode = pushLimitThrough(call.builder(), sort, union);
>     RelNode nextNode =
>         pushLimitThrough(call.builder(), (Sort) newNode, (Union)
> newNode.getInput(0));
>
>     call.transformTo(newNode);
>     call.getPlanner().ensureRegistered(nextNode, newNode);
>   }
>
>   @Value.Immutable
>   public interface Config extends RelRule.Config {
>     LimitThroughUnionRule.Config DEFAULT =
>         ImmutableLimitThroughUnionRule.Config.builder()
>             .operandSupplier(
>                 b0 ->
>                     b0.operand(Sort.class)
>                         .predicate(sort -> !(RelOptUtil.isOrder(sort)
> || RelOptUtil.isOffset(sort)))
>                         .oneInput(u -> u.operand(Union.class).anyInputs()))
>             .build();
>
>     @Override
>     default LimitThroughUnionRule toRule() {
>       return new LimitThroughUnionRule(this);
>     }
>   }
> }
>
> ```

Reply via email to