Hey Victor,

In general, it is best to avoid stateful rules, but in some cases they may be unavoidable. A common way to decide whether to apply a transformation is to check state via the metadata mechanism (RelMetadataQuery).
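To make the metadata idea a bit more concrete, here is a minimal sketch in plain Java of the kind of guard such a rule could apply. It is only a toy model and not Calcite code: the class `LimitPushGuard` and the method `shouldPushLimit` are hypothetical names, and each list entry merely stands in for the value that RelMetadataQuery#getMaxRowCount would report for one union input (with `null` meaning "unknown").

```java
import java.util.List;

/** Toy model of a metadata-based guard; hypothetical, not part of Calcite. */
final class LimitPushGuard {
  private LimitPushGuard() {}

  /**
   * Returns true if copying a limit of {@code limit} rows onto the union
   * inputs could reduce the rows produced by at least one of them.
   *
   * @param limit             rows requested by the outer limit
   * @param inputMaxRowCounts modeled max row count per union input;
   *                          null means the bound is unknown
   */
  static boolean shouldPushLimit(double limit, List<Double> inputMaxRowCounts) {
    for (Double maxRows : inputMaxRowCounts) {
      // An unknown or larger bound means the input may still produce more
      // rows than needed, so adding a limit on top of it can help.
      if (maxRows == null || maxRows > limit) {
        return true;
      }
    }
    // Every input is already bounded by the limit: pushing again is a no-op,
    // so the rule should bail out instead of producing an equivalent plan.
    return false;
  }
}
```

In an actual rule, the per-input values would presumably be obtained with something like `call.getMetadataQuery().getMaxRowCount(input)` for each input of the union, and the check would run before calling `transformTo`. How precise those values are for a given plan depends on the metadata handlers involved, so treat this as an outline of the condition rather than a drop-in fix.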
It seems that when the union is already subject to a limit, the new rule should not trigger. One way to achieve that is to introduce a condition that relies on RelMetadataQuery#getMaxRowCount of the union (or its inputs) and bails out when the optimization is not going to bring a notable benefit.

```
LogicalSort[fetch = 10]
  LogicalUnion[all=true]
    LogicalSort[fetch = 10]
      <Input 1>
    LogicalSort[fetch = 10]
      <Input 2>
```

In the plan above, the max row count of the union should be 20, and pushing a limit of 10 is not going to change that, so the rule should detect this (e.g., via getMaxRowCount) and not trigger again.

If you start working with metadata classes, you may find that some cases are not fully handled, so don't hesitate to raise JIRAs/PRs to improve this area.

Best,
Stamatis

On Thu, Aug 29, 2024 at 11:31 PM Victor Barua <victor.ba...@datadoghq.com.invalid> wrote:
>
> Hello!
>
> We've been attempting to implement a simple optimization rule in which we
> duplicate a limit through a UNION ALL to reduce the amount of data we need
> to fetch for a query.
>
> Starting from something like
> ```
> LogicalSort[fetch = 10]
>   LogicalUnion[all=true]
>     <Input 1>
>     <Input 2>
> ```
>
> We're trying to turn it into
> ```
> LogicalSort[fetch = 10]
>   LogicalUnion[all=true]
>     LogicalSort[fetch = 10]
>       <Input 1>
>     LogicalSort[fetch = 10]
>       <Input 2>
> ```
>
> This, somewhat expectedly, causes issues with the VolcanoPlanner because
> the newly generated relation is also a candidate for our rule so we end up
> with an infinite planning loop. We tried to take inspiration from the
> JoinCommuteRule, which uses the ensureRegistered method to prevent this (or
> at least that's what we think it's doing). Unfortunately, in our case this
> appears to be insufficient. I would appreciate any pointers and or
> suggestions around this. I've included the code for the raw rule below.
>
>
> ```
>
> @Value.Enclosing
> public class LimitThroughUnionRule extends RelRule<LimitThroughUnionRule.Config>
>     implements SubstitutionRule {
>
>   public static final LimitThroughUnionRule INSTANCE =
>       LimitThroughUnionRule.Config.DEFAULT.toRule();
>
>   protected LimitThroughUnionRule(Config config) {
>     super(config);
>   }
>
>   private RelNode pushLimitThrough(RelBuilder relBuilder, Sort sort, Union union) {
>     for (RelNode unionInput : union.getInputs()) {
>       relBuilder.push(unionInput).sortLimit(sort.offset, sort.fetch, Collections.emptyList());
>     }
>     relBuilder.union(true);
>     relBuilder.sortLimit(sort.offset, sort.fetch, sort.getSortExps());
>     return relBuilder.build();
>   }
>
>   @Override
>   public void onMatch(RelOptRuleCall call) {
>     Sort sort = call.rel(0);
>     Union union = call.rel(1);
>
>     RelNode newNode = pushLimitThrough(call.builder(), sort, union);
>     RelNode nextNode =
>         pushLimitThrough(call.builder(), (Sort) newNode, (Union) newNode.getInput(0));
>
>     call.transformTo(newNode);
>     call.getPlanner().ensureRegistered(nextNode, newNode);
>   }
>
>   @Value.Immutable
>   public interface Config extends RelRule.Config {
>     LimitThroughUnionRule.Config DEFAULT =
>         ImmutableLimitThroughUnionRule.Config.builder()
>             .operandSupplier(
>                 b0 ->
>                     b0.operand(Sort.class)
>                         .predicate(sort -> !(RelOptUtil.isOrder(sort)
>                             || RelOptUtil.isOffset(sort)))
>                         .oneInput(u -> u.operand(Union.class).anyInputs()))
>             .build();
>
>     @Override
>     default LimitThroughUnionRule toRule() {
>       return new LimitThroughUnionRule(this);
>     }
>   }
> }
>
> ```