I have completed work on this change. Can someone please review and +1? https://github.com/julianhyde/incubator-calcite/tree/828-rule-builder
https://issues.apache.org/jira/browse/CALCITE-828 Julian > On Sep 25, 2015, at 1:07 PM, Julian Hyde <[email protected]> wrote: > > The problem of making rules extensible is not straightforward. The solution > has to be a mixture of API changes and best practices. In this change I am > making API changes and fixing up the existing rules to match, but I am not > revisiting the rules to introduce best practices. > > The API change is that every RelOptRule has a relBuilderFactory. We strongly > recommend that if a rule needs to create a relational expression it either > matches the type of its inputs (calling the copy method) or it uses the > factory. That means removing calls in rules to methods such as > LogicalProject.create or RelOptUtil.project. I’ve done quite a lot of this as > part of this change. > > The best practice is that a rule’s constructor should expose > RelBuilderFactory and Class parameters so that the rule can be re-used. Quite > a few rules do this, but a lot do not. Quite a few rules are re-usable in > principle, but have not been tested against other RelNode sub-classes, and > quite a few rules are just not re-usable. So I propose that we enforce best > practices when reviewing contributions but don’t spend a lot of effort > upgrading the existing rules. > > In some cases (e.g. WindowedAggRelSplitter in ProjectToWindowRule) the rule > uses the default factory but I’ve changed the code to use a RelBuilder. So, > there’s no change in behavior, but if someone were to add a RelBuilderFactory > to the rule’s constructor, they would reap the benefit. > > Julian > > > >> On Sep 24, 2015, at 6:29 AM, Jesus Camacho Rodriguez >> <[email protected]> wrote: >> >> In our case, we had to do some copy-pasting of rules in the past in Hive. >> Although the reasons have been already discussed in the thread, just to >> confirm: >> - Rules that use a default factory to create operators e.g. >> LogicalProject, instead of being able to provide our own factory. An >> example is AggregateProjectMergeRule. (Instead of copy/pasting the rule, >> we could apply another rule to transform LogicalProject into HiveProject, >> but this is not the right way either). >> - Rules that match directly on Logical instances of the operators. An >> example is SortProjectTransposeRule. >> >> I see that with the solutions that we are discussing, those two problems >> would be solved. >> >> About including the matching classes in RelBuilder, it can be a good idea >> if we still make the rules extensible i.e. we can provide our own matching >> classes if we want to. Assume a project that might subclass the Project >> operator into two different operators, then would like to match a certain >> rule only to one of them. Further, I guess using the copy methods provided >> by the operators when possible, instead of calling directly to the >> Factory, would make the rules more extensible for these cases. >> >> -- >> Jesús >> >> >> On 9/23/15, 11:08 PM, "Jinfeng Ni" <[email protected]> wrote: >> >>> I did a quick check of Drill side rule, and did not find the copy/paste >>> code >>> simply because we have to specify the rule's operands. >>> >>> The reason I'm asking this is because many Calcite rules use core >>> Filter.class, >>> Project.class etc. Currently, Drill simply uses some of Calcite rules >>> directly. >>> However, that means the rule will fire against both LogicalFilter, and >>> DrillFilter, >>> which seems to be redundant. I understand we could extend Calcite rule to >>> make sure it matches only one of them. But I'm looking for a bit more >>> direct way >>> to do that. >>> >>> For instance, in [2] we uses Calcite's FilterMergeRule directly, which >>> would match >>> both LogicalFilter and DrillFilter. >>> >>> Also, I feel the rule patten specification can be regarded as the >>> input type for the >>> rule, while RelBuilder is to address the output type from the rule's >>> firing. That's why >>> I feel they had better be specified together in some way. >>> >>> >>> [1] >>> https://github.com/apache/incubator-calcite/blob/master/core/src/main/java >>> /org/apache/calcite/rel/rules/FilterMergeRule.java#L46 >>> >>> [2] >>> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/o >>> rg/apache/drill/exec/planner/logical/DrillRuleSets.java#L121 >>> >>> On Wed, Sep 23, 2015 at 2:17 PM, Jacques Nadeau <[email protected]> >>> wrote: >>>> I think this would be a good place to have a little bit more design, >>>> discussion and validation. Julian, it seems like you've done a lot of >>>> thinking and it would be good to have a better understanding of the >>>> choices >>>> you're making. (Your response below seems to be just scratching the >>>> surface). >>>> >>>> If the goal is to minimize copy and paste rules, it would be useful to >>>> test >>>> the proposal against example rules that have been copy/pasted and see >>>> what >>>> other patterns might emerge. At first glance, I'm inclined more with >>>> Jinfeng's proposal regarding operands than yours. Your decision about >>>> copy/not behavior should also be discussed further. >>>> >>>> Drill (Jinfeng?) and Hive (Jesus?) guys, can you provide an example set >>>> of >>>> rules that we've needed to copy/paste due to constraints around rule >>>> configuration to make sure the outcome of these changes is as desired? >>>> >>>> >>>> >>>> On Wed, Sep 23, 2015 at 1:28 PM, Julian Hyde <[email protected]> wrote: >>>> >>>>> Great question. >>>>> >>>>> For that case, I think you should carry on making a separate rule >>>>> instance with operands that contain classes. RelBuilder is a good >>>>> single point to put a lot of the configuration, but it is taking it >>>>> too far to put the target classes in there. I don't want its role as >>>>> "point for configuration for rules" to overwhelm its main role "an >>>>> easy way to build relational algebra expressions". >>>>> >>>>> And besides, operands are powerful and descriptive, and because the >>>>> engine understands them it can short-cut which rules are fired. >>>>> >>>>> For what it's worth, there's another piece of configuration that I >>>>> considered putting into RelBuilder but decided not to. That is, >>>>> whether a rule instance should call RelNode.copy to automatically >>>>> create RelNodes the same type as its inputs. See I introduced >>>>> copyFilter and copyProject arguments in FilterProjectTransposeRule: >>>>> >>>>> >>>>> >>>>> https://github.com/julianhyde/incubator-calcite/blob/8099cf3151a462b06a4 >>>>> cbb1c0cad8f7e90881ca5/core/src/main/java/org/apache/calcite/rel/rules/Fi >>>>> lterProjectTransposeRule.java#L67 >>>>> >>>>> rather than introduce new methods RelBuilder.shouldCopyProject() and >>>>> .shouldCopyFilter(). >>>>> >>>>> And by the way I also considered adding methods to RelBuilderFactory >>>>> (formerly known as ProtoRelBuilder). Adding methods there would not >>>>> "pollute" RelBuilder as badly. >>>>> >>>>> Julian >>>>> >>>>> >>>>> On Wed, Sep 23, 2015 at 12:18 PM, Jinfeng Ni <[email protected]> >>>>> wrote: >>>>>> +1. >>>>>> >>>>>> I have one question. RelBuilder seems to be used in onMatch() >>>>> method, >>>>>> when a particular type of RelNode has to be created. What about >>>>>> RelOptRuleOperand in the constructor of rule? Let's say if we want to >>>>>> rule to match only DrillProject, or HiveFilter. The current way >>>>> seems to >>>>>> be that we extend the rule by providing a different >>>>> RelOptRuleOperand. >>>>>> >>>>>> Can RelBuilder also have getFilterClass(), getProjectClass() etc, >>>>> which >>>>>> would be used to specify the rule patten matching? >>>>>> >>>>>> >>>>>> >>>>>> On Tue, Sep 22, 2015 at 6:41 PM, Jacques Nadeau <[email protected]> >>>>> wrote: >>>>>>> Agree on the utility of this. +1. >>>>>>> >>>>>>> On Tue, Sep 22, 2015 at 6:22 PM, John Pullokkaran < >>>>>>> [email protected]> wrote: >>>>>>> >>>>>>>> This is useful, though Hive will have to do some refactoring. >>>>>>>> >>>>>>>> In past Hive had to copy bunch of rules to work around these >>>>> issues. >>>>>>>> Rules & relnodes for org.apache.calcite.rel.logical.* is a good >>>>> example of >>>>>>>> this. >>>>>>>> >>>>>>>> >>>>>>>> John >>>>>>>> >>>>>>>> On 9/22/15, 5:14 PM, "Julian Hyde" <[email protected]> wrote: >>>>>>>> >>>>>>>>> I have started work on >>>>>>>>> https://issues.apache.org/jira/browse/CALCITE-828 and my >>>>>>>>> work-in-progress is in >>>>>>>>> >>>>> >>>>> https://github.com/julianhyde/incubator-calcite/commits/828-rule-builder >>>>> . >>>>>>>>> >>>>>>>>> This will be a big change to teams such as Hive and Drill. Now >>>>> would >>>>>>>>> be a good time to chime in. >>>>>>>>> >>>>>>>>> The plan is that every rule instance has a ProtoRelBuilder field >>>>> from >>>>>>>>> which a RelBuilder can be created to be used in an onMatch >>>>> method. The >>>>>>>>> RelBuilder contains factories for every kind of RelNode, so we >>>>> won't >>>>>>>>> have to keep plumbing new factories in to existing rules. >>>>>>>>> >>>>>>>>> There will be other advantages. RelBuilder is an easier and more >>>>>>>>> concise way of creating relational expressions. It does some >>>>> useful >>>>>>>>> stuff for free, like flattening ANDs and eliminating identity >>>>>>>>> projections. I'm seeing the volume of code decrease and a few >>>>> minor >>>>>>>>> plan improvements. >>>>>>>>> >>>>>>>>> I haven't yet found a good way to deal with the pattern where if, >>>>> say, >>>>>>>>> ProjectFactory is null the rule is to use Project.copy to create a >>>>>>>>> Project of the same type. >>>>>>>>> >>>>>>>>> I'm keeping & deprecating the old rule constructors that take a >>>>>>>>> variety of XxxFactory arguments. >>>>>>>>> >>>>>>>>> As always, we try to keep API compatibility and mark the old API >>>>>>>>> >>>>>>>>> @Deprecated // to be removed before 2.0 >>>>>>>>> >>>>>>>>> but the deprecated APIs are no longer tested and are likely to be >>>>>>>>> flaky. We strongly suggest that you fix calls to deprecated APIs >>>>> as >>>>>>>>> soon as you upgrade to a more recent version of Calcite. >>>>>>>>> >>>>>>>>> Julian >>>>>>>>> >>>>>>>> >>>>>>>> >>>>> >>> >> >
