In our case, we had to do some copy-pasting of rules in the past in Hive. Although the reasons have been already discussed in the thread, just to confirm: - Rules that use a default factory to create operators e.g. LogicalProject, instead of being able to provide our own factory. An example is AggregateProjectMergeRule. (Instead of copy/pasting the rule, we could apply another rule to transform LogicalProject into HiveProject, but this is not the right way either). - Rules that match directly on Logical instances of the operators. An example is SortProjectTransposeRule.
I see that with the solutions that we are discussing, those two problems would be solved. About including the matching classes in RelBuilder, it can be a good idea if we still make the rules extensible i.e. we can provide our own matching classes if we want to. Assume a project that might subclass the Project operator into two different operators, then would like to match a certain rule only to one of them. Further, I guess using the copy methods provided by the operators when possible, instead of calling directly to the Factory, would make the rules more extensible for these cases. -- Jesús On 9/23/15, 11:08 PM, "Jinfeng Ni" <[email protected]> wrote: >I did a quick check of Drill side rule, and did not find the copy/paste >code >simply because we have to specify the rule's operands. > >The reason I'm asking this is because many Calcite rules use core >Filter.class, >Project.class etc. Currently, Drill simply uses some of Calcite rules >directly. >However, that means the rule will fire against both LogicalFilter, and >DrillFilter, >which seems to be redundant. I understand we could extend Calcite rule to >make sure it matches only one of them. But I'm looking for a bit more >direct way >to do that. > >For instance, in [2] we uses Calcite's FilterMergeRule directly, which >would match >both LogicalFilter and DrillFilter. > >Also, I feel the rule patten specification can be regarded as the >input type for the >rule, while RelBuilder is to address the output type from the rule's >firing. That's why >I feel they had better be specified together in some way. > > >[1] >https://github.com/apache/incubator-calcite/blob/master/core/src/main/java >/org/apache/calcite/rel/rules/FilterMergeRule.java#L46 > >[2] >https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/o >rg/apache/drill/exec/planner/logical/DrillRuleSets.java#L121 > >On Wed, Sep 23, 2015 at 2:17 PM, Jacques Nadeau <[email protected]> >wrote: >> I think this would be a good place to have a little bit more design, >> discussion and validation. Julian, it seems like you've done a lot of >> thinking and it would be good to have a better understanding of the >>choices >> you're making. (Your response below seems to be just scratching the >> surface). >> >> If the goal is to minimize copy and paste rules, it would be useful to >>test >> the proposal against example rules that have been copy/pasted and see >>what >> other patterns might emerge. At first glance, I'm inclined more with >> Jinfeng's proposal regarding operands than yours. Your decision about >> copy/not behavior should also be discussed further. >> >> Drill (Jinfeng?) and Hive (Jesus?) guys, can you provide an example set >>of >> rules that we've needed to copy/paste due to constraints around rule >> configuration to make sure the outcome of these changes is as desired? >> >> >> >> On Wed, Sep 23, 2015 at 1:28 PM, Julian Hyde <[email protected]> wrote: >> >>> Great question. >>> >>> For that case, I think you should carry on making a separate rule >>> instance with operands that contain classes. RelBuilder is a good >>> single point to put a lot of the configuration, but it is taking it >>> too far to put the target classes in there. I don't want its role as >>> "point for configuration for rules" to overwhelm its main role "an >>> easy way to build relational algebra expressions". >>> >>> And besides, operands are powerful and descriptive, and because the >>> engine understands them it can short-cut which rules are fired. >>> >>> For what it's worth, there's another piece of configuration that I >>> considered putting into RelBuilder but decided not to. That is, >>> whether a rule instance should call RelNode.copy to automatically >>> create RelNodes the same type as its inputs. See I introduced >>> copyFilter and copyProject arguments in FilterProjectTransposeRule: >>> >>> >>> >>>https://github.com/julianhyde/incubator-calcite/blob/8099cf3151a462b06a4 >>>cbb1c0cad8f7e90881ca5/core/src/main/java/org/apache/calcite/rel/rules/Fi >>>lterProjectTransposeRule.java#L67 >>> >>> rather than introduce new methods RelBuilder.shouldCopyProject() and >>> .shouldCopyFilter(). >>> >>> And by the way I also considered adding methods to RelBuilderFactory >>> (formerly known as ProtoRelBuilder). Adding methods there would not >>> "pollute" RelBuilder as badly. >>> >>> Julian >>> >>> >>> On Wed, Sep 23, 2015 at 12:18 PM, Jinfeng Ni <[email protected]> >>> wrote: >>> > +1. >>> > >>> > I have one question. RelBuilder seems to be used in onMatch() >>>method, >>> > when a particular type of RelNode has to be created. What about >>> > RelOptRuleOperand in the constructor of rule? Let's say if we want to >>> > rule to match only DrillProject, or HiveFilter. The current way >>>seems to >>> > be that we extend the rule by providing a different >>>RelOptRuleOperand. >>> > >>> > Can RelBuilder also have getFilterClass(), getProjectClass() etc, >>>which >>> > would be used to specify the rule patten matching? >>> > >>> > >>> > >>> > On Tue, Sep 22, 2015 at 6:41 PM, Jacques Nadeau <[email protected]> >>> wrote: >>> >> Agree on the utility of this. +1. >>> >> >>> >> On Tue, Sep 22, 2015 at 6:22 PM, John Pullokkaran < >>> >> [email protected]> wrote: >>> >> >>> >>> This is useful, though Hive will have to do some refactoring. >>> >>> >>> >>> In past Hive had to copy bunch of rules to work around these >>>issues. >>> >>> Rules & relnodes for org.apache.calcite.rel.logical.* is a good >>> example of >>> >>> this. >>> >>> >>> >>> >>> >>> John >>> >>> >>> >>> On 9/22/15, 5:14 PM, "Julian Hyde" <[email protected]> wrote: >>> >>> >>> >>> >I have started work on >>> >>> >https://issues.apache.org/jira/browse/CALCITE-828 and my >>> >>> >work-in-progress is in >>> >>> > >>> >>>https://github.com/julianhyde/incubator-calcite/commits/828-rule-builder >>>. >>> >>> > >>> >>> >This will be a big change to teams such as Hive and Drill. Now >>>would >>> >>> >be a good time to chime in. >>> >>> > >>> >>> >The plan is that every rule instance has a ProtoRelBuilder field >>>from >>> >>> >which a RelBuilder can be created to be used in an onMatch >>>method. The >>> >>> >RelBuilder contains factories for every kind of RelNode, so we >>>won't >>> >>> >have to keep plumbing new factories in to existing rules. >>> >>> > >>> >>> >There will be other advantages. RelBuilder is an easier and more >>> >>> >concise way of creating relational expressions. It does some >>>useful >>> >>> >stuff for free, like flattening ANDs and eliminating identity >>> >>> >projections. I'm seeing the volume of code decrease and a few >>>minor >>> >>> >plan improvements. >>> >>> > >>> >>> >I haven't yet found a good way to deal with the pattern where if, >>>say, >>> >>> >ProjectFactory is null the rule is to use Project.copy to create a >>> >>> >Project of the same type. >>> >>> > >>> >>> >I'm keeping & deprecating the old rule constructors that take a >>> >>> >variety of XxxFactory arguments. >>> >>> > >>> >>> >As always, we try to keep API compatibility and mark the old API >>> >>> > >>> >>> > @Deprecated // to be removed before 2.0 >>> >>> > >>> >>> >but the deprecated APIs are no longer tested and are likely to be >>> >>> >flaky. We strongly suggest that you fix calls to deprecated APIs >>>as >>> >>> >soon as you upgrade to a more recent version of Calcite. >>> >>> > >>> >>> >Julian >>> >>> > >>> >>> >>> >>> >>> >
