In our case, we had to do some copy-pasting of rules in the past in Hive.
Although the reasons have been already discussed in the thread, just to
confirm:
- Rules that use a default factory to create operators e.g.
LogicalProject, instead of being able to provide our own factory. An
example is AggregateProjectMergeRule. (Instead of copy/pasting the rule,
we could apply another rule to transform LogicalProject into HiveProject,
but this is not the right way either).
- Rules that match directly on Logical instances of the operators. An
example is SortProjectTransposeRule.

I see that with the solutions that we are discussing, those two problems
would be solved.

About including the matching classes in RelBuilder, it can be a good idea
if we still make the rules extensible i.e. we can provide our own matching
classes if we want to. Assume a project that might subclass the Project
operator into two different operators, then would like to match a certain
rule only to one of them. Further, I guess using the copy methods provided
by the operators when possible, instead of calling directly to the
Factory, would make the rules more extensible for these cases.

--
Jesús


On 9/23/15, 11:08 PM, "Jinfeng Ni" <[email protected]> wrote:

>I did a quick check of Drill side rule, and did not find the copy/paste
>code
>simply because we have to specify the rule's operands.
>
>The reason I'm asking this is because many Calcite rules use core
>Filter.class,
>Project.class etc.  Currently, Drill simply uses some of Calcite rules
>directly.
>However, that means the rule will fire against both LogicalFilter, and
>DrillFilter,
>which seems to be redundant.  I understand we could extend Calcite rule to
>make sure it matches only one of them. But I'm looking for a bit more
>direct way
>to do that.
>
>For instance, in [2] we uses Calcite's FilterMergeRule directly, which
>would match
>both LogicalFilter and DrillFilter.
>
>Also, I feel the rule patten specification can be regarded as the
>input type for the
>rule, while RelBuilder is to address the output type from the rule's
>firing. That's why
>I feel they had better be specified together in some way.
>
>
>[1] 
>https://github.com/apache/incubator-calcite/blob/master/core/src/main/java
>/org/apache/calcite/rel/rules/FilterMergeRule.java#L46
>
>[2] 
>https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/o
>rg/apache/drill/exec/planner/logical/DrillRuleSets.java#L121
>
>On Wed, Sep 23, 2015 at 2:17 PM, Jacques Nadeau <[email protected]>
>wrote:
>> I think this would be a good place to have a little bit more design,
>> discussion and validation. Julian, it seems like you've done a lot of
>> thinking and it would be good to have a better understanding of the
>>choices
>> you're making. (Your response below seems to be just scratching the
>> surface).
>>
>> If the goal is to minimize copy and paste rules, it would be useful to
>>test
>> the proposal against example rules that have been copy/pasted and see
>>what
>> other patterns might emerge. At first glance, I'm inclined more with
>> Jinfeng's proposal regarding operands than yours. Your decision about
>> copy/not behavior should also be discussed further.
>>
>> Drill (Jinfeng?) and Hive (Jesus?) guys, can you provide an example set
>>of
>> rules that we've needed to copy/paste due to constraints around rule
>> configuration to make sure the outcome of these changes is as desired?
>>
>>
>>
>> On Wed, Sep 23, 2015 at 1:28 PM, Julian Hyde <[email protected]> wrote:
>>
>>> Great question.
>>>
>>> For that case, I think you should carry on making a separate rule
>>> instance with operands that contain classes. RelBuilder is a good
>>> single point to put a lot of the configuration, but it is taking it
>>> too far to put the target classes in there. I don't want its role as
>>> "point for configuration for rules" to overwhelm its main role "an
>>> easy way to build relational algebra expressions".
>>>
>>> And besides, operands are powerful and descriptive, and because the
>>> engine understands them it can short-cut which rules are fired.
>>>
>>> For what it's worth, there's another piece of configuration that I
>>> considered putting into RelBuilder but decided not to. That is,
>>> whether a rule instance should call RelNode.copy to automatically
>>> create RelNodes the same type as its inputs. See I introduced
>>> copyFilter and copyProject arguments in FilterProjectTransposeRule:
>>>
>>>
>>> 
>>>https://github.com/julianhyde/incubator-calcite/blob/8099cf3151a462b06a4
>>>cbb1c0cad8f7e90881ca5/core/src/main/java/org/apache/calcite/rel/rules/Fi
>>>lterProjectTransposeRule.java#L67
>>>
>>> rather than introduce new methods RelBuilder.shouldCopyProject() and
>>> .shouldCopyFilter().
>>>
>>> And by the way I also considered adding methods to RelBuilderFactory
>>> (formerly known as ProtoRelBuilder). Adding methods there would not
>>> "pollute" RelBuilder as badly.
>>>
>>> Julian
>>>
>>>
>>> On Wed, Sep 23, 2015 at 12:18 PM, Jinfeng Ni <[email protected]>
>>> wrote:
>>> > +1.
>>> >
>>> > I have one question.  RelBuilder seems to be used in onMatch()
>>>method,
>>> > when a particular type of RelNode has to be created.  What about
>>> > RelOptRuleOperand in the constructor of rule? Let's say if we want to
>>> > rule to match only DrillProject, or HiveFilter. The current way
>>>seems to
>>> > be that we extend the rule by providing a different
>>>RelOptRuleOperand.
>>> >
>>> > Can RelBuilder also have getFilterClass(), getProjectClass() etc,
>>>which
>>> > would be used to specify the rule patten matching?
>>> >
>>> >
>>> >
>>> > On Tue, Sep 22, 2015 at 6:41 PM, Jacques Nadeau <[email protected]>
>>> wrote:
>>> >> Agree on the utility of this. +1.
>>> >>
>>> >> On Tue, Sep 22, 2015 at 6:22 PM, John Pullokkaran <
>>> >> [email protected]> wrote:
>>> >>
>>> >>> This is useful, though Hive will have to do some refactoring.
>>> >>>
>>> >>> In past Hive had to copy bunch of rules to work around these
>>>issues.
>>> >>> Rules & relnodes for org.apache.calcite.rel.logical.* is a good
>>> example of
>>> >>> this.
>>> >>>
>>> >>>
>>> >>> John
>>> >>>
>>> >>> On 9/22/15, 5:14 PM, "Julian Hyde" <[email protected]> wrote:
>>> >>>
>>> >>> >I have started work on
>>> >>> >https://issues.apache.org/jira/browse/CALCITE-828 and my
>>> >>> >work-in-progress is in
>>> >>> >
>>> 
>>>https://github.com/julianhyde/incubator-calcite/commits/828-rule-builder
>>>.
>>> >>> >
>>> >>> >This will be a big change to teams such as Hive and Drill. Now
>>>would
>>> >>> >be a good time to chime in.
>>> >>> >
>>> >>> >The plan is that every rule instance has a ProtoRelBuilder field
>>>from
>>> >>> >which a RelBuilder can be created to be used in an onMatch
>>>method. The
>>> >>> >RelBuilder contains factories for every kind of RelNode, so we
>>>won't
>>> >>> >have to keep plumbing new factories in to existing rules.
>>> >>> >
>>> >>> >There will be other advantages. RelBuilder is an easier and more
>>> >>> >concise way of creating relational expressions. It does some
>>>useful
>>> >>> >stuff for free, like flattening ANDs and eliminating identity
>>> >>> >projections. I'm seeing the volume of code decrease and a few
>>>minor
>>> >>> >plan improvements.
>>> >>> >
>>> >>> >I haven't yet found a good way to deal with the pattern where if,
>>>say,
>>> >>> >ProjectFactory is null the rule is to use Project.copy to create a
>>> >>> >Project of the same type.
>>> >>> >
>>> >>> >I'm keeping & deprecating the old rule constructors that take a
>>> >>> >variety of XxxFactory arguments.
>>> >>> >
>>> >>> >As always, we try to keep API compatibility and mark the old API
>>> >>> >
>>> >>> >  @Deprecated // to be removed before 2.0
>>> >>> >
>>> >>> >but the deprecated APIs are no longer tested and are likely to be
>>> >>> >flaky. We strongly suggest that you fix calls to deprecated APIs
>>>as
>>> >>> >soon as you upgrade to a more recent version of Calcite.
>>> >>> >
>>> >>> >Julian
>>> >>> >
>>> >>>
>>> >>>
>>>
>

Reply via email to