I have completed work on this change. Can someone please review and +1?

https://github.com/julianhyde/incubator-calcite/tree/828-rule-builder

https://issues.apache.org/jira/browse/CALCITE-828

Julian

> On Sep 25, 2015, at 1:07 PM, Julian Hyde <[email protected]> wrote:
> 
> The problem of making rules extensible is not straightforward. The solution 
> has to be a mixture of API changes and best practices. In this change I am 
> making API changes and fixing up the existing rules to match, but I am not 
> revisiting the rules to introduce best practices.
> 
> The API change is that every RelOptRule has a relBuilderFactory. We strongly 
> recommend that if a rule needs to create a relational expression it either 
> matches the type of its inputs (calling the copy method) or it uses the 
> factory. That means removing calls in rules to methods such as 
> LogicalProject.create or RelOptUtil.project. I’ve done quite a lot of this as 
> part of this change.
> 
> The best practice is that a rule’s constructor should expose 
> RelBuilderFactory and Class parameters so that the rule can be re-used. Quite 
> a few rules do this, but a lot do not. Quite a few rules are re-usable in 
> principle, but have not been tested against other RelNode sub-classes, and 
> quite a few rules are just not re-usable. So I propose that we enforce best 
> practices when reviewing contributions but don’t spend a lot of effort 
> upgrading the existing rules.
> 
> In some cases (e.g. WindowedAggRelSplitter in ProjectToWindowRule) the rule 
> uses the default factory but I’ve changed the code to use a RelBuilder. So, 
> there’s no change in behavior, but if someone were to add a RelBuilderFactory 
> to the rule’s constructor, they would reap the benefit.
> 
> Julian
> 
> 
> 
>> On Sep 24, 2015, at 6:29 AM, Jesus Camacho Rodriguez 
>> <[email protected]> wrote:
>> 
>> In our case, we had to do some copy-pasting of rules in the past in Hive.
>> Although the reasons have been already discussed in the thread, just to
>> confirm:
>> - Rules that use a default factory to create operators e.g.
>> LogicalProject, instead of being able to provide our own factory. An
>> example is AggregateProjectMergeRule. (Instead of copy/pasting the rule,
>> we could apply another rule to transform LogicalProject into HiveProject,
>> but this is not the right way either).
>> - Rules that match directly on Logical instances of the operators. An
>> example is SortProjectTransposeRule.
>> 
>> I see that with the solutions that we are discussing, those two problems
>> would be solved.
>> 
>> About including the matching classes in RelBuilder, it can be a good idea
>> if we still make the rules extensible i.e. we can provide our own matching
>> classes if we want to. Assume a project that might subclass the Project
>> operator into two different operators, then would like to match a certain
>> rule only to one of them. Further, I guess using the copy methods provided
>> by the operators when possible, instead of calling directly to the
>> Factory, would make the rules more extensible for these cases.
>> 
>> --
>> Jesús
>> 
>> 
>> On 9/23/15, 11:08 PM, "Jinfeng Ni" <[email protected]> wrote:
>> 
>>> I did a quick check of Drill side rule, and did not find the copy/paste
>>> code
>>> simply because we have to specify the rule's operands.
>>> 
>>> The reason I'm asking this is because many Calcite rules use core
>>> Filter.class,
>>> Project.class etc.  Currently, Drill simply uses some of Calcite rules
>>> directly.
>>> However, that means the rule will fire against both LogicalFilter, and
>>> DrillFilter,
>>> which seems to be redundant.  I understand we could extend Calcite rule to
>>> make sure it matches only one of them. But I'm looking for a bit more
>>> direct way
>>> to do that.
>>> 
>>> For instance, in [2] we uses Calcite's FilterMergeRule directly, which
>>> would match
>>> both LogicalFilter and DrillFilter.
>>> 
>>> Also, I feel the rule patten specification can be regarded as the
>>> input type for the
>>> rule, while RelBuilder is to address the output type from the rule's
>>> firing. That's why
>>> I feel they had better be specified together in some way.
>>> 
>>> 
>>> [1] 
>>> https://github.com/apache/incubator-calcite/blob/master/core/src/main/java
>>> /org/apache/calcite/rel/rules/FilterMergeRule.java#L46
>>> 
>>> [2] 
>>> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/o
>>> rg/apache/drill/exec/planner/logical/DrillRuleSets.java#L121
>>> 
>>> On Wed, Sep 23, 2015 at 2:17 PM, Jacques Nadeau <[email protected]>
>>> wrote:
>>>> I think this would be a good place to have a little bit more design,
>>>> discussion and validation. Julian, it seems like you've done a lot of
>>>> thinking and it would be good to have a better understanding of the
>>>> choices
>>>> you're making. (Your response below seems to be just scratching the
>>>> surface).
>>>> 
>>>> If the goal is to minimize copy and paste rules, it would be useful to
>>>> test
>>>> the proposal against example rules that have been copy/pasted and see
>>>> what
>>>> other patterns might emerge. At first glance, I'm inclined more with
>>>> Jinfeng's proposal regarding operands than yours. Your decision about
>>>> copy/not behavior should also be discussed further.
>>>> 
>>>> Drill (Jinfeng?) and Hive (Jesus?) guys, can you provide an example set
>>>> of
>>>> rules that we've needed to copy/paste due to constraints around rule
>>>> configuration to make sure the outcome of these changes is as desired?
>>>> 
>>>> 
>>>> 
>>>> On Wed, Sep 23, 2015 at 1:28 PM, Julian Hyde <[email protected]> wrote:
>>>> 
>>>>> Great question.
>>>>> 
>>>>> For that case, I think you should carry on making a separate rule
>>>>> instance with operands that contain classes. RelBuilder is a good
>>>>> single point to put a lot of the configuration, but it is taking it
>>>>> too far to put the target classes in there. I don't want its role as
>>>>> "point for configuration for rules" to overwhelm its main role "an
>>>>> easy way to build relational algebra expressions".
>>>>> 
>>>>> And besides, operands are powerful and descriptive, and because the
>>>>> engine understands them it can short-cut which rules are fired.
>>>>> 
>>>>> For what it's worth, there's another piece of configuration that I
>>>>> considered putting into RelBuilder but decided not to. That is,
>>>>> whether a rule instance should call RelNode.copy to automatically
>>>>> create RelNodes the same type as its inputs. See I introduced
>>>>> copyFilter and copyProject arguments in FilterProjectTransposeRule:
>>>>> 
>>>>> 
>>>>> 
>>>>> https://github.com/julianhyde/incubator-calcite/blob/8099cf3151a462b06a4
>>>>> cbb1c0cad8f7e90881ca5/core/src/main/java/org/apache/calcite/rel/rules/Fi
>>>>> lterProjectTransposeRule.java#L67
>>>>> 
>>>>> rather than introduce new methods RelBuilder.shouldCopyProject() and
>>>>> .shouldCopyFilter().
>>>>> 
>>>>> And by the way I also considered adding methods to RelBuilderFactory
>>>>> (formerly known as ProtoRelBuilder). Adding methods there would not
>>>>> "pollute" RelBuilder as badly.
>>>>> 
>>>>> Julian
>>>>> 
>>>>> 
>>>>> On Wed, Sep 23, 2015 at 12:18 PM, Jinfeng Ni <[email protected]>
>>>>> wrote:
>>>>>> +1.
>>>>>> 
>>>>>> I have one question.  RelBuilder seems to be used in onMatch()
>>>>> method,
>>>>>> when a particular type of RelNode has to be created.  What about
>>>>>> RelOptRuleOperand in the constructor of rule? Let's say if we want to
>>>>>> rule to match only DrillProject, or HiveFilter. The current way
>>>>> seems to
>>>>>> be that we extend the rule by providing a different
>>>>> RelOptRuleOperand.
>>>>>> 
>>>>>> Can RelBuilder also have getFilterClass(), getProjectClass() etc,
>>>>> which
>>>>>> would be used to specify the rule patten matching?
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Tue, Sep 22, 2015 at 6:41 PM, Jacques Nadeau <[email protected]>
>>>>> wrote:
>>>>>>> Agree on the utility of this. +1.
>>>>>>> 
>>>>>>> On Tue, Sep 22, 2015 at 6:22 PM, John Pullokkaran <
>>>>>>> [email protected]> wrote:
>>>>>>> 
>>>>>>>> This is useful, though Hive will have to do some refactoring.
>>>>>>>> 
>>>>>>>> In past Hive had to copy bunch of rules to work around these
>>>>> issues.
>>>>>>>> Rules & relnodes for org.apache.calcite.rel.logical.* is a good
>>>>> example of
>>>>>>>> this.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> John
>>>>>>>> 
>>>>>>>> On 9/22/15, 5:14 PM, "Julian Hyde" <[email protected]> wrote:
>>>>>>>> 
>>>>>>>>> I have started work on
>>>>>>>>> https://issues.apache.org/jira/browse/CALCITE-828 and my
>>>>>>>>> work-in-progress is in
>>>>>>>>> 
>>>>> 
>>>>> https://github.com/julianhyde/incubator-calcite/commits/828-rule-builder
>>>>> .
>>>>>>>>> 
>>>>>>>>> This will be a big change to teams such as Hive and Drill. Now
>>>>> would
>>>>>>>>> be a good time to chime in.
>>>>>>>>> 
>>>>>>>>> The plan is that every rule instance has a ProtoRelBuilder field
>>>>> from
>>>>>>>>> which a RelBuilder can be created to be used in an onMatch
>>>>> method. The
>>>>>>>>> RelBuilder contains factories for every kind of RelNode, so we
>>>>> won't
>>>>>>>>> have to keep plumbing new factories in to existing rules.
>>>>>>>>> 
>>>>>>>>> There will be other advantages. RelBuilder is an easier and more
>>>>>>>>> concise way of creating relational expressions. It does some
>>>>> useful
>>>>>>>>> stuff for free, like flattening ANDs and eliminating identity
>>>>>>>>> projections. I'm seeing the volume of code decrease and a few
>>>>> minor
>>>>>>>>> plan improvements.
>>>>>>>>> 
>>>>>>>>> I haven't yet found a good way to deal with the pattern where if,
>>>>> say,
>>>>>>>>> ProjectFactory is null the rule is to use Project.copy to create a
>>>>>>>>> Project of the same type.
>>>>>>>>> 
>>>>>>>>> I'm keeping & deprecating the old rule constructors that take a
>>>>>>>>> variety of XxxFactory arguments.
>>>>>>>>> 
>>>>>>>>> As always, we try to keep API compatibility and mark the old API
>>>>>>>>> 
>>>>>>>>> @Deprecated // to be removed before 2.0
>>>>>>>>> 
>>>>>>>>> but the deprecated APIs are no longer tested and are likely to be
>>>>>>>>> flaky. We strongly suggest that you fix calls to deprecated APIs
>>>>> as
>>>>>>>>> soon as you upgrade to a more recent version of Calcite.
>>>>>>>>> 
>>>>>>>>> Julian
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>> 
>>> 
>> 
> 

Reply via email to