Based on the discussion so far, it seems we should go with option #3.
Let me know if you see potential problems with that approach.
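
To make option #3 concrete, here is a rough, self-contained toy model of the idea. It is only a sketch under stated assumptions: the classes below borrow the names BiRel, Join and Correlate but are NOT Calcite's real classes, and Conjunct and pushableToLeft() are hypothetical helpers invented for illustration. The sketch shows a rule written against the common base class deciding which conjuncts of the filter in the original plan reference only left-input fields and could therefore move below the Correlate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only: mimics the shape of option #3, not Calcite's real API. */
public class BiRelSketch {

  /** One AND-ed term of a filter, plus the input field ordinals it references. */
  static final class Conjunct {
    final String text;
    final int[] fields;
    Conjunct(String text, int... fields) {
      this.text = text;
      this.fields = fields;
    }
  }

  /** Hypothetical two-input base node that carries a join type. */
  abstract static class BiRel {
    final int leftFieldCount; // number of fields produced by the left input
    final String joinType;    // e.g. "INNER" or "LEFT"
    BiRel(int leftFieldCount, String joinType) {
      this.leftFieldCount = leftFieldCount;
      this.joinType = joinType;
    }
  }

  static final class Join extends BiRel {
    Join(int leftFieldCount, String joinType) { super(leftFieldCount, joinType); }
  }

  static final class Correlate extends BiRel {
    Correlate(int leftFieldCount, String joinType) { super(leftFieldCount, joinType); }
  }

  /**
   * Returns the conjuncts that reference only left-input fields. Because the
   * parameter is BiRel, the same logic serves both Join and Correlate.
   */
  static List<String> pushableToLeft(BiRel rel, List<Conjunct> conjuncts) {
    List<String> result = new ArrayList<>();
    for (Conjunct c : conjuncts) {
      boolean leftOnly = true;
      for (int f : c.fields) {
        if (f >= rel.leftFieldCount) {
          leftOnly = false;
        }
      }
      if (leftOnly) {
        result.add(c.text);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // In the plan from this thread, the Correlate's left input (EMP join DEPT)
    // produces fields $0..$10; $11 comes from the aggregate on the right.
    BiRel correlate = new Correlate(11, "LEFT");
    List<Conjunct> filter = Arrays.asList(
        new Conjunct("=($7, $9)", 7, 9),
        new Conjunct(">($5, $11)", 5, 11));
    System.out.println(pushableToLeft(correlate, filter)); // prints [=($7, $9)]
  }
}
```

In this toy model only =($7, $9) qualifies, which matches the condition I wanted to push into the LogicalJoin below the LogicalCorrelate; >($5, $11) has to stay above because $11 comes from the right side.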

Aman

On Mon, May 11, 2015 at 8:43 PM, Aman Sinha <[email protected]> wrote:

> Apart from the JoinType,  Correlate would also need to have the
> 'condition' to represent a join condition because the FilterJoinRule relies
> on placing the join condition on the join node during filter push down.
>
> Summarizing the alternatives:
> 1. Have a completely separate implementation of Correlate-specific rules.
>    This has the obvious disadvantage of redundant code. Also, it is
>    unlikely that methods such as classifyFilters() would work seamlessly
>    with the Correlate-specific rules.
> 2. The redundant code in #1 can be mitigated by creating base classes for
>    some of the rules and having the Join-specific and Correlate-specific
>    rules share the code.
> 3. Modify Correlate to have JoinType, SemiJoinType as well as 'condition'.
>    In this sense, it gets closer to a Join without actually being a
>    derived class of Join. The FilterJoinRule and similar rules would be
>    modified to use 'BiRel' instead of 'Join', since BiRel is the base
>    class for both Join and Correlate.
>
> To Julian's question about the list of rules affected, it seems most of
> the *Join*Rules would probably need examination; otherwise we could miss
> certain optimizations. However, we would get the most bang for the buck by
> focusing on FilterJoinRule, so I would like to get that taken care of
> first.
>
> Aman
>
>
> On Mon, May 11, 2015 at 7:06 PM, Julian Hyde <[email protected]> wrote:
>
>> Seems a bit of a stretch, since Join has other ways to represent SEMI and
>> ANTI. Maybe a Correlate could have both a JoinType and a SemiJoinType?
>>
>> Can you & Vladimir find a compromise for how to restore the missing
>> functionality with no more copy-paste than necessary? It would help if we
>> had a full list of rules which ought to work for Correlate.
>>
>> Julian
>>
>> On May 11, 2015, at 5:27 PM, Jinfeng Ni <[email protected]> wrote:
>>
>> > Can we extend Join.JoinType so that it includes the SemiJoinType values
>> > (SEMI, ANTI) represented by Correlate? That way, we could leverage the
>> > rules for Join and apply them to Correlate as well, just like the way it
>> > used to work. Otherwise, we have to come up with a new set of rules for
>> > Correlate to make things work again.
>> >
>> >
>> >
>> > On Mon, May 11, 2015 at 5:02 PM, Julian Hyde <[email protected]> wrote:
>> >
>> >> This comment in Correlate seems to express Vladimir’s motivation:
>> >>
>> >>> Correlate is not a join since: typical rules should not match Correlate.
>> >>
>> >> I agree with him. For instance, Correlate.joinType is enum SemiJoinType
>> >> { INNER, LEFT, SEMI, ANTI } and therefore has different semantics from
>> >> Join.joinType.
>> >>
>> >> It’s unfortunate that FilterJoinRule got broken. We should fix it. Any
>> >> other rules that would be needed? Probably ProjectJoinTransposeRule,
>> >> AggregateJoinTransposeRule.
>> >>
>> >> Julian
>> >>
>> >>
>> >> On May 11, 2015, at 4:17 PM, Aman Sinha <[email protected]> wrote:
>> >>
>> >>> As part of CALCITE-483, the class hierarchy of CorrelateRel was changed
>> >>> such that the new LogicalCorrelate is not a derived class of Join
>> >>> anymore. Thus, any rule such as FilterJoinRule that used to push the
>> >>> filter down into the Join (or a derived class of Join) does not apply
>> >>> anymore for the LogicalCorrelate.
>> >>>
>> >>> I am continuing down the path of my proposal to have a version of the
>> >>> push filter rule that allows pushing into/past a LogicalCorrelate. But
>> >>> perhaps Vladimir can shed some light on the motivation for changing the
>> >>> class hierarchy.
>> >>>
>> >>> thanks,
>> >>> Aman
>> >>>
>> >>>
>> >>> On Mon, May 11, 2015 at 10:21 AM, Aman Sinha <[email protected]> wrote:
>> >>>
>> >>>> Note that I have made some changes to the decorrelation logic to call
>> >>>> findBestExp() *after* the decorrelation is done and supply it the set
>> >>>> of rules including FilterJoinRule. This does push the join condition
>> >>>> into one part of the tree, but it does not push it into all other parts
>> >>>> where that join may have been copied during decorrelation. The main
>> >>>> point is: we need to do the filter pushdown early rather than late.
>> >>>>
>> >>>> Aman
>> >>>>
>> >>>> On Mon, May 11, 2015 at 10:16 AM, Aman Sinha <[email protected]> wrote:
>> >>>>
>> >>>>> I want to be able to push the highlighted join condition (=($7, $9))
>> >>>>> into the LogicalJoin that is right below the LogicalCorrelate. What's
>> >>>>> the right way to do it?
>> >>>>>
>> >>>>> The current method of first decorrelating and then pushing the filter
>> >>>>> (via the FilterJoinRule) is not quite right because once decorrelation
>> >>>>> is done, it may be too late to push the filter into the join. During
>> >>>>> decorrelation we take that LogicalJoin (with its TRUE condition) and
>> >>>>> push it into other places. For instance, we call createDistinct() to
>> >>>>> build a distinct row set on the result of this join, but since the join
>> >>>>> has a true condition, the distinct is created on a Cartesian join.
>> >>>>>
>> >>>>> What I really need is something like a FilterJoinRule that allows
>> >>>>> pushing a filter past a LogicalCorrelate.
>> >>>>>
>> >>>>> LogicalProject(EXPR$0=[1]): rowcount = 1.0, cumulative cost = 10.25, id = 53
>> >>>>> LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], DEPTNO0=[$9], NAME=[$10], EXPR$0=[$11]): rowcount = 1.0, cumulative cost = 9.25, id = 71
>> >>>>> *   LogicalFilter(condition=[AND(=($7, $9), >($5, $11))]): rowcount = 1.0, cumulative cost = 8.25, id = 68*
>> >>>>>     LogicalCorrelate(correlation=[$cor0], joinType=[LEFT], requiredColumns=[{0}]): rowcount = 1.0, cumulative cost = 7.25, id = 61
>> >>>>>       LogicalJoin(condition=[true], joinType=[inner]): rowcount = 1.0, cumulative cost = 1.0, id = 42
>> >>>>>         LogicalTableScan(table=[[CATALOG, SALES, EMP]]): rowcount = 1.0, cumulative cost = 0.0, id = 11
>> >>>>>         LogicalTableScan(table=[[CATALOG, SALES, DEPT]]): rowcount = 1.0, cumulative cost = 0.0, id = 12
>> >>>>>       LogicalAggregate(group=[{}], EXPR$0=[AVG($5)]): rowcount = 1.0, cumulative cost = 2.125, id = 47
>> >>>>>         LogicalFilter(condition=[=($cor0.EMPNO, $0)]): rowcount = 1.0, cumulative cost = 1.0, id = 45
>> >>>>>           LogicalTableScan(table=[[CATALOG, SALES, EMP]]): rowcount = 1.0, cumulative cost = 0.0, id = 14
>> >>>>>
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Aman
>> >>>>>
>> >>>>
>> >>>>
>> >>
>> >>
>>
>>
>
