I think we may solve the problem from two aspects: 1. Do not try to preserve type (nullability) of Join/Filter condition expression when simplifying or something like pushing down. 2. We can do some work (remove unnecessary CAST) right before create a Join/Filter, as Julian said, something in RelBuilder could be done. I've do some fix in above Link (remove unnecessary CAST when doing pushDownEqualJoinConditions)
On Thu, Mar 26, 2020 at 7:14 PM Shuo Cheng <[email protected]> wrote: > Sorry for the late reply, I've reproduced the problem here > https://github.com/cshuo/calcite/commit/b9a7fb5f536825d3a577bf42a5fc6cc7d4df7929 > . > > On Wed, Mar 25, 2020 at 12:38 AM Julian Hyde <[email protected]> wrote: > >> It does seem to be something that RelBuilder could do. (RexSimplify can’t >> really do it, because it doesn’t know how the expression is being used.) >> >> It’s also worth discovering why the CAST was added in the first place. It >> doesn’t seem to be helpful. I think we should strive to eliminate all of >> the slightly unhelpful things that Calcite does; those things can add up >> and cause major inefficiencies in the planning process and/or sub-optimal >> plans. >> >> Julian >> >> >> > On Mar 24, 2020, at 1:47 AM, Zoltan Haindrich <[email protected]> wrote: >> > >> > Hey, >> > >> > That's a great diagnosis :) >> > I would guess that newCondition became non-nullable for some reason >> (rexSimplify runs under RexProgramBuilder so it might be able to narrow the >> nullability) >> > you could try invoking simplify.simplifyPreservingType() on it to see >> if that would help. >> > >> > > I know it's necessary to preserve the nullability when simplifying a >> boolean expression in project columns, but as for condition in Filter/Calc, >> may be we can omit the >> > > nullability? >> > I think that could probably work - we can't change the nullability on >> project columns because those could be referenced (and the reference also >> has the type) ; but for filter/join conditions we don't need to care with >> it. >> > It seems we already have a "matchnullability" in ReduceExpressionsRule >> ; for FILTER/JOIN we should probably turn that off... :) >> > >> > cheers, >> > Zoltan >> > >> > >> > On 3/24/20 9:15 AM, Shuo Cheng wrote: >> >> Hi Zoltan, >> >> I encountered the problem when running TPC tests, and have not >> reproduced it in Calcite master. >> >> But I figured it out how the problem is produced: >> >> There is semi join with the condition:AND(EXPANDED_INDF1, >> EXPANDED_INDF2), type of AND is BOOLEAN with nullable `true` >> >> After JoinPushExpressionsRule -->> join condition: AND(INDF1, INDF2), >> type of AND is BOOLEAN with nullable `true` >> >> After SemiJoinProjectTransposeRule --> Join condition: >> CAST(AND(INDF1, INDF2)), type of AND is BOOLEAN with nullable `false` >> >> Just as what you suspected, It's in `SemiJoinProjectTransposeRule` >> where forced type correction is added by `RexProgramBuilder#addCondition`, >> which will call `RexSimplify#simplifyPreservingType` before registering an >> expression. >> >> I know it's necessary to preserve the nullability when simplifying a >> boolean expression in project columns, but as for condition in Filter/Calc, >> may be we can omit the nullability? >> >> Best Regards, >> >> Shuo >> >> On Tue, Mar 24, 2020 at 3:35 PM Zoltan Haindrich <[email protected] <mailto: >> [email protected]>> wrote: >> >> Hey Shuo! >> >> I think that simplification should been made on join conditions - >> I've done a quick check; and it seems to be working for me. >> >> I suspected that it will be either a missing call to RexSimplify >> for some reason - or it is added by a forced return type correction: IIRC >> there are some cases in which >> >> the >> >> RexNode type should retained after simplification. >> >> Is this reproducible on current master; could you share a testcase? >> >> cheers, >> >> Zoltan >> >> On 3/24/20 7:28 AM, Shuo Cheng wrote: >> >> > Hi, Julian, That's what we do as a workaround way. we remove >> CAST which are >> >> > only widening nullability as what CALCITE-2695 does before >> applying >> >> > hash-join/sort-merge-join rule, such that equiv predicate can be >> split >> >> > out. I'm not sure whether it's properly for Calcite to do the >> 'convert >> >> > back' job, for example, simplify the join condition when create >> a Join; Or >> >> > maybe let other systems what use Calcite to do the "convert >> back" job as an >> >> > optimization? What do you think? >> >> > >> >> > On Tue, Mar 24, 2020 at 2:04 PM Julian Hyde < >> [email protected] <mailto:[email protected]>> wrote: >> >> > >> >> >> Or convert it back to a not-nullable BOOLEAN? The join >> condition treats >> >> >> UNKNOWN the same as FALSE, and besides UNKNOWN will never >> occur, so the >> >> >> conditions with and without the CAST are equivalent. >> >> >> >> >> >> Julian >> >> >> >> >> >>> On Mar 23, 2020, at 9:34 PM, Shuo Cheng <[email protected] >> <mailto:[email protected]>> wrote: >> >> >>> >> >> >>> Hi all, >> >> >>> >> >> >>> Considering the Join condition 'CAST(IS_NOT_DISTINCT_FROM($1, >> $2), >> >> >>> BOOLEAN)', which cast the non-nullable BOOLEAN to nullable >> BOOLEAN, >> >> >> Calcite >> >> >>> can not split out equiv predicate, thus some join operation >> like hash >> >> >> join >> >> >>> / sort merge join may not be used. Maybe we can >> >> >>> expand RelOptUtil#splitJoinCondition to support this scenario? >> >> >> >> >> > >> >>
