Sorry for the late reply, I've reproduced the problem here https://github.com/cshuo/calcite/commit/b9a7fb5f536825d3a577bf42a5fc6cc7d4df7929 .
On Wed, Mar 25, 2020 at 12:38 AM Julian Hyde <[email protected]> wrote: > It does seem to be something that RelBuilder could do. (RexSimplify can’t > really do it, because it doesn’t know how the expression is being used.) > > It’s also worth discovering why the CAST was added in the first place. It > doesn’t seem to be helpful. I think we should strive to eliminate all of > the slightly unhelpful things that Calcite does; those things can add up > and cause major inefficiencies in the planning process and/or sub-optimal > plans. > > Julian > > > > On Mar 24, 2020, at 1:47 AM, Zoltan Haindrich <[email protected]> wrote: > > > > Hey, > > > > That's a great diagnosis :) > > I would guess that newCondition became non-nullable for some reason > (rexSimplify runs under RexProgramBuilder so it might be able to narrow the > nullability) > > you could try invoking simplify.simplifyPreservingType() on it to see if > that would help. > > > > > I know it's necessary to preserve the nullability when simplifying a > boolean expression in project columns, but as for condition in Filter/Calc, > may be we can omit the > > > nullability? > > I think that could probably work - we can't change the nullability on > project columns because those could be referenced (and the reference also > has the type) ; but for filter/join conditions we don't need to care with > it. > > It seems we already have a "matchnullability" in ReduceExpressionsRule ; > for FILTER/JOIN we should probably turn that off... :) > > > > cheers, > > Zoltan > > > > > > On 3/24/20 9:15 AM, Shuo Cheng wrote: > >> Hi Zoltan, > >> I encountered the problem when running TPC tests, and have not > reproduced it in Calcite master. > >> But I figured it out how the problem is produced: > >> There is semi join with the condition:AND(EXPANDED_INDF1, > EXPANDED_INDF2), type of AND is BOOLEAN with nullable `true` > >> After JoinPushExpressionsRule -->> join condition: AND(INDF1, INDF2), > type of AND is BOOLEAN with nullable `true` > >> After SemiJoinProjectTransposeRule --> Join condition: CAST(AND(INDF1, > INDF2)), type of AND is BOOLEAN with nullable `false` > >> Just as what you suspected, It's in `SemiJoinProjectTransposeRule` > where forced type correction is added by `RexProgramBuilder#addCondition`, > which will call `RexSimplify#simplifyPreservingType` before registering an > expression. > >> I know it's necessary to preserve the nullability when simplifying a > boolean expression in project columns, but as for condition in Filter/Calc, > may be we can omit the nullability? > >> Best Regards, > >> Shuo > >> On Tue, Mar 24, 2020 at 3:35 PM Zoltan Haindrich <[email protected] <mailto: > [email protected]>> wrote: > >> Hey Shuo! > >> I think that simplification should been made on join conditions - > I've done a quick check; and it seems to be working for me. > >> I suspected that it will be either a missing call to RexSimplify for > some reason - or it is added by a forced return type correction: IIRC there > are some cases in which > >> the > >> RexNode type should retained after simplification. > >> Is this reproducible on current master; could you share a testcase? > >> cheers, > >> Zoltan > >> On 3/24/20 7:28 AM, Shuo Cheng wrote: > >> > Hi, Julian, That's what we do as a workaround way. we remove CAST > which are > >> > only widening nullability as what CALCITE-2695 does before > applying > >> > hash-join/sort-merge-join rule, such that equiv predicate can be > split > >> > out. I'm not sure whether it's properly for Calcite to do the > 'convert > >> > back' job, for example, simplify the join condition when create a > Join; Or > >> > maybe let other systems what use Calcite to do the "convert back" > job as an > >> > optimization? What do you think? > >> > > >> > On Tue, Mar 24, 2020 at 2:04 PM Julian Hyde < > [email protected] <mailto:[email protected]>> wrote: > >> > > >> >> Or convert it back to a not-nullable BOOLEAN? The join condition > treats > >> >> UNKNOWN the same as FALSE, and besides UNKNOWN will never occur, > so the > >> >> conditions with and without the CAST are equivalent. > >> >> > >> >> Julian > >> >> > >> >>> On Mar 23, 2020, at 9:34 PM, Shuo Cheng <[email protected] > <mailto:[email protected]>> wrote: > >> >>> > >> >>> Hi all, > >> >>> > >> >>> Considering the Join condition 'CAST(IS_NOT_DISTINCT_FROM($1, > $2), > >> >>> BOOLEAN)', which cast the non-nullable BOOLEAN to nullable > BOOLEAN, > >> >> Calcite > >> >>> can not split out equiv predicate, thus some join operation > like hash > >> >> join > >> >>> / sort merge join may not be used. Maybe we can > >> >>> expand RelOptUtil#splitJoinCondition to support this scenario? > >> >> > >> > > >
