Thanks Zoltan. I added a test to RelBuilderTest and I can confirm that '_ IS TRUE' is stripped. And I agree that we cannot safely strip '_ IS NOT FALSE'.
On Tue, Mar 31, 2020 at 5:11 AM Zoltan Haindrich <[email protected]> wrote: > > > we should consider that 'b IS TRUE' and 'b IS NOT FALSE' > > I think we already do that... UnknownAs.FALSE essentially means that the > expression is enclosed in an "IS TRUE" - since we run filter/join condition > simplification in UAF > mode; its allowed to remove "_ IS TRUE" > > At the same time I don't see a way how "X IS NOT FALSE" could be removed > because in case X is null; the expression should evaluate to true (this > expression translates to > UnknownAs.TRUE mode) - which could be lost in case of a rewrite. We may > consider adding the UnknownAs mode to the filter/join node; but I think that > would just cause > trouble; are there some other way which I've not considered? > > cheers, > Zoltan > > On 3/30/20 7:19 PM, Julian Hyde wrote: > > If we're going down the path, we should consider that 'b IS TRUE' and > > 'b IS NOT FALSE' are somewhat like casts. Removing them from join > > conditions does not affect the result of the join. > > > > And the same apply to filter conditions. > > > > I don't know whether removing casts, _ IS TRUE and _ IS NOT FALSE from > > conditions genuinely make the world "simpler". But let's try it and > > see. > > > > On Mon, Mar 30, 2020 at 8:06 AM Zoltan Haindrich <[email protected]> wrote: > >> > >> Hey Shuo! > >> > >> Thank you for sharing the testcase! I've seen that you were able to fix it > >> by calling the builder instead of copy - right now I think fixing this > >> thru ReduceExpressionRule > >> might be better - as it could also fix up other cases. > >> I've tried disabling nullability retainment for filters/join conditions - > >> and it seems to be working; I'll submit it under [1]. > >> > >> Julian: I recommended to try that to provide a quick check to see if at > >> that point the issue could be fixed - I was confident that by disabling > >> "matchNullability" for > >> "simplifyPreservingType()" will do the right thing and it doesn't add an > >> unnecessary cast - instead it safely removes it; however: it still added > >> the cast...and by doing so > >> it didn't helped :) > >> > >> [1] https://issues.apache.org/jira/browse/CALCITE-3887 > >> > >> cheers, > >> Zoltan > >> > >> > >> On 3/26/20 12:22 PM, Shuo Cheng wrote: > >>> I think we may solve the problem from two aspects: > >>> 1. Do not try to preserve type (nullability) of Join/Filter condition > >>> expression when simplifying or something like pushing down. > >>> 2. We can do some work (remove unnecessary CAST) right before create a > >>> Join/Filter, as Julian said, something in RelBuilder could be done. > >>> I've do some fix in above Link (remove unnecessary CAST when doing > >>> pushDownEqualJoinConditions) > >>> > >>> On Thu, Mar 26, 2020 at 7:14 PM Shuo Cheng <[email protected]> wrote: > >>> > >>>> Sorry for the late reply, I've reproduced the problem here > >>>> https://github.com/cshuo/calcite/commit/b9a7fb5f536825d3a577bf42a5fc6cc7d4df7929 > >>>> . > >>>> > >>>> On Wed, Mar 25, 2020 at 12:38 AM Julian Hyde <[email protected]> wrote: > >>>> > >>>>> It does seem to be something that RelBuilder could do. (RexSimplify > >>>>> can’t > >>>>> really do it, because it doesn’t know how the expression is being used.) > >>>>> > >>>>> It’s also worth discovering why the CAST was added in the first place. > >>>>> It > >>>>> doesn’t seem to be helpful. I think we should strive to eliminate all of > >>>>> the slightly unhelpful things that Calcite does; those things can add up > >>>>> and cause major inefficiencies in the planning process and/or > >>>>> sub-optimal > >>>>> plans. > >>>>> > >>>>> Julian > >>>>> > >>>>> > >>>>>> On Mar 24, 2020, at 1:47 AM, Zoltan Haindrich <[email protected]> wrote: > >>>>>> > >>>>>> Hey, > >>>>>> > >>>>>> That's a great diagnosis :) > >>>>>> I would guess that newCondition became non-nullable for some reason > >>>>> (rexSimplify runs under RexProgramBuilder so it might be able to narrow > >>>>> the > >>>>> nullability) > >>>>>> you could try invoking simplify.simplifyPreservingType() on it to see > >>>>> if that would help. > >>>>>> > >>>>>>> I know it's necessary to preserve the nullability when simplifying a > >>>>> boolean expression in project columns, but as for condition in > >>>>> Filter/Calc, > >>>>> may be we can omit the > >>>>>>> nullability? > >>>>>> I think that could probably work - we can't change the nullability on > >>>>> project columns because those could be referenced (and the reference > >>>>> also > >>>>> has the type) ; but for filter/join conditions we don't need to care > >>>>> with > >>>>> it. > >>>>>> It seems we already have a "matchnullability" in ReduceExpressionsRule > >>>>> ; for FILTER/JOIN we should probably turn that off... :) > >>>>>> > >>>>>> cheers, > >>>>>> Zoltan > >>>>>> > >>>>>> > >>>>>> On 3/24/20 9:15 AM, Shuo Cheng wrote: > >>>>>>> Hi Zoltan, > >>>>>>> I encountered the problem when running TPC tests, and have not > >>>>> reproduced it in Calcite master. > >>>>>>> But I figured it out how the problem is produced: > >>>>>>> There is semi join with the condition:AND(EXPANDED_INDF1, > >>>>> EXPANDED_INDF2), type of AND is BOOLEAN with nullable `true` > >>>>>>> After JoinPushExpressionsRule -->> join condition: AND(INDF1, INDF2), > >>>>> type of AND is BOOLEAN with nullable `true` > >>>>>>> After SemiJoinProjectTransposeRule --> Join condition: > >>>>> CAST(AND(INDF1, INDF2)), type of AND is BOOLEAN with nullable `false` > >>>>>>> Just as what you suspected, It's in `SemiJoinProjectTransposeRule` > >>>>> where forced type correction is added by > >>>>> `RexProgramBuilder#addCondition`, > >>>>> which will call `RexSimplify#simplifyPreservingType` before registering > >>>>> an > >>>>> expression. > >>>>>>> I know it's necessary to preserve the nullability when simplifying a > >>>>> boolean expression in project columns, but as for condition in > >>>>> Filter/Calc, > >>>>> may be we can omit the nullability? > >>>>>>> Best Regards, > >>>>>>> Shuo > >>>>>>> On Tue, Mar 24, 2020 at 3:35 PM Zoltan Haindrich <[email protected] <mailto: > >>>>> [email protected]>> wrote: > >>>>>>> Hey Shuo! > >>>>>>> I think that simplification should been made on join conditions - > >>>>> I've done a quick check; and it seems to be working for me. > >>>>>>> I suspected that it will be either a missing call to RexSimplify > >>>>> for some reason - or it is added by a forced return type correction: > >>>>> IIRC > >>>>> there are some cases in which > >>>>>>> the > >>>>>>> RexNode type should retained after simplification. > >>>>>>> Is this reproducible on current master; could you share a > >>>>>>> testcase? > >>>>>>> cheers, > >>>>>>> Zoltan > >>>>>>> On 3/24/20 7:28 AM, Shuo Cheng wrote: > >>>>>>> > Hi, Julian, That's what we do as a workaround way. we remove > >>>>> CAST which are > >>>>>>> > only widening nullability as what CALCITE-2695 does before > >>>>> applying > >>>>>>> > hash-join/sort-merge-join rule, such that equiv predicate can > >>>>>>> be > >>>>> split > >>>>>>> > out. I'm not sure whether it's properly for Calcite to do the > >>>>> 'convert > >>>>>>> > back' job, for example, simplify the join condition when > >>>>>>> create > >>>>> a Join; Or > >>>>>>> > maybe let other systems what use Calcite to do the "convert > >>>>> back" job as an > >>>>>>> > optimization? What do you think? > >>>>>>> > > >>>>>>> > On Tue, Mar 24, 2020 at 2:04 PM Julian Hyde < > >>>>> [email protected] <mailto:[email protected]>> wrote: > >>>>>>> > > >>>>>>> >> Or convert it back to a not-nullable BOOLEAN? The join > >>>>> condition treats > >>>>>>> >> UNKNOWN the same as FALSE, and besides UNKNOWN will never > >>>>> occur, so the > >>>>>>> >> conditions with and without the CAST are equivalent. > >>>>>>> >> > >>>>>>> >> Julian > >>>>>>> >> > >>>>>>> >>> On Mar 23, 2020, at 9:34 PM, Shuo Cheng <[email protected] > >>>>> <mailto:[email protected]>> wrote: > >>>>>>> >>> > >>>>>>> >>> Hi all, > >>>>>>> >>> > >>>>>>> >>> Considering the Join condition > >>>>>>> 'CAST(IS_NOT_DISTINCT_FROM($1, > >>>>> $2), > >>>>>>> >>> BOOLEAN)', which cast the non-nullable BOOLEAN to nullable > >>>>> BOOLEAN, > >>>>>>> >> Calcite > >>>>>>> >>> can not split out equiv predicate, thus some join operation > >>>>> like hash > >>>>>>> >> join > >>>>>>> >>> / sort merge join may not be used. Maybe we can > >>>>>>> >>> expand RelOptUtil#splitJoinCondition to support this > >>>>>>> scenario? > >>>>>>> >> > >>>>>>> > > >>>>> > >>>>> > >>>
