Currently, RexSimplify decomposes an AND expression into its conjuncts and
then recomposes it, and in doing so it moves IS NOT NULL and NOT predicates
to the end of the AND expression[1][2]. For instance,
`$1 is not null and $2=1` becomes `$2=1 and $1 is not null` after
simplification.
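
To make the reordering concrete, here is a toy Java model. It assumes each
conjunct is a plain SQL string rather than a Calcite RexNode, and
`ConjunctOrderDemo.reorder` with its `retainOrder` flag is hypothetical: it
only mimics the behavior described above and is not RexSimplify's code or
API (the real simplifier also rewrites and merges terms). With
`retainOrder = false` it moves IS NOT NULL and NOT conjuncts to the end;
with `retainOrder = true` it keeps the input order, which is the kind of
option proposed in this mail.

```java
import java.util.ArrayList;
import java.util.List;

public class ConjunctOrderDemo {
  // Toy rule (an assumption): a conjunct goes last if it is an
  // IS NOT NULL test or a NOT predicate.
  static boolean movedToEnd(String conjunct) {
    String upper = conjunct.toUpperCase();
    return upper.endsWith(" IS NOT NULL") || upper.startsWith("NOT ");
  }

  // Hypothetical: either keeps the input order or moves the matching
  // conjuncts to the end, preserving relative order within each group.
  static List<String> reorder(List<String> conjuncts, boolean retainOrder) {
    if (retainOrder) {
      return new ArrayList<>(conjuncts);
    }
    List<String> front = new ArrayList<>();
    List<String> back = new ArrayList<>();
    for (String c : conjuncts) {
      (movedToEnd(c) ? back : front).add(c);
    }
    front.addAll(back);
    return front;
  }

  public static void main(String[] args) {
    List<String> input = List.of("$1 IS NOT NULL", "$2 = 1");
    System.out.println(String.join(" AND ", reorder(input, false)));
    System.out.println(String.join(" AND ", reorder(input, true)));
  }
}
```

Running `main` prints `$2 = 1 AND $1 IS NOT NULL` followed by
`$1 IS NOT NULL AND $2 = 1`.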

I know this is not a bug, because the SQL standard[3] does not require the
order of the expressions to be retained. But I wonder whether we could
improve this a little by letting users decide whether RexSimplify retains
the order of the predicates. In my humble opinion, changing the order of
these predicates has two potential disadvantages:

1) It might break some queries, especially those that contain UDFs.
Assume we have a UDF called udf1 that throws an exception when it receives
a NULL operand. The query `select a is not null and udf1(a);` runs
successfully because of short-circuiting. But after simplification it
fails, because `udf1(a)` is evaluated before `a is not null`.

2) It might add extra overhead. For instance, in
`a is not null and heavy_udf(a)!='1'`, if the order is changed,
`heavy_udf(a)` is executed even when `a` is null, which adds unnecessary
cost.
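
Both disadvantages can be sketched in plain Java, assuming the executor
evaluates AND operands left to right with short-circuiting (the SQL
standard does not mandate this; Java's `&&` merely stands in for such an
executor). `udf1` and `heavyUdf` are hypothetical stand-ins for the two
UDFs in the examples, and nothing here uses Calcite's API.

```java
import java.util.Arrays;
import java.util.List;

public class ReorderingDemo {
  // Counts invocations of the hypothetical expensive UDF.
  static int heavyCalls = 0;

  // Hypothetical stand-in for udf1: throws when it receives NULL.
  static boolean udf1(Integer a) {
    if (a == null) {
      throw new IllegalArgumentException("udf1 does not accept NULL");
    }
    return a > 0;
  }

  // Hypothetical stand-in for the expensive UDF; returns NULL for NULL.
  static String heavyUdf(Integer a) {
    heavyCalls++;
    return a == null ? null : a.toString();
  }

  // Disadvantage 1, original order: the null check guards the UDF.
  static boolean originalOrder(Integer a) {
    return a != null && udf1(a);
  }

  // Disadvantage 1, reordered: the UDF sees the NULL first and throws.
  static boolean reorderedOrder(Integer a) {
    return udf1(a) && a != null;
  }

  // Disadvantage 2: filters rows with `a is not null and heavy_udf(a) != '1'`
  // in either order and returns how many times heavyUdf ran.
  static int heavyCallsFor(List<Integer> rows, boolean nullCheckFirst) {
    heavyCalls = 0;
    for (Integer a : rows) {
      boolean keep = nullCheckFirst
          ? a != null && !"1".equals(heavyUdf(a))
          : !"1".equals(heavyUdf(a)) && a != null;
      // A real executor would emit the row when keep is true; ignored here.
    }
    return heavyCalls;
  }

  public static void main(String[] args) {
    System.out.println(originalOrder(null)); // false
    try {
      reorderedOrder(null);
    } catch (IllegalArgumentException e) {
      System.out.println("reordered predicate failed: " + e.getMessage());
    }
    List<Integer> rows = Arrays.asList(1, null, 2, null);
    System.out.println(heavyCallsFor(rows, true));  // 2
    System.out.println(heavyCallsFor(rows, false)); // 4
  }
}
```

With two NULLs among four rows, the null-check-first order calls the
expensive UDF twice, while the reordered predicate calls it four times.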

There has been some discussion of this topic[4], but unfortunately we did
not reach a consensus. What do you think? I would appreciate your feedback.


[1] https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rex/RexSimplify.java#L1529
[2] https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rex/RexSimplify.java#L1552
[3] https://standards.iso.org/ittf/PubliclyAvailableStandards/c053681_ISO_IEC_9075-1_2011.zip
[4] https://issues.apache.org/jira/browse/CALCITE-3746
