Rex is not SQL. The SQL standard does not have a say in what is valid Rex. 
(Clearly we have to comply with the SQL standard, but we change the Rex that we 
generate for this.)

We need some guarantee of ordering, for example the case you cite, where we 
check whether a reference is null before we reference it. AND may or may not be 
the operator that guarantees ordering. CASE retains ordering (meaning that you 
have to evaluate the condition before the branch) but I am not convinced that 
we need to keep ordering in AND and OR.
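
To make the distinction concrete, here is a minimal sketch in plain Java. It is not Calcite code: the class, the methods and the `udf1` stand-in are all hypothetical, and Java's `&&` and `?:` are used only to model "left operand first" and "condition before branch". It illustrates the case from the quoted message, not how Rex is actually evaluated.

```java
class OrderingSketch {
    // Hypothetical UDF that rejects NULL operands, as in the quoted example.
    static boolean udf1(Integer a) {
        if (a == null) {
            throw new IllegalArgumentException("udf1 does not accept NULL");
        }
        return a > 0;
    }

    // Original operand order: the null check guards the UDF call.
    static boolean guardedAnd(Integer a) {
        return a != null && udf1(a);
    }

    // Operands swapped, as a simplifier might do: the UDF now runs first.
    static boolean reorderedAnd(Integer a) {
        return udf1(a) && a != null;
    }

    // CASE-style: the condition is always evaluated before the chosen branch,
    // so the guard survives any reordering of AND operands elsewhere.
    static boolean caseStyle(Integer a) {
        return a == null ? false : udf1(a);
    }

    public static void main(String[] args) {
        System.out.println("guardedAnd(null) = " + guardedAnd(null));
        System.out.println("caseStyle(null)  = " + caseStyle(null));
        try {
            reorderedAnd(null);
        } catch (IllegalArgumentException e) {
            System.out.println("reorderedAnd(null) threw: " + e.getMessage());
        }
    }
}
```

If AND gives no ordering guarantee, a query that needs the guard would have to be expressed in the CASE form.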

Whether RexSimplify should aggressively canonicalize the order, or preserve order at 
all costs, are separate topics. I am actually against both.

Related to the ordering question is the question of the number of evaluations. 
If I write ‘random() * 2 < random() * 3’, is it guaranteed to execute 
‘random()’ at most once? Precisely once? I think the Rex language could use a 
concept like single assignment, like ‘let r = random() in (r * 2, r * 3) end’, 
which ensures that ‘random()’ is called exactly once and is called before the 
expressions ‘r * 2’ and ‘r * 3’ are evaluated.
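
A rough Java sketch of the difference, under stated assumptions: `CountingSource` is a hypothetical, deterministic stand-in for `random()` that returns a different value on each call and counts its calls. Nothing here is Rex or Calcite API. A local `final` variable plays the role of the `let` binding.

```java
import java.util.function.DoubleSupplier;

class LetSketch {
    // Hypothetical stand-in for random(): a new value per call, plus a call counter.
    static final class CountingSource implements DoubleSupplier {
        int calls = 0;

        @Override
        public double getAsDouble() {
            calls++;
            return calls * 0.25;
        }
    }

    // Inlined form: each textual occurrence of random() is evaluated separately.
    static double[] inline(DoubleSupplier random) {
        return new double[] {random.getAsDouble() * 2, random.getAsDouble() * 3};
    }

    // let r = random() in (r * 2, r * 3) end
    static double[] let(DoubleSupplier random) {
        final double r = random.getAsDouble(); // evaluated once, before the body
        return new double[] {r * 2, r * 3};
    }

    public static void main(String[] args) {
        CountingSource a = new CountingSource();
        inline(a);
        System.out.println("inline form called the source " + a.calls + " time(s)");

        CountingSource b = new CountingSource();
        let(b);
        System.out.println("let form called the source " + b.calls + " time(s)");
    }
}
```

With the binding, both the number of evaluations and their position relative to the body are fixed by the expression itself and do not depend on what a simplifier does.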

This week, via a Twitter exchange with Torsten Grust [1], I came across the 
paper “SSA is Functional Programming” (Appel, 1998) [2]. I could see Rex 
evolving towards SSA.

Julian

[1] https://twitter.com/Teggy/status/1234935448310603777 

[2] https://www.cs.princeton.edu/~appel/papers/ssafun.pdf 

> On Mar 5, 2020, at 3:14 AM, Chunwei Lei <[email protected]> wrote:
> 
> Currently, RexSimplify decomposes and recomposes AND expressions, which in
> particular moves IS NOT NULL and NOT predicates to the end of the AND
> expression[1][2]. For instance,
> `$1 is not null and $2=1` is changed to `$2=1 and $1 is not null`
> after simplification.
> 
> I know it is not a bug, because the SQL standard[3] does not say that
> expression order must be retained.
> But I am wondering whether we can improve this a little by letting users
> decide whether RexSimplify retains the order of the predicates.
> In my humble opinion, changing the order of these predicates has two
> disadvantages:
> 
> 1) It might break some queries, especially those that contain UDFs.
> Assume we have a UDF called udf1 that throws an exception when given a
> NULL operand.
> The query `select a is not null and udf1(a);` runs successfully
> because of short-circuiting.
> But after simplification it fails, because `udf1(a)` is executed before
> `a is not null`.
> 
> 2) It might add overhead.
> For instance, for `a is not null and heavy_udf(a)!='1'`, if we change the
> order,
> `heavy_udf(a)` is executed even when a is null.
> 
> There has been some discussion of this topic[4]. Unfortunately, we did not
> reach a consensus.
> What do you think? I would appreciate your feedback.
> 
> 
> [1]
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rex/RexSimplify.java#L1529
> [2]
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rex/RexSimplify.java#L1552
> [3]
> https://standards.iso.org/ittf/PubliclyAvailableStandards/c053681_ISO_IEC_9075-1_2011.zip
> [4] https://issues.apache.org/jira/browse/CALCITE-3746
