In your example with random(), I would expect it to execute twice because random is not deterministic.
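
As a rough illustration of that expectation, here is a sketch in plain Java. It is not Rex and uses no Calcite API; the class and method names are made up for this example, and `random() * 2 < random() * 3` is just a stand-in for an expression that mentions `random()` twice. It counts invocations when the call is written twice versus bound once, in the spirit of the `let r = random() in ...` form mentioned below.

```java
import java.util.function.DoubleSupplier;

// Hypothetical illustration only: plain Java, not Rex or Calcite code.
class EvalCountSketch {
    /** Wraps a supplier so the number of invocations can be observed. */
    static final class CountingSupplier implements DoubleSupplier {
        private final DoubleSupplier delegate;
        int calls = 0;

        CountingSupplier(DoubleSupplier delegate) {
            this.delegate = delegate;
        }

        @Override
        public double getAsDouble() {
            calls++;
            return delegate.getAsDouble();
        }
    }

    /** The call appears twice in the expression, so it runs twice. */
    static int callsWhenWrittenTwice() {
        CountingSupplier random = new CountingSupplier(Math::random);
        boolean ignored = random.getAsDouble() * 2 < random.getAsDouble() * 3;
        return random.calls;
    }

    /** Single assignment: bind the value once, then reuse it. */
    static int callsWithSingleAssignment() {
        CountingSupplier random = new CountingSupplier(Math::random);
        double r = random.getAsDouble();
        double[] pair = {r * 2, r * 3};
        return random.calls;
    }

    public static void main(String[] args) {
        System.out.println(callsWhenWrittenTwice());     // prints 2
        System.out.println(callsWithSingleAssignment()); // prints 1
    }
}
```

Merging the two calls into one would only be safe if the function were deterministic, which is why I would expect two evaluations here.
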
On Thu, Mar 5, 2020, 13:15 Julian Hyde <[email protected]> wrote:

> Rex is not SQL. The SQL standard does not have a say in what is valid Rex.
> (Clearly we have to comply with the SQL standard, but we change the Rex
> that we generate for this.)
>
> We need some guarantee of ordering, for example the case you cite, where
> we check whether a reference is null before we reference it. AND may or may
> not be the operator that guarantees ordering. CASE retains ordering
> (meaning that you have to evaluate the condition before the branch) but I
> am not convinced that we need to keep ordering in AND and OR.
>
> Whether RexSimplify should aggressively canonize the order or preserve
> order at all costs are different topics. I am actually against both.
>
> Related to the ordering question is the question of the number of
> evaluations. If I write ‘random() * 2 < random() < 3’, is it guaranteed to
> execute ‘random()’ at most once? Precisely once? I think the Rex language
> could use a concept like single assignment, like
> ‘let r = random() in (r * 2, r * 3) end’, which ensures that ‘random()’ is
> called exactly once and is called before the expressions ‘r * 2’ and
> ‘r * 3’ are evaluated.
>
> This week, via a twitter exchange with Torsten Grust [1] I came across the
> paper “SSA is Functional Programming (Appel, 1998)” [2]. I could see Rex
> evolving towards SSA.
>
> Julian
>
> [1] https://twitter.com/Teggy/status/1234935448310603777
>
> [2] https://www.cs.princeton.edu/~appel/papers/ssafun.pdf
>
>
> On Mar 5, 2020, at 3:14 AM, Chunwei Lei <[email protected]> wrote:
> >
> > Currently, RexSimplify would decompose and compose the AND expression,
> > which particularly puts
> > IS NOT NULL and NOT predicates at the end of the AND expression[1][2].
> > For instance,
> > `$1 is not null and $2=1` will be changed to `$2=1 and $1 is not null`
> > after being simplified.
> >
> > I know it is not a bug because the SQL standard[3] does not say the
> > expression order should be retained.
> > But I am wondering whether we can improve a little bit, namely, users can
> > decide whether RexSimplify
> > retains the order of the predicates or not. Because in my humble opinion,
> > changing the order of these
> > predicates might lead to two disadvantages:
> >
> > 1) it might break some queries, especially those which contain udf.
> > Assuming we have a udf called udf1 which throws an exception when meeting
> > NULL operand.
> > For query `select a is not null and udf1(a);`, it can run successfully
> > because of short-circuiting.
> > But after being simplified it will fail because `udf(a)` is executed
> > before 'a is not null'.
> >
> > 2) it might bring extra overhead.
> > For instance, for `a is not null and heavey_udf(a)!='1'`, if we change
> > the order,
> > `heavey_udf(a)` will be executed even when a is null which might lead to
> > extra overhead.
> >
> > There are some discussions about this topic[4]. Unfortunately, we do not
> > reach a consensus.
> > What do you think about it? Would appreciate your feedback.
> >
> >
> > [1] https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rex/RexSimplify.java#L1529
> > [2] https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rex/RexSimplify.java#L1552
> > [3] https://standards.iso.org/ittf/PubliclyAvailableStandards/c053681_ISO_IEC_9075-1_2011.zip
> > [4] https://issues.apache.org/jira/browse/CALCITE-3746
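
To make the ordering concern in the quoted message concrete, here is a small sketch in plain Java. It assumes an engine that evaluates AND left to right with short-circuiting, as Java's `&&` does; `udf1` is a hypothetical stand-in for the UDF described above, and none of this is Calcite code.

```java
import java.util.function.Predicate;

// Hypothetical illustration only: plain Java, not Rex or Calcite code.
class AndOrderSketch {
    /** Stand-in for udf1: throws when given a NULL operand. */
    static boolean udf1(String a) {
        if (a == null) {
            throw new IllegalArgumentException("udf1 does not accept NULL");
        }
        return !a.isEmpty();
    }

    /** Original order: a IS NOT NULL AND udf1(a). */
    static boolean nullCheckFirst(String a) {
        return a != null && udf1(a);
    }

    /** Reordered: udf1(a) AND a IS NOT NULL. */
    static boolean nullCheckLast(String a) {
        return udf1(a) && a != null;
    }

    /** Reports whether the predicate throws when applied to null. */
    static boolean throwsOnNull(Predicate<String> predicate) {
        try {
            predicate.test(null);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsOnNull(AndOrderSketch::nullCheckFirst)); // prints false
        System.out.println(throwsOnNull(AndOrderSketch::nullCheckLast));  // prints true
    }
}
```

With the null check first, `udf1` is never reached for a NULL input; with the check moved to the end, the same input raises an exception. Both outcomes depend on the left-to-right assumption stated above.
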
