I agree that expandLocalRef and expandList are broken. They should probably
throw if expansion creates multiple references to non-deterministic expressions.
Let’s be clear what we mean by non-deterministic. “RANDOM" and “NEXT VALUE OF
<sequence>” are non-deterministic. “CURRENT_TIMESTAMP” is deterministic.
There are uses of expandLocalRef and expandList that are probably safe:
1. When a RexProgram is copied, it expands the DAG of expressions, and then
performs common-subexpression to build a new DAG. Thus the duplicates are
re-combined by the end of the process.
2. It is valid to flatten
SELECT z
FROM (
SELECT random() - 1 AS x,
random() + 1 AS y,
3 AS z
FROM t)
to
SELECT 3 AS z
FROM t
even though there are common expressions.
3. Expanding common sub-expressions in
SELECT CASE
WHEN x > 10 THEN random() + 5
WHEN x > 5 THEN random() - 3
ELSE 0 END AS c
FROM t
is harmless because at most one of the will be used.
Please log a bug. The fix would deprecate the old functions, add new ones with
the revised behavior, and add tests that the new behavior doesn’t generate
spurious errors.
Julian
> On Feb 9, 2023, at 3:34 AM, Viliam Durina <[email protected]>
> wrote:
>
> Hello all,
>
> we're using `RexProgram.expandLocalRef()` and `RexProgram.expandList()` to
> have a simple RexNode which we later evaluate. However, if the program
> contains a non-deterministic function call in the common subexpressions,
> such call is inlined and can be called a different number of times as a
> result, leading to an incorrect result.
>
> This problem spreads to anywhere these methods are used, and it's used
> quite a many times in Calcite, and we've also hit it in our custom rule.
>
> It seems to me that this function is flawed by design and should be removed
> as it leads to correctness problems with non-deterministic functions. Or
> should be redesigned, because perhaps some uses are valid, for example the
> one to determine selectivity in `RelMdSelectivity`.
>
> What do others think about this?
>
> Viliam
>
> --
> This message contains confidential information and is intended only for the
> individuals named. If you are not the named addressee you should not
> disseminate, distribute or copy this e-mail. Please notify the sender
> immediately by e-mail if you have received this e-mail by mistake and
> delete this e-mail from your system. E-mail transmission cannot be
> guaranteed to be secure or error-free as information could be intercepted,
> corrupted, lost, destroyed, arrive late or incomplete, or contain viruses.
> The sender therefore does not accept liability for any errors or omissions
> in the contents of this message, which arise as a result of e-mail
> transmission. If verification is required, please request a hard-copy
> version. -Hazelcast