I think that Postgres is a little stricter than necessary. As long as
the operations are pure, the ORDER BY clause of a SELECT DISTINCT
query should allow any expression that is functionally dependent on
the expressions in the SELECT clause. In particular,
SELECT DISTINCT x, y
FROM t
ORDER BY x + y + 1
should be valid because "x + y + 1" is functionally dependent.
A pure parameterless function, such as CURRENT_DATE, is trivially
functionally dependent. Therefore
SELECT DISTINCT x
FROM t
ORDER BY CURRENT_DATE + x
should also be valid.
But RAND() isn't pure. There is an argument to ban non-pure and
non-deterministic expressions from the ORDER BY clause of a SELECT
DISTINCT query. But there's also an argument to allow them and ensure
that they are evaluated once per row. In which case the plan would be
LogicalProject(DEPTNO=[$0])
LogicalSort(sort0=[$1], dir0=[ASC])
LogicalProject(DEPTNO=[$0], EXPR$1=[RAND()])
LogicalAggregate(group=[{7}])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
This plan is like the first plan, except that the LogicalProject
occurs after LogicalAggregate rather than before.
Julian
On Tue, May 3, 2022 at 7:32 AM Viliam Durina
<[email protected]> wrote:
>
> Hi all,
>
> I have this query:
>
> SELECT DISTINCT deptno
> FROM emp
> ORDER BY rand()
>
> Sql2rel conversion generates the following expression:
>
> LogicalProject(DEPTNO=[$0])
> LogicalSort(sort0=[$1], dir0=[ASC])
> LogicalAggregate(group=[{0, 1}])
> LogicalProject(DEPTNO=[$7], EXPR$1=[RAND()])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>
> I think it is not correct. The aggregation groups by `{deptno, RAND()}`
> tuple, which (almost) disables the DISTINCT clause. This plan would be
> correct:
>
> LogicalProject(DEPTNO=[$0])
> LogicalSort(sort0=[$1], dir0=[ASC])
> LogicalProject(DEPTNO=[$0], EXPR$0=RAND())
> LogicalAggregate(group=[{0}])
> LogicalProject(DEPTNO=[$7]])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>
> PostgreSQL in this case throws "ERROR: for SELECT DISTINCT, ORDER BY
> expressions must appear in select list".
>
> On the other hand, this query:
>
> SELECT DISTINCT deptno
> FROM emp
> ORDER BY empno
>
> correctly throws the following exception: "Expression 'EMPNO' is not in the
> select clause". I think this is the right thing to do.
>
> There's also `SqlToRelConverterTest.testOrderDistinct()` test, which also
> asserts a wrong plan IMO.
>
> Do you agree this is an issue?
>
> Viliam
>
> --
> This message contains confidential information and is intended only for the
> individuals named. If you are not the named addressee you should not
> disseminate, distribute or copy this e-mail. Please notify the sender
> immediately by e-mail if you have received this e-mail by mistake and
> delete this e-mail from your system. E-mail transmission cannot be
> guaranteed to be secure or error-free as information could be intercepted,
> corrupted, lost, destroyed, arrive late or incomplete, or contain viruses.
> The sender therefore does not accept liability for any errors or omissions
> in the contents of this message, which arise as a result of e-mail
> transmission. If verification is required, please request a hard-copy
> version. -Hazelcast