I think that Postgres is a little stricter than necessary. As long as
the operations are pure, the ORDER BY clause of a SELECT DISTINCT
query should allow any expression that is functionally dependent on
the expressions in the SELECT clause. In particular,

  SELECT DISTINCT x, y
  FROM t
  ORDER BY x + y + 1

should be valid because "x + y + 1" is functionally dependent.

A pure parameterless function, such as CURRENT_DATE, is trivially
functionally dependent. Therefore

  SELECT DISTINCT x
  FROM t
  ORDER BY CURRENT_DATE + x

should also be valid.

But RAND() isn't pure. There is an argument to ban non-pure and
non-deterministic expressions from the ORDER BY clause of a SELECT
DISTINCT query. But there's also an argument to allow them and ensure
that they are evaluated once per row. In which case the plan would be

  LogicalProject(DEPTNO=[$0])
    LogicalSort(sort0=[$1], dir0=[ASC])
      LogicalProject(DEPTNO=[$0], EXPR$1=[RAND()])
        LogicalAggregate(group=[{7}])
          LogicalTableScan(table=[[CATALOG, SALES, EMP]])

This plan is like the first plan, except that the LogicalProject
occurs after LogicalAggregate rather than before.

Julian

On Tue, May 3, 2022 at 7:32 AM Viliam Durina
<[email protected]> wrote:
>
> Hi all,
>
> I have this query:
>
> SELECT DISTINCT deptno
> FROM emp
> ORDER BY rand()
>
> Sql2rel conversion generates the following expression:
>
> LogicalProject(DEPTNO=[$0])
>   LogicalSort(sort0=[$1], dir0=[ASC])
>     LogicalAggregate(group=[{0, 1}])
>       LogicalProject(DEPTNO=[$7], EXPR$1=[RAND()])
>         LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>
> I think it is not correct. The aggregation groups by `{deptno, RAND()}`
> tuple, which (almost) disables the DISTINCT clause. This plan would be
> correct:
>
> LogicalProject(DEPTNO=[$0])
>   LogicalSort(sort0=[$1], dir0=[ASC])
>     LogicalProject(DEPTNO=[$0], EXPR$0=RAND())
>       LogicalAggregate(group=[{0}])
>         LogicalProject(DEPTNO=[$7]])
>           LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>
> PostgreSQL in this case throws "ERROR: for SELECT DISTINCT, ORDER BY
> expressions must appear in select list".
>
> On the other hand, this query:
>
> SELECT DISTINCT deptno
> FROM emp
> ORDER BY empno
>
> correctly throws the following exception: "Expression 'EMPNO' is not in the
> select clause". I think this is the right thing to do.
>
> There's also `SqlToRelConverterTest.testOrderDistinct()` test, which also
> asserts a wrong plan IMO.
>
> Do you agree this is an issue?
>
> Viliam
>
> --
> This message contains confidential information and is intended only for the
> individuals named. If you are not the named addressee you should not
> disseminate, distribute or copy this e-mail. Please notify the sender
> immediately by e-mail if you have received this e-mail by mistake and
> delete this e-mail from your system. E-mail transmission cannot be
> guaranteed to be secure or error-free as information could be intercepted,
> corrupted, lost, destroyed, arrive late or incomplete, or contain viruses.
> The sender therefore does not accept liability for any errors or omissions
> in the contents of this message, which arise as a result of e-mail
> transmission. If verification is required, please request a hard-copy
> version. -Hazelcast

Reply via email to