Your email had two implicit questions:
1. Is current behavior correct?
2. What should the behavior be?

And my answers were: Current behavior is not correct. Desired behavior
is to take into account functional dependencies and pureness.

We do have a notion of pureness: SqlOperator.isDeterministic(). The
fix would use that in AggChecker or somewhere similar.

Julian




On Thu, May 5, 2022 at 12:56 AM Viliam Durina
<[email protected]> wrote:
>
> Exactly, that's the point. RAND() isn't pure. But Calcite doesn't know
> which functions are pure. I debugged the `AggChecker`, for function calls
> it only checks the operands - RAND() has none, so it assumes it's pure. But
> a function in general can depend on data other than its arguments. However
> from functions included in Calcite maybe only RAND is non-pure.
>
> I hit this because in my case, our user ordered the results using a complex
> expression. After looking at the plan I saw it evaluated this expression
> for each row after table scan, even though DISTINCT greatly reduced the
> number of rows. If the complex order-by expression was evaluated after
> duplicate removal, the query would be faster, even though in my case the
> result set was correct. In my case evaluating the order by expressions
> after aggregation would improve performance.
>
> I also tried this query:
> SELECT DISTINCT empno, deptno + 1
> FROM emp
> ORDER BY deptno + empno + 1
>
> It fails with: "Expression 'DEPTNO' is not in the select clause". So `ORDER
> BY deptno + 1 + empno` works, but `ORDER BY deptno + emptno + 1` doesn't,
> even though it's an equivalent expression. The code in AggChecker doesn't
> handle all cases.
>
> Viliam
>
> On Tue, 3 May 2022 at 22:33, Julian Hyde <[email protected]> wrote:
>
> > I think that Postgres is a little stricter than necessary. As long as
> > the operations are pure, the ORDER BY clause of a SELECT DISTINCT
> > query should allow any expression that is functionally dependent on
> > the expressions in the SELECT clause. In particular,
> >
> >   SELECT DISTINCT x, y
> >   FROM t
> >   ORDER BY x + y + 1
> >
> > should be valid because "x + y + 1" is functionally dependent.
> >
> > A pure parameterless function, such as CURRENT_DATE, is trivially
> > functionally dependent. Therefore
> >
> >   SELECT DISTINCT x
> >   FROM t
> >   ORDER BY CURRENT_DATE + x
> >
> > should also be valid.
> >
> > But RAND() isn't pure. There is an argument to ban non-pure and
> > non-deterministic expressions from the ORDER BY clause of a SELECT
> > DISTINCT query. But there's also an argument to allow them and ensure
> > that they are evaluated once per row. In which case the plan would be
> >
> >   LogicalProject(DEPTNO=[$0])
> >     LogicalSort(sort0=[$1], dir0=[ASC])
> >       LogicalProject(DEPTNO=[$0], EXPR$1=[RAND()])
> >         LogicalAggregate(group=[{7}])
> >           LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> >
> > This plan is like the first plan, except that the LogicalProject
> > occurs after LogicalAggregate rather than before.
> >
> > Julian
> >
> > On Tue, May 3, 2022 at 7:32 AM Viliam Durina
> > <[email protected]> wrote:
> > >
> > > Hi all,
> > >
> > > I have this query:
> > >
> > > SELECT DISTINCT deptno
> > > FROM emp
> > > ORDER BY rand()
> > >
> > > Sql2rel conversion generates the following expression:
> > >
> > > LogicalProject(DEPTNO=[$0])
> > >   LogicalSort(sort0=[$1], dir0=[ASC])
> > >     LogicalAggregate(group=[{0, 1}])
> > >       LogicalProject(DEPTNO=[$7], EXPR$1=[RAND()])
> > >         LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> > >
> > > I think it is not correct. The aggregation groups by `{deptno, RAND()}`
> > > tuple, which (almost) disables the DISTINCT clause. This plan would be
> > > correct:
> > >
> > > LogicalProject(DEPTNO=[$0])
> > >   LogicalSort(sort0=[$1], dir0=[ASC])
> > >     LogicalProject(DEPTNO=[$0], EXPR$0=RAND())
> > >       LogicalAggregate(group=[{0}])
> > >         LogicalProject(DEPTNO=[$7]])
> > >           LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> > >
> > > PostgreSQL in this case throws "ERROR: for SELECT DISTINCT, ORDER BY
> > > expressions must appear in select list".
> > >
> > > On the other hand, this query:
> > >
> > > SELECT DISTINCT deptno
> > > FROM emp
> > > ORDER BY empno
> > >
> > > correctly throws the following exception: "Expression 'EMPNO' is not in
> > the
> > > select clause". I think this is the right thing to do.
> > >
> > > There's also `SqlToRelConverterTest.testOrderDistinct()` test, which also
> > > asserts a wrong plan IMO.
> > >
> > > Do you agree this is an issue?
> > >
> > > Viliam
> > >
> > > --
> > > This message contains confidential information and is intended only for
> > the
> > > individuals named. If you are not the named addressee you should not
> > > disseminate, distribute or copy this e-mail. Please notify the sender
> > > immediately by e-mail if you have received this e-mail by mistake and
> > > delete this e-mail from your system. E-mail transmission cannot be
> > > guaranteed to be secure or error-free as information could be
> > intercepted,
> > > corrupted, lost, destroyed, arrive late or incomplete, or contain
> > viruses.
> > > The sender therefore does not accept liability for any errors or
> > omissions
> > > in the contents of this message, which arise as a result of e-mail
> > > transmission. If verification is required, please request a hard-copy
> > > version. -Hazelcast
> >
>
> --
> This message contains confidential information and is intended only for the
> individuals named. If you are not the named addressee you should not
> disseminate, distribute or copy this e-mail. Please notify the sender
> immediately by e-mail if you have received this e-mail by mistake and
> delete this e-mail from your system. E-mail transmission cannot be
> guaranteed to be secure or error-free as information could be intercepted,
> corrupted, lost, destroyed, arrive late or incomplete, or contain viruses.
> The sender therefore does not accept liability for any errors or omissions
> in the contents of this message, which arise as a result of e-mail
> transmission. If verification is required, please request a hard-copy
> version. -Hazelcast

Reply via email to