PS Please log a JIRA case

On Thu, May 5, 2022 at 10:41 PM Julian Hyde <[email protected]> wrote:
>
> Your email had two implicit questions:
> 1. Is current behavior correct?
> 2. What should the behavior be?
>
> And my answers were: Current behavior is not correct. Desired behavior
> is to take into account functional dependencies and pureness.
>
> We do have a notion of pureness: SqlOperator.isDeterministic(). The
> fix would use that in AggChecker or somewhere similar.
>
> Julian
>
>
>
>
> On Thu, May 5, 2022 at 12:56 AM Viliam Durina
> <[email protected]> wrote:
> >
> > Exactly, that's the point. RAND() isn't pure. But Calcite doesn't know
> > which functions are pure. I debugged the `AggChecker`, for function calls
> > it only checks the operands - RAND() has none, so it assumes it's pure. But
> > a function in general can depend on data other than its arguments. However
> > from functions included in Calcite maybe only RAND is non-pure.
> >
> > I hit this because in my case, our user ordered the results using a complex
> > expression. After looking at the plan I saw it evaluated this expression
> > for each row after table scan, even though DISTINCT greatly reduced the
> > number of rows. If the complex order-by expression was evaluated after
> > duplicate removal, the query would be faster, even though in my case the
> > result set was correct. In my case evaluating the order by expressions
> > after aggregation would improve performance.
> >
> > I also tried this query:
> > SELECT DISTINCT empno, deptno + 1
> > FROM emp
> > ORDER BY deptno + empno + 1
> >
> > It fails with: "Expression 'DEPTNO' is not in the select clause". So `ORDER
> > BY deptno + 1 + empno` works, but `ORDER BY deptno + emptno + 1` doesn't,
> > even though it's an equivalent expression. The code in AggChecker doesn't
> > handle all cases.
> >
> > Viliam
> >
> > On Tue, 3 May 2022 at 22:33, Julian Hyde <[email protected]> wrote:
> >
> > > I think that Postgres is a little stricter than necessary. As long as
> > > the operations are pure, the ORDER BY clause of a SELECT DISTINCT
> > > query should allow any expression that is functionally dependent on
> > > the expressions in the SELECT clause. In particular,
> > >
> > >   SELECT DISTINCT x, y
> > >   FROM t
> > >   ORDER BY x + y + 1
> > >
> > > should be valid because "x + y + 1" is functionally dependent.
> > >
> > > A pure parameterless function, such as CURRENT_DATE, is trivially
> > > functionally dependent. Therefore
> > >
> > >   SELECT DISTINCT x
> > >   FROM t
> > >   ORDER BY CURRENT_DATE + x
> > >
> > > should also be valid.
> > >
> > > But RAND() isn't pure. There is an argument to ban non-pure and
> > > non-deterministic expressions from the ORDER BY clause of a SELECT
> > > DISTINCT query. But there's also an argument to allow them and ensure
> > > that they are evaluated once per row. In which case the plan would be
> > >
> > >   LogicalProject(DEPTNO=[$0])
> > >     LogicalSort(sort0=[$1], dir0=[ASC])
> > >       LogicalProject(DEPTNO=[$0], EXPR$1=[RAND()])
> > >         LogicalAggregate(group=[{7}])
> > >           LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> > >
> > > This plan is like the first plan, except that the LogicalProject
> > > occurs after LogicalAggregate rather than before.
> > >
> > > Julian
> > >
> > > On Tue, May 3, 2022 at 7:32 AM Viliam Durina
> > > <[email protected]> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I have this query:
> > > >
> > > > SELECT DISTINCT deptno
> > > > FROM emp
> > > > ORDER BY rand()
> > > >
> > > > Sql2rel conversion generates the following expression:
> > > >
> > > > LogicalProject(DEPTNO=[$0])
> > > >   LogicalSort(sort0=[$1], dir0=[ASC])
> > > >     LogicalAggregate(group=[{0, 1}])
> > > >       LogicalProject(DEPTNO=[$7], EXPR$1=[RAND()])
> > > >         LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> > > >
> > > > I think it is not correct. The aggregation groups by `{deptno, RAND()}`
> > > > tuple, which (almost) disables the DISTINCT clause. This plan would be
> > > > correct:
> > > >
> > > > LogicalProject(DEPTNO=[$0])
> > > >   LogicalSort(sort0=[$1], dir0=[ASC])
> > > >     LogicalProject(DEPTNO=[$0], EXPR$0=RAND())
> > > >       LogicalAggregate(group=[{0}])
> > > >         LogicalProject(DEPTNO=[$7]])
> > > >           LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> > > >
> > > > PostgreSQL in this case throws "ERROR: for SELECT DISTINCT, ORDER BY
> > > > expressions must appear in select list".
> > > >
> > > > On the other hand, this query:
> > > >
> > > > SELECT DISTINCT deptno
> > > > FROM emp
> > > > ORDER BY empno
> > > >
> > > > correctly throws the following exception: "Expression 'EMPNO' is not in
> > > the
> > > > select clause". I think this is the right thing to do.
> > > >
> > > > There's also `SqlToRelConverterTest.testOrderDistinct()` test, which 
> > > > also
> > > > asserts a wrong plan IMO.
> > > >
> > > > Do you agree this is an issue?
> > > >
> > > > Viliam
> > > >
> > > > --
> > > > This message contains confidential information and is intended only for
> > > the
> > > > individuals named. If you are not the named addressee you should not
> > > > disseminate, distribute or copy this e-mail. Please notify the sender
> > > > immediately by e-mail if you have received this e-mail by mistake and
> > > > delete this e-mail from your system. E-mail transmission cannot be
> > > > guaranteed to be secure or error-free as information could be
> > > intercepted,
> > > > corrupted, lost, destroyed, arrive late or incomplete, or contain
> > > viruses.
> > > > The sender therefore does not accept liability for any errors or
> > > omissions
> > > > in the contents of this message, which arise as a result of e-mail
> > > > transmission. If verification is required, please request a hard-copy
> > > > version. -Hazelcast
> > >
> >
> > --
> > This message contains confidential information and is intended only for the
> > individuals named. If you are not the named addressee you should not
> > disseminate, distribute or copy this e-mail. Please notify the sender
> > immediately by e-mail if you have received this e-mail by mistake and
> > delete this e-mail from your system. E-mail transmission cannot be
> > guaranteed to be secure or error-free as information could be intercepted,
> > corrupted, lost, destroyed, arrive late or incomplete, or contain viruses.
> > The sender therefore does not accept liability for any errors or omissions
> > in the contents of this message, which arise as a result of e-mail
> > transmission. If verification is required, please request a hard-copy
> > version. -Hazelcast

Reply via email to