PS Please log a JIRA case
On Thu, May 5, 2022 at 10:41 PM Julian Hyde <[email protected]> wrote: > > Your email had two implicit questions: > 1. Is current behavior correct? > 2. What should the behavior be? > > And my answers were: Current behavior is not correct. Desired behavior > is to take into account functional dependencies and pureness. > > We do have a notion of pureness: SqlOperator.isDeterministic(). The > fix would use that in AggChecker or somewhere similar. > > Julian > > > > > On Thu, May 5, 2022 at 12:56 AM Viliam Durina > <[email protected]> wrote: > > > > Exactly, that's the point. RAND() isn't pure. But Calcite doesn't know > > which functions are pure. I debugged the `AggChecker`, for function calls > > it only checks the operands - RAND() has none, so it assumes it's pure. But > > a function in general can depend on data other than its arguments. However > > from functions included in Calcite maybe only RAND is non-pure. > > > > I hit this because in my case, our user ordered the results using a complex > > expression. After looking at the plan I saw it evaluated this expression > > for each row after table scan, even though DISTINCT greatly reduced the > > number of rows. If the complex order-by expression was evaluated after > > duplicate removal, the query would be faster, even though in my case the > > result set was correct. In my case evaluating the order by expressions > > after aggregation would improve performance. > > > > I also tried this query: > > SELECT DISTINCT empno, deptno + 1 > > FROM emp > > ORDER BY deptno + empno + 1 > > > > It fails with: "Expression 'DEPTNO' is not in the select clause". So `ORDER > > BY deptno + 1 + empno` works, but `ORDER BY deptno + emptno + 1` doesn't, > > even though it's an equivalent expression. The code in AggChecker doesn't > > handle all cases. > > > > Viliam > > > > On Tue, 3 May 2022 at 22:33, Julian Hyde <[email protected]> wrote: > > > > > I think that Postgres is a little stricter than necessary. As long as > > > the operations are pure, the ORDER BY clause of a SELECT DISTINCT > > > query should allow any expression that is functionally dependent on > > > the expressions in the SELECT clause. In particular, > > > > > > SELECT DISTINCT x, y > > > FROM t > > > ORDER BY x + y + 1 > > > > > > should be valid because "x + y + 1" is functionally dependent. > > > > > > A pure parameterless function, such as CURRENT_DATE, is trivially > > > functionally dependent. Therefore > > > > > > SELECT DISTINCT x > > > FROM t > > > ORDER BY CURRENT_DATE + x > > > > > > should also be valid. > > > > > > But RAND() isn't pure. There is an argument to ban non-pure and > > > non-deterministic expressions from the ORDER BY clause of a SELECT > > > DISTINCT query. But there's also an argument to allow them and ensure > > > that they are evaluated once per row. In which case the plan would be > > > > > > LogicalProject(DEPTNO=[$0]) > > > LogicalSort(sort0=[$1], dir0=[ASC]) > > > LogicalProject(DEPTNO=[$0], EXPR$1=[RAND()]) > > > LogicalAggregate(group=[{7}]) > > > LogicalTableScan(table=[[CATALOG, SALES, EMP]]) > > > > > > This plan is like the first plan, except that the LogicalProject > > > occurs after LogicalAggregate rather than before. > > > > > > Julian > > > > > > On Tue, May 3, 2022 at 7:32 AM Viliam Durina > > > <[email protected]> wrote: > > > > > > > > Hi all, > > > > > > > > I have this query: > > > > > > > > SELECT DISTINCT deptno > > > > FROM emp > > > > ORDER BY rand() > > > > > > > > Sql2rel conversion generates the following expression: > > > > > > > > LogicalProject(DEPTNO=[$0]) > > > > LogicalSort(sort0=[$1], dir0=[ASC]) > > > > LogicalAggregate(group=[{0, 1}]) > > > > LogicalProject(DEPTNO=[$7], EXPR$1=[RAND()]) > > > > LogicalTableScan(table=[[CATALOG, SALES, EMP]]) > > > > > > > > I think it is not correct. The aggregation groups by `{deptno, RAND()}` > > > > tuple, which (almost) disables the DISTINCT clause. This plan would be > > > > correct: > > > > > > > > LogicalProject(DEPTNO=[$0]) > > > > LogicalSort(sort0=[$1], dir0=[ASC]) > > > > LogicalProject(DEPTNO=[$0], EXPR$0=RAND()) > > > > LogicalAggregate(group=[{0}]) > > > > LogicalProject(DEPTNO=[$7]]) > > > > LogicalTableScan(table=[[CATALOG, SALES, EMP]]) > > > > > > > > PostgreSQL in this case throws "ERROR: for SELECT DISTINCT, ORDER BY > > > > expressions must appear in select list". > > > > > > > > On the other hand, this query: > > > > > > > > SELECT DISTINCT deptno > > > > FROM emp > > > > ORDER BY empno > > > > > > > > correctly throws the following exception: "Expression 'EMPNO' is not in > > > the > > > > select clause". I think this is the right thing to do. > > > > > > > > There's also `SqlToRelConverterTest.testOrderDistinct()` test, which > > > > also > > > > asserts a wrong plan IMO. > > > > > > > > Do you agree this is an issue? > > > > > > > > Viliam > > > > > > > > -- > > > > This message contains confidential information and is intended only for > > > the > > > > individuals named. If you are not the named addressee you should not > > > > disseminate, distribute or copy this e-mail. Please notify the sender > > > > immediately by e-mail if you have received this e-mail by mistake and > > > > delete this e-mail from your system. E-mail transmission cannot be > > > > guaranteed to be secure or error-free as information could be > > > intercepted, > > > > corrupted, lost, destroyed, arrive late or incomplete, or contain > > > viruses. > > > > The sender therefore does not accept liability for any errors or > > > omissions > > > > in the contents of this message, which arise as a result of e-mail > > > > transmission. If verification is required, please request a hard-copy > > > > version. -Hazelcast > > > > > > > -- > > This message contains confidential information and is intended only for the > > individuals named. If you are not the named addressee you should not > > disseminate, distribute or copy this e-mail. Please notify the sender > > immediately by e-mail if you have received this e-mail by mistake and > > delete this e-mail from your system. E-mail transmission cannot be > > guaranteed to be secure or error-free as information could be intercepted, > > corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. > > The sender therefore does not accept liability for any errors or omissions > > in the contents of this message, which arise as a result of e-mail > > transmission. If verification is required, please request a hard-copy > > version. -Hazelcast
