gruuya commented on code in PR #7981:
URL: https://github.com/apache/arrow-datafusion/pull/7981#discussion_r1381195039
##########
datafusion/optimizer/src/replace_distinct_aggregate.rs:
##########
@@ -32,6 +36,22 @@ use datafusion_expr::{Aggregate, Distinct, LogicalPlan};
/// ```text
/// SELECT a, b FROM tab GROUP BY a, b
/// ```
+///
+/// On the other hand, for a `DISTINCT ON` query the replacement is
+/// a bit more involved and effectively converts
+/// ```text
+/// SELECT DISTINCT ON (a) b FROM tab ORDER BY a DESC, c
+/// ```
+///
+/// into
+/// ```text
+/// SELECT b FROM (
+/// SELECT a, FIRST_VALUE(b ORDER BY a DESC, c) AS b
Review Comment:
Actually there is no partitioning involved in the transformed plans. In fact
the query from docs returns the same results as the `DISTINCT ON` query:
```sql
❯ select c2 from (
select c1, first_value(c2 order by c1, c3) as c2
from test group by c1
)
order by c1;
+----+
| c2 |
+----+
| 5 |
| 4 |
| 2 |
| 1 |
| 3 |
+----+
5 rows in set. Query took 0.035 seconds.
❯ select distinct on (c1) c2 from test order by c1, c3;
+---------+
| test.c2 |
+---------+
| 5 |
| 4 |
| 2 |
| 1 |
| 3 |
+---------+
5 rows in set. Query took 0.034 seconds
```
##########
datafusion/optimizer/src/replace_distinct_aggregate.rs:
##########
@@ -32,6 +36,22 @@ use datafusion_expr::{Aggregate, Distinct, LogicalPlan};
/// ```text
/// SELECT a, b FROM tab GROUP BY a, b
/// ```
+///
+/// On the other hand, for a `DISTINCT ON` query the replacement is
+/// a bit more involved and effectively converts
+/// ```text
+/// SELECT DISTINCT ON (a) b FROM tab ORDER BY a DESC, c
+/// ```
+///
+/// into
+/// ```text
+/// SELECT b FROM (
+/// SELECT a, FIRST_VALUE(b ORDER BY a DESC, c) AS b
Review Comment:
Actually there is no partitioning involved in the transformed plans. In fact
the query from docs returns the same results as the `DISTINCT ON` query:
```sql
❯ select c2 from (
select c1, first_value(c2 order by c1, c3) as c2
from test group by c1
)
order by c1;
+----+
| c2 |
+----+
| 5 |
| 4 |
| 2 |
| 1 |
| 3 |
+----+
5 rows in set. Query took 0.035 seconds.
❯ select distinct on (c1) c2 from test order by c1, c3;
+---------+
| test.c2 |
+---------+
| 5 |
| 4 |
| 2 |
| 1 |
| 3 |
+---------+
5 rows in set. Query took 0.034 seconds
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]