zhuqi-lucas commented on code in PR #16943:
URL: https://github.com/apache/datafusion/pull/16943#discussion_r2235948057
##########
datafusion/sqllogictest/test_files/window.slt:
##########
@@ -5715,17 +5715,82 @@ EXPLAIN SELECT
RANGE BETWEEN INTERVAL '2 minutes' PRECEDING AND CURRENT ROW
) AS distinct_count
FROM table_test_distinct_count
-ODER BY k, time;
+ORDER BY k, time;
----
logical_plan
-01)Projection: oder.k, oder.time, count(oder.v) PARTITION BY [oder.k] ORDER BY [oder.time ASC NULLS LAST] RANGE BETWEEN 2 minutes PRECEDING AND CURRENT ROW AS normal_count, count(DISTINCT oder.v) PARTITION BY [oder.k] ORDER BY [oder.time ASC NULLS LAST] RANGE BETWEEN 2 minutes PRECEDING AND CURRENT ROW AS distinct_count
-02)--WindowAggr: windowExpr=[[count(oder.v) PARTITION BY [oder.k] ORDER BY [oder.time ASC NULLS LAST] RANGE BETWEEN IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 120000000000 } PRECEDING AND CURRENT ROW AS count(oder.v) PARTITION BY [oder.k] ORDER BY [oder.time ASC NULLS LAST] RANGE BETWEEN 2 minutes PRECEDING AND CURRENT ROW, count(DISTINCT oder.v) PARTITION BY [oder.k] ORDER BY [oder.time ASC NULLS LAST] RANGE BETWEEN IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 120000000000 } PRECEDING AND CURRENT ROW AS count(DISTINCT oder.v) PARTITION BY [oder.k] ORDER BY [oder.time ASC NULLS LAST] RANGE BETWEEN 2 minutes PRECEDING AND CURRENT ROW]]
-03)----SubqueryAlias: oder
+01)Sort: table_test_distinct_count.k ASC NULLS LAST, table_test_distinct_count.time ASC NULLS LAST
+02)--Projection: table_test_distinct_count.k, table_test_distinct_count.time, count(table_test_distinct_count.v) PARTITION BY [table_test_distinct_count.k] ORDER BY [table_test_distinct_count.time ASC NULLS LAST] RANGE BETWEEN 2 minutes PRECEDING AND CURRENT ROW AS normal_count, count(DISTINCT table_test_distinct_count.v) PARTITION BY [table_test_distinct_count.k] ORDER BY [table_test_distinct_count.time ASC NULLS LAST] RANGE BETWEEN 2 minutes PRECEDING AND CURRENT ROW AS distinct_count
+03)----WindowAggr: windowExpr=[[count(table_test_distinct_count.v) PARTITION BY [table_test_distinct_count.k] ORDER BY [table_test_distinct_count.time ASC NULLS LAST] RANGE BETWEEN IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 120000000000 } PRECEDING AND CURRENT ROW AS count(table_test_distinct_count.v) PARTITION BY [table_test_distinct_count.k] ORDER BY [table_test_distinct_count.time ASC NULLS LAST] RANGE BETWEEN 2 minutes PRECEDING AND CURRENT ROW, count(DISTINCT table_test_distinct_count.v) PARTITION BY [table_test_distinct_count.k] ORDER BY [table_test_distinct_count.time ASC NULLS LAST] RANGE BETWEEN IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 120000000000 } PRECEDING AND CURRENT ROW AS count(DISTINCT table_test_distinct_count.v) PARTITION BY [table_test_distinct_count.k] ORDER BY [table_test_distinct_count.time ASC NULLS LAST] RANGE BETWEEN 2 minutes PRECEDING AND CURRENT ROW]]
04)------TableScan: table_test_distinct_count projection=[k, v, time]
physical_plan
-01)ProjectionExec: expr=[k@0 as k, time@2 as time, count(oder.v) PARTITION BY [oder.k] ORDER BY [oder.time ASC NULLS LAST] RANGE BETWEEN 2 minutes PRECEDING AND CURRENT ROW@3 as normal_count, count(DISTINCT oder.v) PARTITION BY [oder.k] ORDER BY [oder.time ASC NULLS LAST] RANGE BETWEEN 2 minutes PRECEDING AND CURRENT ROW@4 as distinct_count]
-02)--BoundedWindowAggExec: wdw=[count(oder.v) PARTITION BY [oder.k] ORDER BY [oder.time ASC NULLS LAST] RANGE BETWEEN 2 minutes PRECEDING AND CURRENT ROW: Field { name: "count(oder.v) PARTITION BY [oder.k] ORDER BY [oder.time ASC NULLS LAST] RANGE BETWEEN 2 minutes PRECEDING AND CURRENT ROW", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, frame: RANGE BETWEEN IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 120000000000 } PRECEDING AND CURRENT ROW, count(DISTINCT oder.v) PARTITION BY [oder.k] ORDER BY [oder.time ASC NULLS LAST] RANGE BETWEEN 2 minutes PRECEDING AND CURRENT ROW: Field { name: "count(DISTINCT oder.v) PARTITION BY [oder.k] ORDER BY [oder.time ASC NULLS LAST] RANGE BETWEEN 2 minutes PRECEDING AND CURRENT ROW", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, frame: RANGE BETWEEN IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 120000000000 } PRECEDING AND CURRENT ROW], mode=[Sorted]
-03)----SortExec: expr=[k@0 ASC NULLS LAST, time@2 ASC NULLS LAST], preserve_partitioning=[true]
-04)------CoalesceBatchesExec: target_batch_size=1
-05)--------RepartitionExec: partitioning=Hash([k@0], 2), input_partitions=2
-06)----------DataSourceExec: partitions=2, partition_sizes=[5, 4]
+01)SortPreservingMergeExec: [k@0 ASC NULLS LAST, time@1 ASC NULLS LAST]
+02)--ProjectionExec: expr=[k@0 as k, time@2 as time, count(table_test_distinct_count.v) PARTITION BY [table_test_distinct_count.k] ORDER BY [table_test_distinct_count.time ASC NULLS LAST] RANGE BETWEEN 2 minutes PRECEDING AND CURRENT ROW@3 as normal_count, count(DISTINCT table_test_distinct_count.v) PARTITION BY [table_test_distinct_count.k] ORDER BY [table_test_distinct_count.time ASC NULLS LAST] RANGE BETWEEN 2 minutes PRECEDING AND CURRENT ROW@4 as distinct_count]
+03)----BoundedWindowAggExec: wdw=[count(table_test_distinct_count.v) PARTITION BY [table_test_distinct_count.k] ORDER BY [table_test_distinct_count.time ASC NULLS LAST] RANGE BETWEEN 2 minutes PRECEDING AND CURRENT ROW: Field { name: "count(table_test_distinct_count.v) PARTITION BY [table_test_distinct_count.k] ORDER BY [table_test_distinct_count.time ASC NULLS LAST] RANGE BETWEEN 2 minutes PRECEDING AND CURRENT ROW", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, frame: RANGE BETWEEN IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 120000000000 } PRECEDING AND CURRENT ROW, count(DISTINCT table_test_distinct_count.v) PARTITION BY [table_test_distinct_count.k] ORDER BY [table_test_distinct_count.time ASC NULLS LAST] RANGE BETWEEN 2 minutes PRECEDING AND CURRENT ROW: Field { name: "count(DISTINCT table_test_distinct_count.v) PARTITION BY [table_test_distinct_count.k] ORDER BY [table_test_distinct_count.time ASC NULLS LAST] RANGE BETWEEN 2 minutes PRECEDING AND CURRENT ROW", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, frame: RANGE BETWEEN IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 120000000000 } PRECEDING AND CURRENT ROW], mode=[Sorted]
+04)------SortExec: expr=[k@0 ASC NULLS LAST, time@2 ASC NULLS LAST], preserve_partitioning=[true]
+05)--------CoalesceBatchesExec: target_batch_size=1
+06)----------RepartitionExec: partitioning=Hash([k@0], 2), input_partitions=2
+07)------------DataSourceExec: partitions=2, partition_sizes=[5, 4]
+
+
+# Add testing for distinct sum
Review Comment:
This is the corresponding slt test coverage for this PR.
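
For readers following the diff, here is a minimal sketch of what the distinct-sum counterpart flagged by the `# Add testing for distinct sum` comment could look like, reusing the existing `table_test_distinct_count` table (columns k, v, time) from this file. This is an assumed illustration, not necessarily the exact statements added in the PR:

-- Hypothetical sketch only; the exact slt test in the PR may differ.
-- Mirrors the distinct-count query above, swapping count for sum over the
-- same per-key 2-minute sliding RANGE frame.
SELECT
  k,
  time,
  sum(v) OVER (
    PARTITION BY k
    ORDER BY time
    RANGE BETWEEN INTERVAL '2 minutes' PRECEDING AND CURRENT ROW
  ) AS normal_sum,
  sum(DISTINCT v) OVER (
    PARTITION BY k
    ORDER BY time
    RANGE BETWEEN INTERVAL '2 minutes' PRECEDING AND CURRENT ROW
  ) AS distinct_sum
FROM table_test_distinct_count
ORDER BY k, time;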