mustafasrepo commented on code in PR #9435:
URL: https://github.com/apache/arrow-datafusion/pull/9435#discussion_r1513955386
##########
datafusion/physical-expr/src/equivalence/projection.rs:
##########
@@ -761,10 +735,7 @@ mod tests {
// expected
vec![
// [a_new ASC, date_bin_res ASC]
- // Please note that result is not [a_new ASC, date_bin_res
ASC, b_new ASC]
Review Comment:
I double checked whether this functionality is covered by the `.slt` tests.
It seems that, `.slt` tests covers cases in the form where `date_bin` argument
(such as `ts`) is leading ordering (e.g first ordering in lexicographical
ordering expression). Then after `date_bin` is applied, `date_bin` result
itself is ordered. In this sense, `monotonicity` behaviour is covered.
However, this unit test, tests when table is ordered as `[a ASC, ts
ASC]`,after projection `a, date_bin(ts)` result should have ordering: `[a,
date_bin(ts)]`. Meaning, when `date_bin `argument is not leading ordering,
lexicographical ordering is still preserved. As long as previous expressions
are present in the projection (in the example `a`).
I checked whether this behaviour is preserved in this PR. By running
following test
```
# create an unbounded table that contains name, timestamp.
# where table is ordered by name DESC, ts DESC
statement ok
CREATE UNBOUNDED EXTERNAL TABLE unbounded_csv_with_timestamps2 (
name VARCHAR,
ts TIMESTAMP
)
STORED AS CSV
WITH ORDER (name DESC, ts DESC)
LOCATION '../core/tests/data/timestamps.csv'
query TT
EXPLAIN SELECT name, date_bin('15 minutes', ts) as time_chunks
FROM unbounded_csv_with_timestamps2
ORDER BY name DESC, time_chunks DESC
LIMIT 5;
----
logical_plan
Limit: skip=0, fetch=5
--Sort: unbounded_csv_with_timestamps2.name DESC NULLS FIRST, time_chunks
DESC NULLS FIRST, fetch=5
----Projection: unbounded_csv_with_timestamps2.name,
date_bin(IntervalMonthDayNano("900000000000"),
unbounded_csv_with_timestamps2.ts) AS time_chunks
------TableScan: unbounded_csv_with_timestamps2 projection=[name, ts]
physical_plan
GlobalLimitExec: skip=0, fetch=5
--SortPreservingMergeExec: [name@0 DESC,time_chunks@1 DESC], fetch=5
----ProjectionExec: expr=[name@0 as name, date_bin(900000000000, ts@1) as
time_chunks]
------RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1
--------StreamingTableExec: partition_sizes=1, projection=[name, ts],
infinite_source=true, output_ordering=[name@0 DESC, ts@1 DESC]
```
I think, resulting plan is correct. (e.g after projection ordering `[name
DESC, time_chunks DESC]` is still satisfied). So, we have desired behaviour but
not test coverage currently.
However, it would be nice to have unit test. Given that in case of failure,
error context is much more visible. It would really speed up debugging. With
end to end test coverage, considerable time goes to understanding the root
cause of the error.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]