mustafasrepo commented on code in PR #9435:
URL: https://github.com/apache/arrow-datafusion/pull/9435#discussion_r1513955386


##########
datafusion/physical-expr/src/equivalence/projection.rs:
##########
@@ -761,10 +735,7 @@ mod tests {
                 // expected
                 vec![
                     // [a_new ASC, date_bin_res ASC]
-                    // Please note that result is not [a_new ASC, date_bin_res 
ASC, b_new ASC]

Review Comment:
   I double checked whether this functionality is covered by the `.slt` tests. 
It seems that, `.slt` tests covers cases in the form where `date_bin` argument 
(such as `ts`) is leading ordering. Then after `date_bin` is applied, 
`date_bin` result itself is ordered. In this sense, `monotonicity` behaviour is 
covered. 
   
   However, this unit test, tests when table is ordered as `[a ASC, ts 
ASC]`,after projection `a, date_bin(ts)` result should have ordering. `[a, 
date_bin(ts)]`. Meaning, date_bin argument is not leading ordering. However, 
lexicographical ordering is still preserved. As long as previous expressions 
are present in the projection (in the example `a`). 
   
   I checked whether this behaviour is preserved in this PR. By running 
following test
   ```
   # create an unbounded table that contains name, timestamp.
   # where table is ordered by name DESC, ts DESC
   statement ok
   CREATE UNBOUNDED EXTERNAL TABLE unbounded_csv_with_timestamps2 (
     name VARCHAR,
     ts TIMESTAMP
   )
   STORED AS CSV
   WITH ORDER (name DESC, ts DESC)
   LOCATION '../core/tests/data/timestamps.csv'
   
   query TT
   EXPLAIN SELECT name, date_bin('15 minutes', ts) as time_chunks
     FROM unbounded_csv_with_timestamps2
     ORDER BY name DESC, time_chunks DESC
     LIMIT 5;
   ----
   logical_plan
   Limit: skip=0, fetch=5
   --Sort: unbounded_csv_with_timestamps2.name DESC NULLS FIRST, time_chunks 
DESC NULLS FIRST, fetch=5
   ----Projection: unbounded_csv_with_timestamps2.name, 
date_bin(IntervalMonthDayNano("900000000000"), 
unbounded_csv_with_timestamps2.ts) AS time_chunks
   ------TableScan: unbounded_csv_with_timestamps2 projection=[name, ts]
   physical_plan
   GlobalLimitExec: skip=0, fetch=5
   --SortPreservingMergeExec: [name@0 DESC,time_chunks@1 DESC], fetch=5
   ----ProjectionExec: expr=[name@0 as name, date_bin(900000000000, ts@1) as 
time_chunks]
   ------RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1
   --------StreamingTableExec: partition_sizes=1, projection=[name, ts], 
infinite_source=true, output_ordering=[name@0 DESC, ts@1 DESC]
   ```
   I think, resulting plan is correct. (e.g after projection ordering `[name 
DESC, time_chunks DESC]` is still satisfied). So, we have desired behaviour but 
not test coverage currently.
   
   However, it would be nice to have unit test. Given that in case of failure, 
error context is much more visible. It would really speed up debugging. With 
end to end test coverage, considerable time goes to understanding the root 
cause of the error.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to