alamb commented on code in PR #9813:
URL: https://github.com/apache/arrow-datafusion/pull/9813#discussion_r1559856903
##########
datafusion/physical-expr/src/equivalence/mod.rs:
##########
@@ -45,6 +46,46 @@ pub fn collapse_lex_req(input: LexRequirement) ->
LexRequirement {
output
}
+/// This function constructs a normalized [`LexRequirement`] by filtering out
entries
+/// that are ordered if the next entry is.
Review Comment:
It seems like there are no additional callsites of `collapse_lex_ordering`
-- perhaps we should simply update `collapse_lex_ordering` instead of making a
new function 🤔
https://github.com/search?q=repo%3Aapache%2Farrow-datafusion+collapse_lex_ordering&type=code
##########
datafusion/sqllogictest/test_files/filter_without_sort_exec.slt:
##########
@@ -43,19 +43,36 @@ SortPreservingMergeExec: [date@0 ASC NULLS LAST,time@2 ASC
NULLS LAST]
------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
--------StreamingTableExec: partition_sizes=1, projection=[date, ticker,
time], infinite_source=true, output_ordering=[date@0 ASC NULLS LAST, ticker@1
ASC NULLS LAST, time@2 ASC NULLS LAST]
+# query
+query TT
Review Comment:
Could you also test some similar queries that should also have their sorts
removed? LIke
```sql
SELECT * FROM data
WHERE ticker = 'A' AND CAST(time AS DATE) = date
ORDER BY "date"
```
```sql
SELECT * FROM data
WHERE ticker = 'A' AND CAST(time AS DATE) = date
ORDER BY "ticker"
```
As well as some cases where it should not:
```sql
SELECT * FROM data
WHERE ticker = 'A' AND CAST(time AS DATE) <> date
ORDER BY "timestamp"
```
##########
datafusion/physical-expr/src/equivalence/properties.rs:
##########
@@ -2212,6 +2212,97 @@ mod tests {
);
}
+ Ok(())
+ }
+ #[test]
+ fn test_eliminate_redundant_monotonic_sorts() -> Result<()> {
+ let schema = Arc::new(Schema::new(vec![
+ Field::new("a", DataType::Date32, true),
+ Field::new("b", DataType::Utf8, true),
+ Field::new("c", DataType::Timestamp(TimeUnit::Nanosecond, None),
true),
+ ]));
+ let base_properties =
EquivalenceProperties::new(schema.clone()).with_reorder(
+ ["a", "b", "c"]
+ .into_iter()
+ .map(|c| {
+ col(c, schema.as_ref()).map(|expr| PhysicalSortExpr {
+ expr,
+ options: SortOptions {
+ descending: false,
+ nulls_first: true,
+ },
+ })
+ })
+ .collect::<Result<Vec<_>>>()?,
+ );
+
+ struct TestCase {
+ name: &'static str,
+ constants: Vec<Arc<dyn PhysicalExpr>>,
+ equal_conditions: Vec<[Arc<dyn PhysicalExpr>; 2]>,
+ sort_columns: &'static [&'static str],
+ should_satisfy_ordering: bool,
+ }
+
+ let cases = vec![
Review Comment:
I would find these cases easier to read if they were written like
```rust
let col_a = col("a", schema.as_ref())?;
let col_b = col("b", schema.as_ref())?;
let cast_c = Arc::new(CastExpr::new(
col("c", schema.as_ref())?,
DataType::Date32,
None,
));
...
TestCase {
name: "(date, ticker, time) -> (time)",
// b is constant, so it should be removed from the sort order
constants: vec![col_b.clone()?],
equal_conditions: vec![[cast_c.clone(), col_a.clone()]],
sort_columns: &["c"],
should_satisfy_ordering: true,
...
```
##########
datafusion/physical-expr/src/equivalence/properties.rs:
##########
@@ -2212,6 +2212,97 @@ mod tests {
);
}
+ Ok(())
+ }
+ #[test]
+ fn test_eliminate_redundant_monotonic_sorts() -> Result<()> {
+ let schema = Arc::new(Schema::new(vec![
+ Field::new("a", DataType::Date32, true),
Review Comment:
Would it be possible to use the same names in the schema `(a, b, c)` that
are used in the test description `(date, ticker, time)`? I found it hard to
mentally map between the two while reviewing this code
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]