alamb commented on code in PR #13233:
URL: https://github.com/apache/datafusion/pull/13233#discussion_r1826957395
##########
datafusion/physical-expr-common/src/sort_expr.rs:
##########
@@ -352,20 +346,26 @@ pub struct LexOrdering {
pub inner: Vec<PhysicalSortExpr>,
}
+impl AsRef<LexOrdering> for LexOrdering {
+ fn as_ref(&self) -> &LexOrdering {
+ self
+ }
+}
+
impl LexOrdering {
// Creates a new [`LexOrdering`] from a vector
pub fn new(inner: Vec<PhysicalSortExpr>) -> Self {
Self { inner }
}
- pub fn as_ref(&self) -> LexOrderingRef {
- &self.inner
- }
-
pub fn capacity(&self) -> usize {
self.inner.capacity()
}
+ pub fn from_ref(lex_ordering_ref: &LexOrdering) -> Self {
Review Comment:
I think this method could be removed as it is the same as `clone()`
##########
datafusion/physical-expr-common/src/sort_expr.rs:
##########
@@ -545,6 +596,16 @@ impl IntoIterator for LexRequirement {
}
}
+impl<'a> IntoIterator for &'a LexOrdering {
Review Comment:
👍
##########
datafusion/core/src/physical_optimizer/enforce_sorting.rs:
##########
@@ -392,7 +392,11 @@ fn analyze_immediate_sort_removal(
let sort_input = sort_exec.input();
// If this sort is unnecessary, we should remove it:
if sort_input.equivalence_properties().ordering_satisfy(
- sort_exec.properties().output_ordering().unwrap_or_default(),
+ &sort_exec
+ .properties()
+ .output_ordering()
+ .cloned()
+ .unwrap_or_default(),
Review Comment:
I think if we had `LexOrdering::empty()` as explained below, we could avoid
this clone like this
```rust
&sort_exec
.properties()
.output_ordering()
.unwrap_or(LexOrdering::empty()),
```
There are similar things below too
##########
datafusion/physical-plan/src/windows/bounded_window_agg_exec.rs:
##########
@@ -1553,7 +1552,7 @@ mod tests {
Arc::new(BuiltInWindowExpr::new(
last_value_func,
&[],
- LexOrderingRef::default(),
+ LexOrdering::default().as_ref(),
Review Comment:
I don't think the `as_ref()` is needed here (it just makes another copy)
##########
benchmarks/src/sort.rs:
##########
@@ -70,79 +70,88 @@ impl RunOpt {
let sort_cases = vec![
(
"sort utf8",
- vec![PhysicalSortExpr {
- expr: col("request_method", &schema)?,
- options: Default::default(),
- }],
+ LexOrdering {
Review Comment:
Maybe this could be `LexOrdering::new`
##########
datafusion/physical-expr-common/src/sort_expr.rs:
##########
@@ -409,6 +405,36 @@ impl LexOrdering {
pub fn truncate(&mut self, len: usize) {
self.inner.truncate(len)
}
+
+ /// Merge the contents of `other` into `self`, removing duplicates.
+ pub fn merge(mut self, other: LexOrdering) -> Self {
+ self.inner = self.inner.into_iter().chain(other).unique().collect();
+ self
+ }
+
+ /// Converts a `LexRequirement` into a `LexOrdering`.
Review Comment:
I also included this change in
https://github.com/apache/datafusion/pull/13222 (mostly as a note that this
will be a logical conflict)
##########
datafusion/physical-plan/src/sorts/streaming_merge.rs:
##########
@@ -51,18 +52,34 @@ macro_rules! merge_helper {
}};
}
-#[derive(Default)]
pub struct StreamingMergeBuilder<'a> {
streams: Vec<SendableRecordBatchStream>,
schema: Option<SchemaRef>,
- expressions: LexOrderingRef<'a>,
+ expressions: &'a LexOrdering,
metrics: Option<BaselineMetrics>,
batch_size: Option<usize>,
fetch: Option<usize>,
reservation: Option<MemoryReservation>,
enable_round_robin_tie_breaker: bool,
}
+static EMPTY_ORDER: OnceLock<LexOrdering> = OnceLock::new();
+
+impl<'a> Default for StreamingMergeBuilder<'a> {
+ fn default() -> Self {
+ Self {
+ streams: vec![],
+ schema: None,
+ expressions: EMPTY_ORDER.get_or_init(LexOrdering::default),
Review Comment:
👍
What do you think about moving this `EMPTY_ORDER` to `LexOrdering`?
Something like
```rust
impl LexOrdering {
...
/// Return an empty LexOrdering (no expressions)
pub empty() -> &'static LexOrdering {
EMPTY_ORDER.get_or_init(LexOrdering::default),
}
}
```
That would avoid needing to copy LexOrderings quite as much (I'll comment
inline
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]