alamb commented on code in PR #7364:
URL: https://github.com/apache/arrow-datafusion/pull/7364#discussion_r1303223437
##########
datafusion/core/src/physical_plan/projection.rs:
##########
@@ -64,6 +68,10 @@ pub struct ProjectionExec {
columns_map: HashMap<Column, Vec<Column>>,
/// Execution metrics
metrics: ExecutionPlanMetricsSet,
+ /// Expressions' normalized orderings (as given by the output ordering API
+ /// and normalized with respect to equivalence classes of input plan). The
+ /// projected expressions are mapped by their indices to this vector.
+ orderings: Vec<Option<PhysicalSortExpr>>,
Review Comment:
I don't fully understand how this is different than `output_ordering` (which
can also be a Vec of PhysicalSortExpr`). Would it makes sense to always
normalize the output ordering in terms of the input's equivalence classes?
I thought the notion of multiple equivalent orderings was expressed using
https://docs.rs/datafusion-physical-expr/28.0.0/datafusion_physical_expr/equivalence/type.OrderingEquivalentClass.html
Or maybe the key difference is this can track the ordering of each
expression independently?
##########
datafusion/core/src/physical_plan/projection.rs:
##########
@@ -318,6 +339,228 @@ impl ExecutionPlan for ProjectionExec {
}
}
+/// Calculates the output orderings for a set of expressions within the
context of a given
+/// execution plan. The resulting orderings are all in the type of [`Column`],
since these
+/// expressions become [`Column`] after the projection step. The expressions
having an alias
+/// are renamed with those aliases in the returned [`PhysicalSortExpr`]'s. If
an expression
+/// is found to be unordered, the corresponding entry in the output vector is
`None`.
+///
+/// # Arguments
+///
+/// * `expr` - A slice of tuples containing expressions and their
corresponding aliases.
+///
+/// * `input` - A reference to an execution plan that provides output ordering
and equivalence
+/// properties.
+///
+/// # Returns
+///
+/// A `Result` containing a vector of optional [`PhysicalSortExpr`]'s. Each
element of the
+/// vector corresponds to an expression from the input slice. If an expression
can be ordered,
+/// the corresponding entry is `Some(PhysicalSortExpr)`. If an expression
cannot be ordered,
+/// the entry is `None`.
+fn find_orderings_of_exprs(
+ expr: &[(Arc<dyn PhysicalExpr>, String)],
+ input: &Arc<dyn ExecutionPlan>,
+) -> Result<Vec<Option<PhysicalSortExpr>>> {
+ let mut orderings: Vec<Option<PhysicalSortExpr>> = vec![];
+ if let Some(leading_ordering) = input
Review Comment:
I wonder if this might be more easy to find if it was with the other
ordering code in
https://docs.rs/datafusion/latest/datafusion/physical_expr/utils/index.html
##########
datafusion/physical-expr/src/expressions/binary.rs:
##########
@@ -694,6 +690,94 @@ impl PhysicalExpr for BinaryExpr {
let mut s = state;
self.hash(&mut s);
}
+
+ /// For each operator, [`BinaryExpr`] has distinct ordering rules.
+ /// TODO: There may be rules specific to some data types (such as division
and multiplication on unsigned integers)
+ fn get_ordering(&self, children: &[SortProperties]) -> SortProperties {
+ let (left_child, right_child) = (&children[0], &children[1]);
+ match self.op() {
+ Operator::Plus => left_child.add(right_child),
+ Operator::Minus => left_child.sub(right_child),
+ Operator::Gt | Operator::GtEq =>
left_child.gt_or_gteq(right_child),
+ Operator::Lt | Operator::LtEq =>
right_child.gt_or_gteq(left_child),
+ Operator::And => left_child.and(right_child),
+ _ => SortProperties::Unordered,
+ }
+ }
+}
+
+impl SortProperties {
Review Comment:
it is somewhat strange to put this impl into a different module than the
definion of `SortProperties`
Maybe we could move all SortProperties related stuff to
`datafusion/physical-expr/src/sort_properties.rs` or something
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]