alamb commented on code in PR #7364:
URL: https://github.com/apache/arrow-datafusion/pull/7364#discussion_r1303223437


##########
datafusion/core/src/physical_plan/projection.rs:
##########
@@ -64,6 +68,10 @@ pub struct ProjectionExec {
     columns_map: HashMap<Column, Vec<Column>>,
     /// Execution metrics
     metrics: ExecutionPlanMetricsSet,
+    /// Expressions' normalized orderings (as given by the output ordering API
+    /// and normalized with respect to equivalence classes of input plan). The
+    /// projected expressions are mapped by their indices to this vector.
+    orderings: Vec<Option<PhysicalSortExpr>>,

Review Comment:
   I don't fully understand how this is different than `output_ordering` (which 
can also be a Vec of PhysicalSortExpr`). Would it makes sense to always 
normalize the output ordering in terms of the input's equivalence classes?
   
   I thought the notion of multiple equivalent orderings was expressed using 
https://docs.rs/datafusion-physical-expr/28.0.0/datafusion_physical_expr/equivalence/type.OrderingEquivalentClass.html
 
   
   Or maybe the key difference is this can track the ordering of each 
expression independently?



##########
datafusion/core/src/physical_plan/projection.rs:
##########
@@ -318,6 +339,228 @@ impl ExecutionPlan for ProjectionExec {
     }
 }
 
+/// Calculates the output orderings for a set of expressions within the 
context of a given
+/// execution plan. The resulting orderings are all in the type of [`Column`], 
since these
+/// expressions become [`Column`] after the projection step. The expressions 
having an alias
+/// are renamed with those aliases in the returned [`PhysicalSortExpr`]'s. If 
an expression
+/// is found to be unordered, the corresponding entry in the output vector is 
`None`.
+///
+/// # Arguments
+///
+/// * `expr` - A slice of tuples containing expressions and their 
corresponding aliases.
+///
+/// * `input` - A reference to an execution plan that provides output ordering 
and equivalence
+/// properties.
+///
+/// # Returns
+///
+/// A `Result` containing a vector of optional [`PhysicalSortExpr`]'s. Each 
element of the
+/// vector corresponds to an expression from the input slice. If an expression 
can be ordered,
+/// the corresponding entry is `Some(PhysicalSortExpr)`. If an expression 
cannot be ordered,
+/// the entry is `None`.
+fn find_orderings_of_exprs(
+    expr: &[(Arc<dyn PhysicalExpr>, String)],
+    input: &Arc<dyn ExecutionPlan>,
+) -> Result<Vec<Option<PhysicalSortExpr>>> {
+    let mut orderings: Vec<Option<PhysicalSortExpr>> = vec![];
+    if let Some(leading_ordering) = input

Review Comment:
   I wonder if this might be more easy to find if it was with the other 
ordering code in 
https://docs.rs/datafusion/latest/datafusion/physical_expr/utils/index.html
   
   



##########
datafusion/physical-expr/src/expressions/binary.rs:
##########
@@ -694,6 +690,94 @@ impl PhysicalExpr for BinaryExpr {
         let mut s = state;
         self.hash(&mut s);
     }
+
+    /// For each operator, [`BinaryExpr`] has distinct ordering rules.
+    /// TODO: There may be rules specific to some data types (such as division 
and multiplication on unsigned integers)
+    fn get_ordering(&self, children: &[SortProperties]) -> SortProperties {
+        let (left_child, right_child) = (&children[0], &children[1]);
+        match self.op() {
+            Operator::Plus => left_child.add(right_child),
+            Operator::Minus => left_child.sub(right_child),
+            Operator::Gt | Operator::GtEq => 
left_child.gt_or_gteq(right_child),
+            Operator::Lt | Operator::LtEq => 
right_child.gt_or_gteq(left_child),
+            Operator::And => left_child.and(right_child),
+            _ => SortProperties::Unordered,
+        }
+    }
+}
+
+impl SortProperties {

Review Comment:
   it is somewhat strange to put this impl into a different module than the 
definion of `SortProperties`
   
   Maybe we could move all SortProperties related stuff to 
`datafusion/physical-expr/src/sort_properties.rs` or something



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to