yjshen commented on a change in pull request #1596:
URL: https://github.com/apache/arrow-datafusion/pull/1596#discussion_r785935850



##########
File path: datafusion/src/physical_plan/sorts/external_sort.rs
##########
@@ -353,12 +386,79 @@ pub struct ExternalSortExec {
     input: Arc<dyn ExecutionPlan>,
     /// Sort expressions
     expr: Vec<PhysicalSortExpr>,
-    /// Execution metrics
-    metrics: ExecutionPlanMetricsSet,
+    /// Containing all metrics set created for sort, such as all sets for 
`sort_merge_join`s
+    all_metrics: AggregatedMetricsSet,
     /// Preserve partitions of input plan
     preserve_partitioning: bool,
 }
 
+#[derive(Debug, Clone)]
+struct AggregatedMetricsSet {
+    intermediate: Arc<std::sync::Mutex<Vec<ExecutionPlanMetricsSet>>>,
+    final_: Arc<std::sync::Mutex<Vec<ExecutionPlanMetricsSet>>>,
+}

Review comment:
       Introduce `AggregatedMetricsSet` for `ExternalSortExec` since it may 
include multi partial sort. Each partial sort itself is 
`SortPreservingMergeStream` and has its own metrics set.

##########
File path: datafusion/src/physical_plan/common.rs
##########
@@ -61,12 +68,13 @@ impl Stream for SizedRecordBatchStream {
         mut self: std::pin::Pin<&mut Self>,
         _: &mut Context<'_>,
     ) -> Poll<Option<Self::Item>> {
-        Poll::Ready(if self.index < self.batches.len() {
+        let poll = Poll::Ready(if self.index < self.batches.len() {
             self.index += 1;
             Some(Ok(self.batches[self.index - 1].as_ref().clone()))
         } else {
             None
-        })
+        });
+        self.baseline_metrics.record_poll(poll)

Review comment:
       Make metrics right when there is only one record batch.

##########
File path: datafusion/src/physical_plan/sorts/external_sort.rs
##########
@@ -656,4 +774,187 @@ mod tests {
 
         Ok(())
     }
+

Review comment:
       Copy all tests from `SortExec` here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to