yjshen commented on pull request #1596: URL: https://github.com/apache/arrow-datafusion/pull/1596#issuecomment-1017112437
@alamb I find some of my messages for you are lost, maybe because of the network. Yes, I think the PR is ready for another round of review now. 1. for `SortKeyCursor`: `AtomicUsize` is introduced for sharing the SortKeyCursors in a BinaryHeap as well as in a Vec of cursors. I can think of deduplicating its usage and only keeping it in heap, **but let me do in a follow-up PR.** `batch_comparators: RwLock` it's required to make SortKeyCursor impl PartialOrd and Ord, since the original cmp in SortKeyCursor has the mut signature, but Ord needs it to be immutable. **Do you have any suggestions on simplifying this?** 2. for `AggregatedMetricsSet`: > I don't really understand this code -- it appears to be merging in new (zero'd) metrics to sef.all_metrics`? Why not just return self.all_metrics ? This means to aggregates all metrics during a complex operation, which is composed of multiple stages and each stage reports its statistics separately. For the sort case here: when the dataset is more significant than available memory, it will report multiple in-mem sort metrics and final merge-sort metrics from `SortPreservingMergeStream`. Therefore, We need a separation of metrics for which are final metrics (for output_rows accumulation), and which are intermediate metrics that we only account for elapsed_compute time. So I create a zero'd metrics here and gather `elapse_time` and `output_rows` separately. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
