alamb commented on code in PR #3386:
URL: https://github.com/apache/arrow-datafusion/pull/3386#discussion_r966354150


##########
datafusion/core/src/physical_plan/sorts/sort_preserving_merge.rs:
##########
@@ -320,11 +317,14 @@ pub(crate) struct SortPreservingMergeStream {
     /// An id to uniquely identify the input stream batch
     next_batch_id: usize,
 
-    /// min heap for record comparison
-    min_heap: BinaryHeap<SortKeyCursor>,
+    /// Heap that yields [`SortKeyCursor`] in increasing order
+    heap: BinaryHeap<Reverse<SortKeyCursor>>,
 
     /// target batch size
     batch_size: usize,
+
+    /// row converter
+    row_converter: RowConverter,

Review Comment:
   As I mentioned in https://github.com/apache/arrow-rs/pull/2593 I worry that 
the key comparisons for dictionaries will be incorrect if we see record batches 
with different dictionaries but use the same converter



##########
datafusion/core/src/physical_plan/sorts/cursor.rs:
##########
@@ -38,54 +29,35 @@ use std::sync::Arc;
 /// a row comparator for each other cursor that it is compared to.
 pub struct SortKeyCursor {
     stream_idx: usize,
-    sort_columns: Vec<ArrayRef>,
     cur_row: usize,
     num_rows: usize,
 
     // An id uniquely identifying the record batch scanned by this cursor.
     batch_id: usize,
 
-    // A collection of comparators that compare rows in this cursor's batch to
-    // the cursors in other batches. Other batches are uniquely identified by
-    // their batch_idx.
-    batch_comparators: RwLock<HashMap<usize, Vec<DynComparator>>>,
-    sort_options: Arc<Vec<SortOptions>>,
+    rows: Rows,

Review Comment:
   that certainly looks nicer



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to