alamb commented on code in PR #3386:
URL: https://github.com/apache/arrow-datafusion/pull/3386#discussion_r966354150
##########
datafusion/core/src/physical_plan/sorts/sort_preserving_merge.rs:
##########
@@ -320,11 +317,14 @@ pub(crate) struct SortPreservingMergeStream {
/// An id to uniquely identify the input stream batch
next_batch_id: usize,
- /// min heap for record comparison
- min_heap: BinaryHeap<SortKeyCursor>,
+ /// Heap that yields [`SortKeyCursor`] in increasing order
+ heap: BinaryHeap<Reverse<SortKeyCursor>>,
/// target batch size
batch_size: usize,
+
+ /// row converter
+ row_converter: RowConverter,
Review Comment:
As I mentioned in https://github.com/apache/arrow-rs/pull/2593 I worry that
the key comparisons for dictionaries will be incorrect if we see record batches
with different dictionaries but use the same converter
##########
datafusion/core/src/physical_plan/sorts/cursor.rs:
##########
@@ -38,54 +29,35 @@ use std::sync::Arc;
/// a row comparator for each other cursor that it is compared to.
pub struct SortKeyCursor {
stream_idx: usize,
- sort_columns: Vec<ArrayRef>,
cur_row: usize,
num_rows: usize,
// An id uniquely identifying the record batch scanned by this cursor.
batch_id: usize,
- // A collection of comparators that compare rows in this cursor's batch to
- // the cursors in other batches. Other batches are uniquely identified by
- // their batch_idx.
- batch_comparators: RwLock<HashMap<usize, Vec<DynComparator>>>,
- sort_options: Arc<Vec<SortOptions>>,
+ rows: Rows,
Review Comment:
that certainly looks nicer
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]