alamb commented on code in PR #7379:
URL: https://github.com/apache/arrow-datafusion/pull/7379#discussion_r1325930283
##########
datafusion/core/src/physical_plan/sorts/builder.rs:
##########
@@ -15,136 +15,230 @@
// specific language governing permissions and limitations
// under the License.
-use arrow::compute::interleave;
-use arrow::datatypes::SchemaRef;
-use arrow::record_batch::RecordBatch;
use datafusion_common::Result;
-use datafusion_execution::memory_pool::MemoryReservation;
+use std::collections::VecDeque;
+use std::mem::take;
-#[derive(Debug, Copy, Clone, Default)]
-struct BatchCursor {
- /// The index into BatchBuilder::batches
- batch_idx: usize,
+use super::cursor::Cursor;
+use super::stream::{BatchId, BatchOffset};
+
+pub type SortOrder = (BatchId, usize, BatchOffset); // batch_id, row_idx (without offset)
Review Comment:
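What is the last element? Is it the offset within the batch? If so, how is that
different from the row index? Or perhaps the middle element (the `usize`) is the
number of rows? The trailing comment names only two things (batch_id, row_idx)
for a three-element tuple.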
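
To make the question concrete, here is a minimal sketch of how named fields could carry the meaning that the tuple alias leaves to a comment. Everything in it is an assumption for illustration: `SortOrderEntry` and its field names are hypothetical, and `BatchId` / `BatchOffset` are stand-in newtypes because their real definitions in `sorts/stream.rs` are not shown in this hunk. The doc comment on the third field is deliberately left as a question, since that is the open point.

```rust
// Hypothetical stand-ins: the PR imports the real `BatchId` and `BatchOffset`
// from `super::stream`; their actual definitions are not visible in this hunk.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BatchId(pub u64);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BatchOffset(pub usize);

/// Hypothetical named-field alternative to `(BatchId, usize, BatchOffset)`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SortOrderEntry {
    /// Which batch the row comes from.
    pub batch_id: BatchId,
    /// Row index, following the existing comment "row_idx (without offset)".
    pub row_idx: usize,
    /// Open question from this review: what does this offset refer to?
    pub batch_offset: BatchOffset,
}

/// Conversion from the tuple form, so existing call sites could migrate gradually.
impl From<(BatchId, usize, BatchOffset)> for SortOrderEntry {
    fn from((batch_id, row_idx, batch_offset): (BatchId, usize, BatchOffset)) -> Self {
        Self {
            batch_id,
            row_idx,
            batch_offset,
        }
    }
}
```

With something like this, each field gets its own doc comment, and a reader does not have to guess which position in the tuple means what.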
##########
datafusion/core/src/physical_plan/sorts/cascade.rs:
##########
@@ -0,0 +1,247 @@
+use crate::physical_plan::metrics::BaselineMetrics;
+use crate::physical_plan::sorts::builder::SortOrder;
+use crate::physical_plan::sorts::cursor::Cursor;
+use crate::physical_plan::sorts::merge::SortPreservingMergeStream;
+use crate::physical_plan::sorts::stream::{
+ BatchCursorStream, BatchId, BatchTrackingStream, MergeStream, OffsetCursorStream,
+ YieldedCursorStream,
+};
+use crate::physical_plan::stream::ReceiverStream;
+use crate::physical_plan::RecordBatchStream;
+
+use arrow::compute::interleave;
+use arrow::datatypes::SchemaRef;
+use arrow::record_batch::RecordBatch;
+use datafusion_common::Result;
+use datafusion_execution::memory_pool::MemoryReservation;
+use futures::{Stream, StreamExt};
+use std::collections::{HashMap, VecDeque};
+use std::pin::Pin;
+use std::sync::Arc;
+use std::task::{Context, Poll};
+
+pub(crate) struct SortPreservingCascadeStream<C: Cursor> {
Review Comment:
Perhaps we can copy the ASCII art diagram (or make something similar) from
https://github.com/apache/arrow-datafusion/issues/7181#issue-1833141868 into a
doc comment on this struct? It might help others get oriented.