Lordworms commented on issue #7053:
URL: https://github.com/apache/datafusion/issues/7053#issuecomment-2586009190
The current design is:
1. Replace the `SendableRecordBatchStream` between `SortPreservingMergeExec`
and `SortExec` with a `RowOrColumnStream`:
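```Rust
use std::pin::Pin;

use arrow::{record_batch::RecordBatch, row::Rows};
use datafusion_common::Result;
use futures::Stream;

pub enum RowOrColumn {
    Row(Rows),
    Column(RecordBatch),
}

/// Stream of items that are each either `Rows` or a `RecordBatch`.
pub type RowOrColumnStream =
    Pin<Box<dyn Stream<Item = Result<RowOrColumn>> + Send>>;
```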
2. At the beginning of `SortExec`, we build the `RowConverter` (for a
single-column sort we skip this and send a `RecordBatch` instead).
3. For every `RecordBatch` that `SortExec` receives, we convert it into `Rows`
and run the spill logic on the `Rows` format (I implemented a rudimentary
reader and writer for `Rows`).
4. In `SortPreservingMergeExec` we convert the rows to `ArrayRef`s (we have
to do this because I didn't find any arrow method that builds a `RecordBatch`
directly from `Rows`), without breaking any of the `loser_tree` logic.
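
Steps 2 to 4 can be sketched as a toy model that uses only the standard library. Everything in it is a hypothetical stand-in and not a DataFusion or arrow-rs API: `Columns` plays the role of a `RecordBatch` with only `i32` sort columns, `ToyRowOrColumn` mirrors `RowOrColumn`, and `encode_i32` imitates the idea of a byte-comparable row encoding (it is not the actual arrow row format).

```rust
/// Toy stand-in for a columnar batch: each inner Vec is one i32 sort column.
/// (Assumption for illustration; the real type would be a `RecordBatch`.)
type Columns = Vec<Vec<i32>>;

/// Toy counterpart of `RowOrColumn` (hypothetical, not the real enum).
#[derive(Debug, PartialEq)]
enum ToyRowOrColumn {
    /// One byte-comparable key per row (multi-column sort).
    Row(Vec<Vec<u8>>),
    /// The batch passed through unchanged (single-column sort).
    Column(Columns),
}

/// Encode an i32 so that unsigned byte order matches signed integer order:
/// flip the sign bit, then write the value big-endian.
fn encode_i32(v: i32) -> [u8; 4] {
    ((v as u32) ^ 0x8000_0000).to_be_bytes()
}

/// Inverse of `encode_i32`.
fn decode_i32(b: [u8; 4]) -> i32 {
    (u32::from_be_bytes(b) ^ 0x8000_0000) as i32
}

/// Steps 2 and 3: convert to rows only when there is more than one sort column.
fn to_row_or_column(cols: Columns) -> ToyRowOrColumn {
    if cols.len() <= 1 {
        return ToyRowOrColumn::Column(cols);
    }
    let num_rows = cols[0].len();
    let rows: Vec<Vec<u8>> = (0..num_rows)
        .map(|r| {
            // Concatenate the encoded value of every column for row `r`.
            cols.iter()
                .flat_map(|c| encode_i32(c[r]))
                .collect::<Vec<u8>>()
        })
        .collect();
    ToyRowOrColumn::Row(rows)
}

/// Step 4: turn rows back into columns on the merge side.
fn rows_to_columns(rows: &[Vec<u8>], num_cols: usize) -> Columns {
    (0..num_cols)
        .map(|c| {
            rows.iter()
                .map(|row| {
                    let mut buf = [0u8; 4];
                    buf.copy_from_slice(&row[c * 4..c * 4 + 4]);
                    decode_i32(buf)
                })
                .collect()
        })
        .collect()
}
```

In this model, sorting the encoded rows with a plain byte comparison gives the same order as sorting the original tuples column by column, which is why the merge side can compare rows directly and decode them only once, when it produces its output.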