acking-you commented on issue #15720: URL: https://github.com/apache/datafusion/issues/15720#issuecomment-2844672108
> # Overall Implementation
> Adjust `RowCursorStream` so that it becomes the owner of `Rows` and reuses them continuously. This requires each partition to maintain two `Rows` instances, because the subsequent sorting logic must keep both a `prev` and a `cur` state for comparison. The key challenge is passing a reference to `Rows` from `poll_next` to `SortPreservingMergeStream`, which is difficult to express under Rust's lifetime constraints.
>
> ## Possible Implementation Approaches:
> 1. **Lifetime Annotations**: Pass a reference to `Rows` using Rust's lifetime annotations (tested and found nearly infeasible).
> 2. **Box Abstraction**: Wrap `Rows` in a `Box` and return a `*const Rows` pointer from `poll_next`. This pointer must then be re-wrapped in a structure that behaves like `Rows`. This approach is currently under exploration. Since `SortPreservingMergeStream` invokes `RowCursorStream::poll_next` synchronously, there should be no concurrency safety concerns (not yet tested).
>
> ## Optimization Opportunities:
> When the `ORDER BY` clause contains only a single column, the current implementation of `SortPreservingMergeExec` still copies data and generates `Rows` structures for comparison. Consider abstracting a unified structure that handles this case while still meeting the requirements above.

After reviewing the relevant code, it turns out that a single-column optimization is not possible here, because `SortPreservingMergeExec` only creates a [SortPreservingMergeStream](https://docs.rs/datafusion-physical-plan/47.0.0/src/datafusion_physical_plan/sorts/sort_preserving_merge.rs.html#291-337) when the sort involves more than one column.
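The **Box Abstraction** approach can be sketched in plain Rust to show where the ownership and lifetime rules end up. This is a minimal sketch under stated assumptions, not DataFusion or arrow code: `Rows` here is a hypothetical stand-in for arrow's row-format `Rows` (a byte buffer plus offsets), and `PartitionRows`, `RowsHandle`, and `encode` are hypothetical names. Each partition owns two heap-allocated `Rows` slots (for `prev` and `cur`), refills the older slot for each new batch, and hands out a raw-pointer handle instead of a borrow. The sketch keeps each `Box` as a raw allocation via `Box::leak` and uses `NonNull<Rows>` in place of a bare `*const Rows`.

```rust
use std::ptr::NonNull;

/// Hypothetical stand-in for arrow's `Rows`: row-encoded bytes plus offsets.
#[derive(Default)]
struct Rows {
    buffer: Vec<u8>,
    offsets: Vec<usize>, // row i is buffer[offsets[i]..offsets[i + 1]]
}

impl Rows {
    /// Empties the rows but keeps the allocated capacity for reuse.
    fn clear(&mut self) {
        self.buffer.clear();
        self.offsets.clear();
        self.offsets.push(0);
    }

    fn push(&mut self, row: &[u8]) {
        self.buffer.extend_from_slice(row);
        self.offsets.push(self.buffer.len());
    }

    fn num_rows(&self) -> usize {
        self.offsets.len().saturating_sub(1)
    }

    fn row(&self, i: usize) -> &[u8] {
        &self.buffer[self.offsets[i]..self.offsets[i + 1]]
    }
}

/// Hypothetical value a `poll_next` would yield instead of `&Rows`.
/// Wrapping a raw pointer makes it `!Send` and `!Sync`.
struct RowsHandle(NonNull<Rows>);

impl RowsHandle {
    /// # Safety
    /// The `PartitionRows` that produced this handle must still be alive and
    /// must not have reused this handle's slot since the handle was created.
    unsafe fn get(&self) -> &Rows {
        // SAFETY: upheld by the caller as documented above.
        unsafe { self.0.as_ref() }
    }
}

/// Hypothetical per-partition owner of two reusable `Rows` (`prev` and `cur`).
/// Each `Rows` lives in its own heap allocation, so its address never moves.
struct PartitionRows {
    slots: [NonNull<Rows>; 2],
    next: usize, // index of the slot the next batch will overwrite
}

impl PartitionRows {
    fn new() -> Self {
        let alloc = || -> NonNull<Rows> {
            NonNull::from(Box::leak(Box::new(Rows::default())))
        };
        Self { slots: [alloc(), alloc()], next: 0 }
    }

    /// Encodes a batch into the older slot and returns a handle to it.
    /// A handle stays valid across the next call and is invalidated by the
    /// call after that, which overwrites its slot.
    fn encode(&mut self, batch: &[&str]) -> RowsHandle {
        let slot = self.slots[self.next];
        // SAFETY: per the `prev`/`cur` contract, no reference obtained from a
        // handle to this slot is still alive, so this `&mut` is not aliased.
        let rows = unsafe { &mut *slot.as_ptr() };
        rows.clear();
        for value in batch {
            rows.push(value.as_bytes()); // stand-in for real row encoding
        }
        self.next ^= 1;
        RowsHandle(slot)
    }
}

impl Drop for PartitionRows {
    fn drop(&mut self) {
        for slot in self.slots {
            // SAFETY: each slot came from `Box::leak` and is freed exactly once.
            unsafe { drop(Box::from_raw(slot.as_ptr())) };
        }
    }
}
```

A merge loop would call `encode` once per incoming batch and keep the two most recent handles as `prev` and `cur`; it must drop `prev` before the next `encode`, because that call overwrites `prev`'s slot. Because `RowsHandle` wraps a `NonNull`, it is `!Send`, which fits the assumption that `poll_next` is driven synchronously. Note that the `unsafe` contract on `get` is exactly the lifetime relationship that the lifetime-annotation approach tried to express; the raw pointer moves that check from the compiler to the caller.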