acking-you commented on issue #15720:
URL: https://github.com/apache/datafusion/issues/15720#issuecomment-2844672108

   > # Overall Implementation
   > Adjust `RowCursorStream` to become the owner of `Rows` with continuous 
reuse, requiring each partition to maintain two `Rows` instances (subsequent 
sorting algorithms need to maintain `prev` and `cur` states for comparison). 
The key challenge lies in passing a reference to `Rows` via `poll_next` to 
`SortPreservingMergeStream`, which is difficult to achieve under Rust's 
lifetime annotation constraints.
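
To illustrate why plain lifetime annotations struggle here: a stream item that borrows from the stream itself needs a "lending" shape, where the item type carries the lifetime of the `&mut self` borrow. The sketch below uses a generic associated type for this. `LendingStream` and `ReusedBuffer` are hypothetical names for illustration only and are not part of DataFusion or the `futures` crate, whose `Stream::Item` has no such lifetime parameter.

```rust
/// Hypothetical trait: each item may borrow from the stream that produced it.
pub trait LendingStream {
    type Item<'a>
    where
        Self: 'a;

    fn next<'a>(&'a mut self) -> Option<Self::Item<'a>>;
}

/// Hypothetical producer that reuses one internal buffer for every item.
pub struct ReusedBuffer {
    buf: Vec<u8>,
    remaining: u8,
}

impl ReusedBuffer {
    pub fn new(remaining: u8) -> Self {
        Self { buf: Vec::new(), remaining }
    }
}

impl LendingStream for ReusedBuffer {
    type Item<'a> = &'a [u8]
    where
        Self: 'a;

    fn next<'a>(&'a mut self) -> Option<Self::Item<'a>> {
        if self.remaining == 0 {
            return None;
        }
        self.remaining -= 1;
        // Overwrite the same allocation instead of creating a new one.
        self.buf.clear();
        self.buf.push(self.remaining);
        Some(&self.buf)
    }
}
```

Even with this shape, each item keeps the stream mutably borrowed, so a consumer cannot hold a `prev` and a `cur` item from the same stream at once, which is what the merge needs.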
   > 
   > ## Possible Implementation Approaches:
   > 1. **Lifetime Annotations**: Attempt to pass a reference to `Rows` using 
Rust's lifetime annotations (tested and found nearly infeasible).
   > 2. **Box Abstraction**: Wrap `Rows` in a `Box` and return a `*const Rows` 
raw pointer from `poll_next`. The pointer must then be wrapped in a 
structure that exposes a `Rows`-like interface. This approach is currently 
under exploration. Because `SortPreservingMergeStream` invokes 
`RowCursorStream::poll_next` synchronously, there should be no concurrency 
safety concerns (not yet tested).
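
A minimal sketch of the `Box` plus raw-pointer idea, under stated assumptions: `Rows` below is a hypothetical stand-in for the arrow row-format type (it only stores encoded bytes), and `RowsHandle` and `ReusableRowsStream` are invented names, not DataFusion APIs. The owner keeps two boxed buffers and alternates between them, so the two most recently returned handles (`prev` and `cur`) stay valid.

```rust
/// Hypothetical stand-in for the row-format `Rows` type.
#[derive(Default)]
pub struct Rows {
    data: Vec<Vec<u8>>,
}

impl Rows {
    fn clear(&mut self) {
        self.data.clear();
    }
    fn push(&mut self, row: &[u8]) {
        self.data.push(row.to_vec());
    }
    pub fn row(&self, i: usize) -> &[u8] {
        &self.data[i]
    }
    pub fn num_rows(&self) -> usize {
        self.data.len()
    }
}

/// Hypothetical handle handed to the consumer instead of `&Rows`.
pub struct RowsHandle {
    ptr: *const Rows,
}

impl RowsHandle {
    /// # Safety
    /// The owning stream must still be alive, and the buffer behind this
    /// handle must not have been recycled by a later `next_batch` call.
    pub unsafe fn get(&self) -> &Rows {
        &*self.ptr
    }
}

/// Hypothetical owner of two reusable buffers (one for `prev`, one for `cur`).
pub struct ReusableRowsStream {
    buffers: [Box<Rows>; 2],
    next: usize,
}

impl ReusableRowsStream {
    pub fn new() -> Self {
        Self {
            buffers: [Box::new(Rows::default()), Box::new(Rows::default())],
            next: 0,
        }
    }

    /// Refill the older buffer and return a handle to it.
    /// The handle returned two calls ago becomes invalid at this point.
    pub fn next_batch(&mut self, batch: &[&[u8]]) -> RowsHandle {
        let idx = self.next;
        self.next = 1 - idx;
        let rows: &mut Rows = &mut self.buffers[idx];
        rows.clear();
        for &r in batch {
            rows.push(r);
        }
        // The heap address is stable because the Box itself is never replaced.
        let ptr: *const Rows = &*self.buffers[idx];
        RowsHandle { ptr }
    }
}
```

The compiler does not check the validity rule in the `Safety` comment; the consumer has to uphold it, which is the main cost of this approach compared with a borrow-checked reference.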
   > 
   > ## Optimization Opportunities:
   > When the `ORDER BY` clause contains only a single column, the current 
implementation of `SortPreservingMergeExec` still copies and generates `Rows` 
structures for comparison. Consider abstracting a unified structure to handle 
this scenario while fulfilling the requirements.
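
One way such a unified structure could look, as a sketch only: a small trait that hides whether comparisons run on native column values or on encoded rows. `SortValues`, `PrimitiveColumn`, `EncodedRows`, and `left_goes_first` are hypothetical names chosen for this example and are not taken from the DataFusion code base.

```rust
use std::cmp::Ordering;

/// Hypothetical abstraction over the values a merge cursor compares.
pub trait SortValues {
    fn compare(&self, i: usize, other: &Self, j: usize) -> Ordering;
}

/// Single-column case: compare native values directly, no row encoding.
pub struct PrimitiveColumn(pub Vec<i64>);

impl SortValues for PrimitiveColumn {
    fn compare(&self, i: usize, other: &Self, j: usize) -> Ordering {
        self.0[i].cmp(&other.0[j])
    }
}

/// Multi-column case: compare byte-encoded rows lexicographically.
pub struct EncodedRows(pub Vec<Vec<u8>>);

impl SortValues for EncodedRows {
    fn compare(&self, i: usize, other: &Self, j: usize) -> Ordering {
        self.0[i].cmp(&other.0[j])
    }
}

/// Merge step written once for both representations:
/// returns true when the left value should be emitted first (ties go left).
pub fn left_goes_first<V: SortValues>(a: &V, i: usize, b: &V, j: usize) -> bool {
    a.compare(i, b, j) != Ordering::Greater
}
```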
   
   After reviewing the code, I found that the single-column optimization is 
not applicable here, because `SortPreservingMergeExec` only creates a 
[SortPreservingMergeStream](https://docs.rs/datafusion-physical-plan/47.0.0/src/datafusion_physical_plan/sorts/sort_preserving_merge.rs.html#291-337)
 when sorting involves more than one column.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

