wiedld opened a new pull request, #7842:
URL: https://github.com/apache/arrow-datafusion/pull/7842

   Part of #7181 
   
   ## Rationale for this change
   
   This PR moves some code around ahead of building out the stream I/O for the merge node.
   The goal is a separation of concerns:
   * The merge node logic is the loser tree (`SortPreservingMergeStream`).
   * The sort order builder handles the creation of sort orders (and, in the future, any batch slicing and offset changes).
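
To illustrate the intended split, here is a minimal sketch, not the code in this PR. It assumes plain `Vec<i32>` inputs in place of record batches, and it uses a `BinaryHeap` as a stand-in for the loser tree. The `SortOrderBuilder` shown is a hypothetical simplification, and `push_row`, `build`, and `merge_sort_order` are names made up for the example. The merge logic only decides which input goes next; the builder only records the resulting `(batch index, row index)` order.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical, simplified sort order builder: it only records which
/// (batch index, row index) pair comes next in the merged output.
#[derive(Debug, Default)]
pub struct SortOrderBuilder {
    indices: Vec<(usize, usize)>,
}

impl SortOrderBuilder {
    /// Record that `row_idx` of batch `batch_idx` is the next output row.
    pub fn push_row(&mut self, batch_idx: usize, row_idx: usize) {
        self.indices.push((batch_idx, row_idx));
    }

    /// Yield the accumulated sort order.
    pub fn build(self) -> Vec<(usize, usize)> {
        self.indices
    }
}

/// Merge logic: repeatedly pick the smallest head among already-sorted
/// inputs. A min-heap stands in for the loser tree here (an assumption
/// made to keep the example short).
pub fn merge_sort_order(batches: &[Vec<i32>]) -> Vec<(usize, usize)> {
    let mut builder = SortOrderBuilder::default();
    let mut heap = BinaryHeap::new();

    // Seed the heap with the first row of every non-empty batch.
    for (batch_idx, batch) in batches.iter().enumerate() {
        if let Some(&value) = batch.first() {
            heap.push(Reverse((value, batch_idx, 0usize)));
        }
    }

    // Pop the smallest row, record it, then advance that batch's position.
    while let Some(Reverse((_, batch_idx, row_idx))) = heap.pop() {
        builder.push_row(batch_idx, row_idx);
        let next = row_idx + 1;
        if let Some(&value) = batches[batch_idx].get(next) {
            heap.push(Reverse((value, batch_idx, next)));
        }
    }

    builder.build()
}
```

Calling `merge_sort_order(&[vec![1, 4], vec![2, 3]])` yields the order in which rows should be taken from each input. Ties are broken by the lower batch index, which is a property of this sketch only.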
   
   ## What changes are included in this PR?
   
   * Rename `BatchBuilder` to `SortOrderBuilder`.
   * Add a new submodule, `sorts/batch`.
       * It handles anything specific to a record batch.
       * It currently includes the `BatchCursor`, which will (in the next PR) contain the unique `BatchId` and be yielded per merge node.
       * In the next PR, it will also include the `BatchTracker`, which collects the record batches and assigns each a unique `BatchId`, so that the cascading streams only pass around cursors.
   * Metrics:
       * Poll metrics are collected only around the operator poll (the cascade tree root).
       * The compute metric is still collected in the loser tree.
   * Move the cursor into `SortOrderBuilder`.
       * In the future, yielding sort orders will include cursor slicing.
       * The merge node should not need to care about cursor slicing, so the cursor lives within the sort order builder.
   
   ## Are these changes tested?
   
   The existing sort tests pass.
   Let me know if any additional tests should be added.
   
   ## Are there any user-facing changes?
   
   No.

