alamb commented on PR #15380:
URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2802910163

   > Thanks for sharing the results @zhuqi-lucas, this is really interesting!
   > 
   > I think it mainly shows that we probably should try and use more efficient in-memory sorting (e.g. an arrow kernel that sorts multiple batches) here rather than use `SortPreservingMergeStream`, which is intended to be used on data streams. The arrow kernel would avoid the regressions of `concat`.
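
A minimal sketch of the "sort multiple batches in memory without concatenating first" idea from the quote above. It uses plain `Vec<i64>` stand-ins for Arrow arrays (an assumption for brevity), and `sort_batches` is a hypothetical helper, not an arrow-rs or DataFusion API. It sorts `(batch, row)` index pairs by key and then gathers once, so each row is copied a single time in sorted order. By contrast, `concat` followed by a sort copies the data into one array and then copies it again when materializing the sorted output. The `(batch, row)` pairs have the same shape as the indices that arrow-rs's `interleave` kernel takes for the gather step.

```rust
/// Hypothetical helper: sort several in-memory "batches" of i64 keys
/// without concatenating them first.
fn sort_batches(batches: &[Vec<i64>]) -> Vec<i64> {
    // One (batch_index, row_index) pair per row, across all batches.
    let mut indices: Vec<(usize, usize)> = batches
        .iter()
        .enumerate()
        .flat_map(|(b, batch)| (0..batch.len()).map(move |r| (b, r)))
        .collect();
    // Sort the index pairs by the key they point at; no row data moves yet.
    // `sort_by_key` is stable, so equal keys keep their batch/row order.
    indices.sort_by_key(|&(b, r)| batches[b][r]);
    // Gather: a single pass that copies each row into the output in sorted order.
    indices.iter().map(|&(b, r)| batches[b][r]).collect()
}

fn main() {
    let batches = vec![vec![5, 1, 9], vec![3, 7], vec![2, 8, 4]];
    println!("{:?}", sort_batches(&batches)); // [1, 2, 3, 4, 5, 7, 8, 9]
}
```

A real kernel would sort on Arrow columns (and possibly several sort keys), but the shape is the same: one sort over indices spanning all batches, then one gather.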
   
   I think the `SortPreservingMergeStream` is about as efficient as we know how to make it.
   
   Maybe we can look into what overhead makes concatenating faster 🤔 Any per-stream overhead we can eliminate in `SortPreservingMergeStream` would likely carry over directly to every query that sorts.
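
To make "per-stream overhead" concrete, here is a simplified model of a k-way merge of already-sorted streams. It is a sketch under stated assumptions, not DataFusion's implementation: it uses `std::collections::BinaryHeap` for simplicity (the real operator's data structures may differ), and `merge_sorted_runs` is a hypothetical name. In this model, each input stream costs a cursor plus a heap slot, and every output row pays O(log k) heap work for k streams. With many small, individually sorted batches, that per-stream and per-row cost is one plausible reason a single concat-then-sort can come out ahead.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical helper: merge k already-sorted runs into one sorted output.
fn merge_sorted_runs(runs: &[Vec<i64>]) -> Vec<i64> {
    let total: usize = runs.iter().map(|r| r.len()).sum();
    let mut out = Vec::with_capacity(total);
    // Entries are (value, run_index, row_index). `Reverse` turns Rust's
    // max-heap into a min-heap; the heap holds at most one entry per run.
    let mut heap = BinaryHeap::with_capacity(runs.len());
    // Per-stream setup: seed the heap with the first row of each non-empty run.
    for (i, run) in runs.iter().enumerate() {
        if let Some(&v) = run.first() {
            heap.push(Reverse((v, i, 0usize)));
        }
    }
    // Per-row work: pop the smallest value, then advance that run's cursor.
    while let Some(Reverse((v, i, row))) = heap.pop() {
        out.push(v);
        let next = row + 1;
        if let Some(&nv) = runs[i].get(next) {
            heap.push(Reverse((nv, i, next)));
        }
    }
    out
}

fn main() {
    // Three sorted runs, e.g. three batches that were each sorted on their own.
    let runs = vec![vec![1, 5, 9], vec![3, 7], vec![2, 4, 8]];
    println!("{:?}", merge_sorted_runs(&runs)); // [1, 2, 3, 4, 5, 7, 8, 9]
}
```

Shrinking either the setup per stream or the per-row comparison cost in a model like this benefits every merge, which is the intuition behind tuning the merge itself rather than bypassing it.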


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
