[GitHub] [arrow-datafusion] ozankabak commented on issue #5230: Use Arrow Row Format in SortExec

via GitHub Thu, 02 Mar 2023 14:27:40 -0800


ozankabak commented on issue #5230:
URL: 
https://github.com/apache/arrow-datafusion/issues/5230#issuecomment-1452643344


   @jaylmiller, we recently ran into something similar to your observation. We 
are improving `PARTITION BY` clauses in window calculations to avoid 
pipeline-breaking sorts on unsorted data (by hashing instead), and we 
tried the row converter to see whether, and by how much, it helps.
   
   In test cases with a single partition, it definitely helps. In test cases 
with multiple partitions, batch sizes get smaller (since there is no 
automatic batch coalescing), which results in a slowdown. This agrees 
with your theory, right?
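
   To make the batch-size effect concrete, here is a minimal toy sketch in Python. It is not DataFusion or arrow-rs code: the function names, the row layout, and the cost model (a fixed overhead per conversion call plus a per-row cost) are all assumptions chosen for illustration. It only shows why splitting each batch across many hash partitions can increase total conversion overhead, and how coalescing per partition would recover most of it.

```python
# Toy model only: names and cost constants are hypothetical, not DataFusion APIs.

def hash_partition(batch, num_partitions):
    """Split one batch (a list of (key, value) rows) by key modulo num_partitions."""
    parts = {}
    for row in batch:
        parts.setdefault(row[0] % num_partitions, []).append(row)
    return parts

def coalesce(batches, target_size):
    """Concatenate consecutive batches until each output has >= target_size rows."""
    out, buf = [], []
    for b in batches:
        buf.extend(b)
        if len(buf) >= target_size:
            out.append(buf)
            buf = []
    if buf:  # flush the remainder
        out.append(buf)
    return out

def conversion_cost(batches, fixed_cost=100, per_row_cost=1):
    """Assumed cost model: every conversion call pays fixed_cost plus per-row work."""
    return sum(fixed_cost + per_row_cost * len(b) for b in batches)

# 8 input batches of 1024 rows each, with 16 distinct integer keys.
input_batches = [[(i % 16, i) for i in range(1024)] for _ in range(8)]

# Route every input batch to 16 per-partition streams of small batches.
streams = {}
for batch in input_batches:
    for p, rows in hash_partition(batch, 16).items():
        streams.setdefault(p, []).append(rows)

small = [b for s in streams.values() for b in s]
coalesced = [b for s in streams.values() for b in coalesce(s, 512)]

print(len(small), conversion_cost(small))          # many small batches
print(len(coalesced), conversion_cost(coalesced))  # fewer, larger batches
print(len(input_batches), conversion_cost(input_batches))
```

   Under these assumed constants, the 128 small batches cost more to convert than the 8 original ones, while coalescing each partition's stream brings the total back close to the single-partition figure. The real ratio depends on the actual per-batch overhead of the row converter, which this sketch does not measure.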


