yjshen commented on issue #10073:
URL: https://github.com/apache/datafusion/issues/10073#issuecomment-2308985824

   Another piece of code worth noting is in the current `sort_batch` 
implementation:
   
https://github.com/apache/datafusion/blob/79fa6f9098be9a6e5b269cd3642694765b230ff1/datafusion/physical-plan/src/sorts/sort.rs#L601-L607
   
   Performance-wise, I think it would be beneficial to apply the row format 
comparison to all multi-column cases. However, `sort_batch` is used in 
multiple places where `spill` is called, and creating rows before comparing 
would introduce more memory pressure at those points. 
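   
   To make the trade-off concrete, here is a minimal, std-only sketch. It 
does not use the arrow row format or any DataFusion code; `encode_row`, 
`sort_indices_columnar` and `sort_indices_rows` are hypothetical names, and 
the fixed `(i32, u32)` key is an assumption chosen only for illustration. It 
shows that sorting by byte-comparable rows yields the same order as a 
column-by-column comparison, but needs a temporary buffer proportional to the 
number of rows:

```rust
use std::cmp::Ordering;

/// Encode one (i32, u32) key into 8 bytes whose bytewise order matches
/// the lexicographic order of the original pair (hypothetical helper).
fn encode_row(a: i32, b: u32) -> [u8; 8] {
    let mut row = [0u8; 8];
    // Flipping the sign bit makes negative values sort before positive
    // ones when the bytes are compared as unsigned big-endian.
    row[..4].copy_from_slice(&((a as u32) ^ 0x8000_0000).to_be_bytes());
    row[4..].copy_from_slice(&b.to_be_bytes());
    row
}

/// Column-by-column comparison: no buffer beyond the index vector.
fn sort_indices_columnar(a: &[i32], b: &[u32]) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..a.len()).collect();
    idx.sort_by(|&i, &j| match a[i].cmp(&a[j]) {
        Ordering::Equal => b[i].cmp(&b[j]),
        other => other,
    });
    idx
}

/// Row-encoded comparison: returns the sorted indices together with the
/// size in bytes of the temporary row buffer that had to be allocated.
fn sort_indices_rows(a: &[i32], b: &[u32]) -> (Vec<usize>, usize) {
    let rows: Vec<[u8; 8]> = a
        .iter()
        .zip(b)
        .map(|(&x, &y)| encode_row(x, y))
        .collect();
    let extra_bytes = rows.len() * std::mem::size_of::<[u8; 8]>();
    let mut idx: Vec<usize> = (0..rows.len()).collect();
    // A single byte-slice comparison replaces the per-column dispatch.
    idx.sort_by(|&i, &j| rows[i].cmp(&rows[j]));
    (idx, extra_bytes)
}
```

   For `a = [2, -1, 2, -1]` and `b = [7, 3, 1, 9]`, both functions return the 
indices `[1, 3, 2, 0]`, and the row-based variant reports 32 extra bytes for 
the encoded rows. In a real batch that extra buffer grows with the row count 
and key width, which is the memory cost discussed above.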
   
   BTW, I think we should report memory usage inside 
`lexsort_to_indices_multi_columns`:
   
https://github.com/apache/datafusion/blob/79fa6f9098be9a6e5b269cd3642694765b230ff1/datafusion/physical-plan/src/sorts/sort.rs#L650-L652
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

