yjshen commented on issue #10073: URL: https://github.com/apache/datafusion/issues/10073#issuecomment-2308985824
Another piece of code worth noting is in the current `sort_batch` implementation: https://github.com/apache/datafusion/blob/79fa6f9098be9a6e5b269cd3642694765b230ff1/datafusion/physical-plan/src/sorts/sort.rs#L601-L607

Performance-wise, I think it would be beneficial to use row-format comparison for all multi-column cases. However, `sort_batch` is used in several places where `spill` is called, so creating rows before comparing would add memory pressure.

BTW, I think we should report memory usage inside `lexsort_to_indices_multi_columns`: https://github.com/apache/datafusion/blob/79fa6f9098be9a6e5b269cd3642694765b230ff1/datafusion/physical-plan/src/sorts/sort.rs#L650-L652
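
To make the trade-off concrete, here is a minimal std-only sketch. It is not DataFusion code: `Batch`, `lexsort_indices`, and `row_sort_indices` are hypothetical names, two `u32` columns stand in for Arrow arrays, and a fixed-width big-endian key stands in for the Arrow row format. The column-wise path allocates only the index vector, while the row-wise path materializes one encoded key per row before comparing, and that extra buffer is the kind of allocation that would need to be reported.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for a record batch with two sort columns.
struct Batch {
    a: Vec<u32>,
    b: Vec<u32>,
}

/// Column-wise lexicographic sort: compares column by column and
/// allocates nothing beyond the index vector.
fn lexsort_indices(batch: &Batch) -> Vec<usize> {
    debug_assert_eq!(batch.a.len(), batch.b.len());
    let mut idx: Vec<usize> = (0..batch.a.len()).collect();
    idx.sort_by(|&i, &j| {
        batch.a[i]
            .cmp(&batch.a[j])
            .then_with(|| batch.b[i].cmp(&batch.b[j]))
    });
    idx
}

/// Row-wise sort: encodes every row into a byte-comparable key first,
/// then sorts by plain byte comparison. Returns the sorted indices and
/// the number of bytes allocated for the encoded rows (assumption: this
/// is the amount a caller would report to its memory accounting).
fn row_sort_indices(batch: &Batch) -> (Vec<usize>, usize) {
    debug_assert_eq!(batch.a.len(), batch.b.len());
    let rows: Vec<[u8; 8]> = batch
        .a
        .iter()
        .zip(&batch.b)
        .map(|(a, b)| {
            // Big-endian bytes of unsigned integers preserve numeric order.
            let mut key = [0u8; 8];
            key[..4].copy_from_slice(&a.to_be_bytes());
            key[4..].copy_from_slice(&b.to_be_bytes());
            key
        })
        .collect();
    let extra_bytes = rows.len() * size_of::<[u8; 8]>();

    let mut idx: Vec<usize> = (0..rows.len()).collect();
    idx.sort_by(|&i, &j| rows[i].cmp(&rows[j]));
    (idx, extra_bytes)
}
```

Both functions return the same ordering for the same input. The difference is that `row_sort_indices` holds `8 * num_rows` additional bytes while sorting, which matters when the sort runs on a path that is already close to spilling.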
