[GitHub] [arrow-datafusion] ozankabak commented on issue #5230: Use Arrow Row Format in SortExec

via GitHub Fri, 03 Mar 2023 06:31:31 -0800


ozankabak commented on issue #5230:
URL: https://github.com/apache/arrow-datafusion/issues/5230#issuecomment-1453617881


   > How common are such batches in practice? I guess I'm wondering if the 
added complexity is justified for what is effectively a degenerate case that 
will cause issues far beyond just for sort?
   
   Can't speak for usage at large, but I've personally had multiple such use 
cases in my data pipelines at various jobs. At Synnada, we use this 
parameter to trade off throughput vs. latency; in some cases one is more 
important than the other depending on volumes etc. For this use case, this 
check adds no new complexity, so we are all good in that regard.
   
   > The main reason I ask is DynComparator, which underpins non-single-column 
lexsort, has known issues w.r.t sorting nulls, and I had hoped to eventually 
deprecate and remove it - https://github.com/apache/arrow-rs/issues/2687
   
   Good to know. I will think about this and discuss it with my team; this 
will be on our radar for future work.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
