ozankabak commented on issue #5230:
URL: https://github.com/apache/arrow-datafusion/issues/5230#issuecomment-1453617881
> How common are such batches in practice? I guess I'm wondering if the added complexity is justified for what is effectively a degenerate case that will cause issues far beyond just for sort?

I can't speak for usage at large, but I've personally had multiple such use cases in my data pipelines at various jobs. At Synnada, we use this parameter to trade off throughput against latency; in some cases one matters more than the other, depending on volumes etc. For this use case, this check adds no new complexity, so we are all good in that regard.

> The main reason I ask is DynComparator, which underpins non-single-column lexsort, has known issues w.r.t sorting nulls, and I had hoped to eventually deprecate and remove it - https://github.com/apache/arrow-rs/issues/2687

Good to know. I will think about this and discuss it with my team; this will be on our radar for future work.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
