taepper opened a new pull request, #50248: URL: https://github.com/apache/arrow/pull/50248
### Rationale for this change

@pitrou mentioned this as a follow-up in #46926.

### What changes are included in this PR?

Refactor the sorting methods to reuse the helper methods and avoid maintaining two abstractions for null partitions. The new abstraction was straightforward to adopt in most places, but a few spots required some care.

In particular, these functions were greatly simplified by the new abstraction:

- `MarkDuplicates`: duplicate nulls and NaNs were detected by checking every single row for `Null` one additional time, after we had already computed (and discarded) the nullness information.
- `GenericMergeImpl`: merging `null` ranges involved repartitioning `null` and `nan` values in every merge invocation. Now we track this distinction and no longer need any merge function for `null` and `nan` blocks.

### Are these changes tested?

Yes, the compute test suite passes as before.

### Are there any user-facing changes?

No.
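To illustrate the idea behind the `GenericMergeImpl` change, here is a minimal, hypothetical C++ sketch. If each sorted run keeps its ordinary values, NaNs, and nulls in separate partitions, only the value partitions need a comparison-based merge, while the NaN and null partitions can simply be concatenated. `PartitionedRun`, `ByValue`, `MakeRun`, `MergeRuns`, and `Flatten` are illustrative names, not Arrow's API. The sketch also assumes a `double` column modeled as `std::vector<std::optional<double>>`, ascending order, and NaNs placed after values with nulls last.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iterator>
#include <optional>
#include <vector>

// Hypothetical model of a nullable double column: std::nullopt is a null.
using Column = std::vector<std::optional<double>>;

// Hypothetical sorted run of row indices, kept in three partitions so that
// nulls and NaNs never have to be re-detected or re-partitioned later.
struct PartitionedRun {
  std::vector<uint64_t> values;  // non-null, non-NaN rows, sorted ascending
  std::vector<uint64_t> nans;    // NaN rows, in row order
  std::vector<uint64_t> nulls;   // null rows, in row order
};

// Orders two non-null, non-NaN rows by their value.
struct ByValue {
  const Column* col;
  bool operator()(uint64_t a, uint64_t b) const {
    return *(*col)[a] < *(*col)[b];
  }
};

// Classifies rows [begin, end) once, then stable-sorts only the values.
PartitionedRun MakeRun(const Column& col, uint64_t begin, uint64_t end) {
  PartitionedRun run;
  for (uint64_t i = begin; i < end; ++i) {
    if (!col[i].has_value()) {
      run.nulls.push_back(i);
    } else if (std::isnan(*col[i])) {
      run.nans.push_back(i);
    } else {
      run.values.push_back(i);
    }
  }
  std::stable_sort(run.values.begin(), run.values.end(), ByValue{&col});
  return run;
}

// Only the value partitions need a comparison-based merge; std::merge takes
// the left run's element first on ties, so the merge stays stable.
// NaN and null partitions are concatenated without any comparisons.
PartitionedRun MergeRuns(const Column& col, const PartitionedRun& left,
                         const PartitionedRun& right) {
  PartitionedRun out;
  std::merge(left.values.begin(), left.values.end(), right.values.begin(),
             right.values.end(), std::back_inserter(out.values),
             ByValue{&col});
  out.nans = left.nans;
  out.nans.insert(out.nans.end(), right.nans.begin(), right.nans.end());
  out.nulls = left.nulls;
  out.nulls.insert(out.nulls.end(), right.nulls.begin(), right.nulls.end());
  return out;
}

// Emits the final order: values, then NaNs, then nulls ("nulls at end").
std::vector<uint64_t> Flatten(const PartitionedRun& run) {
  std::vector<uint64_t> out = run.values;
  out.insert(out.end(), run.nans.begin(), run.nans.end());
  out.insert(out.end(), run.nulls.begin(), run.nulls.end());
  return out;
}
```

For example, with rows `{3.0, null, NaN, 1.0, 2.0, null, NaN, 1.0}` split into one run for rows 0 to 3 and another for rows 4 to 7, merging and flattening yields the index order `3 7 4 0 2 6 1 5`. Because the partitions survive every merge in this model, a duplicate-marking pass could also treat each NaN partition and each null partition as a single group of equal keys instead of re-checking every row.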
