taepper opened a new pull request, #50248:
URL: https://github.com/apache/arrow/pull/50248

   ### Rationale for this change
   
   @pitrou suggested this as a follow-up in #46926.
   
   ### What changes are included in this PR?
   
   Refactors the sorting methods to reuse the helper methods, which avoids 
maintaining two abstractions for null partitions. The new abstraction was 
straightforward to adopt in most cases, but a few spots required some care.
   
   In particular, these functions were greatly simplified by the new 
abstraction:
   - `MarkDuplicates`: duplicate nulls and NaNs were previously detected by 
checking every single row for `Null` a second time, even though the nullness 
information had already been computed (and discarded).
   - `GenericMergeImpl`: merging `null` ranges previously involved 
repartitioning `null` and `nan` values in every merge invocation. Now we track 
this distinction and no longer need a merge function for `null` and `nan` blocks.
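   
   The idea behind both bullets can be illustrated with a minimal, 
self-contained sketch. This is not the code in this PR: `SortedRun`, `SortRun`, 
`MergeRuns` and `MarkDuplicatesSketch` are hypothetical names, and the column is 
modelled as a `std::vector<std::optional<double>>` rather than an Arrow array. 
The only assumption carried over from the description is that each sorted run 
remembers which of its indices are `null` and which are `nan`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a nullable float column.
using Column = std::vector<std::optional<double>>;

// Hypothetical sorted run of row indices, kept as three separate blocks
// so the null/NaN distinction is never thrown away.
struct SortedRun {
  std::vector<int64_t> non_null;  // ordered by value
  std::vector<int64_t> nans;      // rows holding NaN, in row order
  std::vector<int64_t> nulls;     // rows holding null, in row order
};

// Partition rows [begin, end) once, then sort only the non-null block.
SortedRun SortRun(const Column& col, int64_t begin, int64_t end) {
  assert(0 <= begin && begin <= end &&
         end <= static_cast<int64_t>(col.size()));
  SortedRun run;
  for (int64_t i = begin; i < end; ++i) {
    if (!col[i].has_value()) {
      run.nulls.push_back(i);
    } else if (std::isnan(*col[i])) {
      run.nans.push_back(i);
    } else {
      run.non_null.push_back(i);
    }
  }
  std::stable_sort(run.non_null.begin(), run.non_null.end(),
                   [&](int64_t a, int64_t b) { return *col[a] < *col[b]; });
  return run;
}

// Merge two runs: only the non-null blocks need comparisons. The NaN and
// null blocks are concatenated, so no repartitioning happens per merge.
SortedRun MergeRuns(const Column& col, const SortedRun& left,
                    const SortedRun& right) {
  SortedRun out;
  out.non_null.resize(left.non_null.size() + right.non_null.size());
  std::merge(left.non_null.begin(), left.non_null.end(),
             right.non_null.begin(), right.non_null.end(),
             out.non_null.begin(),
             [&](int64_t a, int64_t b) { return *col[a] < *col[b]; });
  out.nans = left.nans;
  out.nans.insert(out.nans.end(), right.nans.begin(), right.nans.end());
  out.nulls = left.nulls;
  out.nulls.insert(out.nulls.end(), right.nulls.begin(), right.nulls.end());
  return out;
}

// Mark every row that repeats an earlier value in sort order. Inside the
// NaN and null blocks every entry after the first is a duplicate, so no
// row has to be re-checked for nullness.
std::vector<bool> MarkDuplicatesSketch(const Column& col,
                                       const SortedRun& run) {
  std::vector<bool> dup(col.size(), false);
  for (std::size_t i = 1; i < run.non_null.size(); ++i) {
    if (*col[run.non_null[i]] == *col[run.non_null[i - 1]]) {
      dup[run.non_null[i]] = true;
    }
  }
  for (std::size_t i = 1; i < run.nans.size(); ++i) dup[run.nans[i]] = true;
  for (std::size_t i = 1; i < run.nulls.size(); ++i) dup[run.nulls[i]] = true;
  return dup;
}
```

   In this sketch the nulls-last, NaNs-before-nulls ordering is simply the order 
in which a caller would read the three blocks; a different null placement would 
only change that read order, not the merge itself.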
   
   ### Are these changes tested?
   
   Yes, the compute test suite passes as before.
   
   ### Are there any user-facing changes?
   
   No.
   

