pitrou opened a new pull request, #45217:
URL: https://github.com/apache/arrow/pull/45217

   ### Rationale for this change
   
   The Rank implementation currently mixes tie (duplicate) detection and rank
   computation in a single function, `CreateRankings`. This makes it hard to
   reuse for other Rank-like functions, such as the Percentile Rank function
   proposed in GH-45190.
   
   ### What changes are included in this PR?
   
   Split duplicate detection into a dedicated function that sets a marker bit
   in the sort-indices array. The array is private to the Rank implementation,
   so it is safe to mutate.
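
To illustrate the idea, here is a minimal standalone sketch of marking duplicates in a sorted index array. It is not the code from this PR: the names `kDuplicateMask` and `MarkDuplicates`, the use of `std::vector`, and the choice of the top bit of a 64-bit index as the marker are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical marker: the top bit of each 64-bit sort index.
constexpr uint64_t kDuplicateMask = uint64_t{1} << 63;

// Sets the marker bit on every entry of `sorted_indices` whose value equals
// the value of the preceding entry in sorted order. This is the only step
// that needs to read `values`, hence the only one templated on the type.
template <typename T>
void MarkDuplicates(const std::vector<T>& values,
                    std::vector<uint64_t>& sorted_indices) {
  for (size_t i = 1; i < sorted_indices.size(); ++i) {
    // The previous entry may already carry the marker, so mask it off.
    const uint64_t prev = sorted_indices[i - 1] & ~kDuplicateMask;
    const uint64_t curr = sorted_indices[i];
    // Real indices must leave the marker bit free.
    assert((curr & kDuplicateMask) == 0);
    if (values[curr] == values[prev]) {
      sorted_indices[i] = curr | kDuplicateMask;
    }
  }
}
```

Mutating the array in place is only acceptable here because, as noted above, no other code observes the sort indices.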
   
   The rank computation itself (`CreateRankings`) becomes simpler. Moreover, it
   no longer needs to read the input values, so it becomes type-agnostic.
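
As a rough sketch of why the rank step can be type-agnostic, the function below computes "min" (competition) ranks using only the marked index array. It is an illustration under assumptions, not the PR's `CreateRankings`: the names `kDuplicateMask` and `RankMin` and the top-bit marker layout are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical marker: the top bit of each 64-bit sort index.
constexpr uint64_t kDuplicateMask = uint64_t{1} << 63;

// Computes 1-based "min" ranks from sort indices whose duplicate entries
// already carry the marker bit. The input values are never read, so this
// function does not depend on the value type.
std::vector<uint64_t> RankMin(const std::vector<uint64_t>& marked_indices) {
  std::vector<uint64_t> ranks(marked_indices.size());
  uint64_t rank = 0;
  for (size_t i = 0; i < marked_indices.size(); ++i) {
    const uint64_t entry = marked_indices[i];
    // An unmarked entry starts a new run of equal values.
    if ((entry & kDuplicateMask) == 0) {
      rank = i + 1;
    }
    const uint64_t index = entry & ~kDuplicateMask;
    assert(index < ranks.size());
    ranks[index] = rank;
  }
  return ranks;
}
```

Other tiebreakers, or a percentile-style rank, could plausibly be derived from the same marked array by changing only how `rank` is updated.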
   
   This also reduces code size: the `text` segment of `libarrow.so` shrinks by
   44020 bytes (about 44 kB) on the author's machine:
   * before:
   ```console
   $ size /build/build-release/relwithdebinfo/libarrow.so
       text    data     bss      dec     hex filename
   26072218  353832 2567985 28994035 1ba69f3 /build/build-release/relwithdebinfo/libarrow.so
   ```
   * after:
   ```console
   $ size /build/build-release/relwithdebinfo/libarrow.so
       text    data     bss      dec     hex filename
   26028198  353832 2567985 28950015 1b9bdff /build/build-release/relwithdebinfo/libarrow.so
   ```
   
   Rank benchmark results are mostly neutral: some benchmarks improve slightly
   and others regress slightly, most notably on all-null input.
   
   ### Are these changes tested?
   
   Yes, by existing tests.
   
   ### Are there any user-facing changes?
   
   No.

