This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 80f70a5dcf87 [SPARK-51001][SQL] Refine arrayEquals
80f70a5dcf87 is described below

commit 80f70a5dcf87939110fd46c5d9bca0a9531e408c
Author: Yibo Cai <[email protected]>
AuthorDate: Tue Jan 28 08:30:32 2025 -0600

    [SPARK-51001][SQL] Refine arrayEquals
    
    This is a trivial change to replace the loop index from `int` to `long`. 
Surprisingly, microbenchmark shows more than double performance uplift.
    
    Analysis
    --------
    The hot loop of `arrayEquals` method is simplifed as below. Loop index `i` 
is defined as `int`, it's compared with `length`, which is a `long`, to 
determine if the loop should end.
    ```
    public static boolean arrayEquals(
        Object leftBase, long leftOffset, Object rightBase, long rightOffset, 
final long length) {
      ......
      int i = 0;
      while (i <= length - 8) {
        if (Platform.getLong(leftBase, leftOffset + i) !=
            Platform.getLong(rightBase, rightOffset + i)) {
              return false;
        }
        i += 8;
      }
      ......
    }
    ```
    
    Strictly speaking, there's a code bug here. If `length` is greater than 
2^31 + 8, this loop will never end because `i` as a 32 bit integer is at most 
2^31 - 1. But compiler must consider this behaviour as intentional and generate 
code strictly match the logic. It prevents compiler from generating optimal 
code.
    
    Defining loop index `i` as `long` corrects this issue. Besides more 
accurate code logic, JIT is able to optimize this code much more aggressively. 
From microbenchmark, this trivial change improves performance significantly on 
both Arm and x86 platforms.
    
    Benchmark
    ---------
    Source code:
    https://gist.github.com/cyb70289/258e261f388e22f47e4d961431786d1a
    
    Result on Arm Neoverse N2:
    ```
    Benchmark                             Mode  Cnt    Score   Error  Units
    ArrayEqualsBenchmark.arrayEqualsInt   avgt   10  674.313 ± 0.213  ns/op
    ArrayEqualsBenchmark.arrayEqualsLong  avgt   10  313.563 ± 2.338  ns/op
    ```
    
    Result on Intel Cascake Lake:
    ```
    Benchmark                             Mode  Cnt     Score   Error  Units
    ArrayEqualsBenchmark.arrayEqualsInt   avgt   10  1130.695 ± 0.168  ns/op
    ArrayEqualsBenchmark.arrayEqualsLong  avgt   10   461.979 ± 0.097  ns/op
    ```
    
    Deep dive
    ---------
    Dive deep to the machine code level, we can see why the big gap. Listed 
below are arm64 assembly generated by Openjdk-17 C2 compiler.
    
    For `int i`, the machine code is similar to source code, no deep 
optimization. Safepoint polling is expensive in this short loop.
    ```
    // jit c2 machine code snippet
      0x0000ffff81ba8904:   mov        w15, wzr              // int i = 0
      0x0000ffff81ba8908:   nop
      0x0000ffff81ba890c:   nop
    loop:
      0x0000ffff81ba8910:   ldr        x10, [x13, w15, sxtw] // 
Platform.getLong(leftBase, leftOffset + i)
      0x0000ffff81ba8914:   ldr        x14, [x12, w15, sxtw] // 
Platform.getLong(rightBase, rightOffset + i)
      0x0000ffff81ba8918:   cmp        x10, x14
      0x0000ffff81ba891c:   b.ne       0x0000ffff81ba899c    // return false if 
not equal
      0x0000ffff81ba8920:   ldr        x14, [x28, #848]      // x14 -> safepoint
      0x0000ffff81ba8924:   add        w15, w15, #0x8        // i += 8
      0x0000ffff81ba8928:   ldr        wzr, [x14]            // safepoint 
polling
      0x0000ffff81ba892c:   sxtw       x10, w15              // extend i to long
      0x0000ffff81ba8930:   cmp        x10, x11
      0x0000ffff81ba8934:   b.le       0x0000ffff81ba8910    // if (i <= length 
- 8) goto loop
    ```
    
    For `long i`, JIT is able to do much more aggressive optimization. E.g, 
below code snippet unrolls the loop by four.
    ```
    // jit c2 machine code snippet
    unrolled_loop:
      0x0000ffff91de6fe0:   sxtw       x10, w7
      0x0000ffff91de6fe4:   add        x23, x22, x10
      0x0000ffff91de6fe8:   add        x24, x21, x10
      0x0000ffff91de6fec:   ldr        x13, [x23]          // unroll-1
      0x0000ffff91de6ff0:   ldr        x14, [x24]
      0x0000ffff91de6ff4:   cmp        x13, x14
      0x0000ffff91de6ff8:   b.ne       0x0000ffff91de70a8
      0x0000ffff91de6ffc:   ldr        x13, [x23, #8]      // unroll-2
      0x0000ffff91de7000:   ldr        x14, [x24, #8]
      0x0000ffff91de7004:   cmp        x13, x14
      0x0000ffff91de7008:   b.ne       0x0000ffff91de70b4
      0x0000ffff91de700c:   ldr        x13, [x23, #16]     // unroll-3
      0x0000ffff91de7010:   ldr        x14, [x24, #16]
      0x0000ffff91de7014:   cmp        x13, x14
      0x0000ffff91de7018:   b.ne       0x0000ffff91de70a4
      0x0000ffff91de701c:   ldr        x13, [x23, #24]     // unroll-4
      0x0000ffff91de7020:   ldr        x14, [x24, #24]
      0x0000ffff91de7024:   cmp        x13, x14
      0x0000ffff91de7028:   b.ne       0x0000ffff91de70b0
      0x0000ffff91de702c:   add        w7, w7, #0x20
      0x0000ffff91de7030:   cmp        w7, w11
      0x0000ffff91de7034:   b.lt       0x0000ffff91de6fe0
    ```
    
    ### What changes were proposed in this pull request?
    
    A trivial change to replace loop index `i` of method `arrayEquals` from 
`int` to `long`.
    
    ### Why are the changes needed?
    
    To improve performance and fix a possible bug.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    Existing unit tests.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #49568 from cyb70289/arrayEquals.
    
    Authored-by: Yibo Cai <[email protected]>
    Signed-off-by: Sean Owen <[email protected]>
---
 .../src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java   | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/common/unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java
 
b/common/unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java
index f81c06092765..8c303928f557 100644
--- 
a/common/unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java
+++ 
b/common/unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java
@@ -51,7 +51,7 @@ public class ByteArrayMethods {
    */
   public static boolean arrayEquals(
       Object leftBase, long leftOffset, Object rightBase, long rightOffset, 
final long length) {
-    int i = 0;
+    long i = 0;
 
     // check if stars align and we can get both offsets to be aligned
     if (!unaligned && ((leftOffset % 8) == (rightOffset % 8))) {


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to