andishgar commented on issue #48123:
URL: https://github.com/apache/arrow/issues/48123#issuecomment-3561813079

   @pitrou I conducted further research, and here are the results:
   1- The existing implementation has a bug around numbers that are powers of 2. The following test demonstrates it.
   ```c++
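   // Needs <cmath> and <limits>; WithinUlp is the helper under test.
   TEST(ULP, AroundPower2) {
     constexpr double kInf = std::numeric_limits<double>::infinity();
     const double a_raw = 4;
     // a and b are the representable neighbours just below and just above 4.
     const double a = std::nextafter(a_raw, -kInf);
     const double b = std::nextafter(a_raw, kInf);
     // Stepping up twice from a (a -> 4 -> b) lands exactly on b.
     double c = std::nextafter(a, kInf);
     c = std::nextafter(c, kInf);
     ASSERT_EQ(b, c);
     // Passes with the current implementation, although b is only two steps from a.
     ASSERT_FALSE(WithinUlp(a, b, 2));
   }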
   ```
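
To make the expectation behind that test concrete, here is a minimal sketch of a step-counting ULP distance. It assumes that "within n ULPs" means "at most n representable values apart", which is the interpretation the test relies on. `OrderedBits`, `UlpDistance` and `WithinUlpSketch` are hypothetical names used only for illustration; they are not Arrow's API and not necessarily the implementation benchmarked below.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: map a non-NaN double onto a signed integer line so that
// adjacent representable values differ by exactly 1 (and -0.0 == +0.0 == 0).
inline int64_t OrderedBits(double x) {
  assert(!std::isnan(x));
  int64_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  // Doubles are sign-magnitude: for negative values, convert the magnitude
  // into a negative offset so the mapping stays monotonic.
  return bits < 0 ? std::numeric_limits<int64_t>::min() - bits : bits;
}

// Number of representable steps between a and b. Note that this sketch treats
// infinity as one step past the largest finite value.
inline uint64_t UlpDistance(double a, double b) {
  const int64_t ia = OrderedBits(a);
  const int64_t ib = OrderedBits(b);
  // Subtract in unsigned arithmetic to avoid signed overflow.
  return ia >= ib ? static_cast<uint64_t>(ia) - static_cast<uint64_t>(ib)
                  : static_cast<uint64_t>(ib) - static_cast<uint64_t>(ia);
}

// Assumed semantics: NaN is never within any distance of anything.
inline bool WithinUlpSketch(double a, double b, uint64_t n_ulps) {
  if (std::isnan(a) || std::isnan(b)) return false;
  return UlpDistance(a, b) <= n_ulps;
}
```

Under this definition the neighbours just below and just above 4.0 are exactly 2 steps apart, even though the spacing above 4.0 is twice the spacing below it. A check based on a single ULP size can therefore disagree with step counting at a power of 2.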
   
   2- I created a benchmark that mimics float comparisons over an array; here are the results.
   My CPU is:
   
   ```
   Architecture:                x86_64
     CPU op-mode(s):            32-bit, 64-bit
     Address sizes:             39 bits physical, 48 bits virtual
     Byte Order:                Little Endian
   CPU(s):                      12
     On-line CPU(s) list:       0-11
   Vendor ID:                   GenuineIntel
     Model name:                11th Gen Intel(R) Core(TM) i5-11400H @ 2.70GHz
       CPU family:              6
       Model:                   141
       Thread(s) per core:      2
       Core(s) per socket:      6
       Socket(s):               1
       Stepping:                1
       CPU(s) scaling MHz:      45%
       CPU max MHz:             4500.0000
       CPU min MHz:             800.0000
   ```
   
   The results with Clang 20.1.2 are:
   
   ```
   2025-11-21T09:56:15+03:30
   Running ../my_version/Benchmark_ULP
   Run on (12 X 2315.73 MHz CPU s)
   CPU Caches:
     L1 Data 48 KiB (x6)
     L1 Instruction 32 KiB (x6)
     L2 Unified 1280 KiB (x6)
     L3 Unified 12288 KiB (x1)
   Load Average: 0.88, 0.98, 0.98
   
   ------------------------------------------------------------------------------------
   Benchmark                          Time             CPU   Iterations UserCounters...
   ------------------------------------------------------------------------------------
   my_float_version/1000           1996 ns         1996 ns       350625 bytes_per_second=3.7319Gi/s items_per_second=500.887M/s
   my_float_version/10000         24942 ns        24941 ns        27731 bytes_per_second=2.98727Gi/s items_per_second=400.944M/s
   my_float_version/100000       411094 ns       411095 ns         1719 bytes_per_second=1.81237Gi/s items_per_second=243.253M/s
   my_float_version/1000000     4284970 ns      4284597 ns          163 bytes_per_second=1.73892Gi/s items_per_second=233.394M/s
   my_float_version/10000000   44405282 ns     44384203 ns           16 bytes_per_second=1.67866Gi/s items_per_second=225.305M/s
   my_double_verion/1000           2039 ns         2039 ns       342934 bytes_per_second=7.30777Gi/s items_per_second=490.416M/s
   my_double_verion/10000         26622 ns        26621 ns        25899 bytes_per_second=5.59754Gi/s items_per_second=375.644M/s
   my_double_verion/100000       444275 ns       444225 ns         1542 bytes_per_second=3.35442Gi/s items_per_second=225.111M/s
   my_double_verion/1000000     4607614 ns      4606397 ns          150 bytes_per_second=3.23488Gi/s items_per_second=217.089M/s
   my_double_verion/10000000   47061217 ns     47061698 ns           15 bytes_per_second=3.1663Gi/s items_per_second=212.487M/s
   RUNNING: ../my_version/Benchmark_ULP --benchmark_filter=arrow.* --benchmark_out=/tmp/tmp0jihl778
   2025-11-21T09:56:28+03:30
   Running ../my_version/Benchmark_ULP
   Run on (12 X 3903.04 MHz CPU s)
   CPU Caches:
     L1 Data 48 KiB (x6)
     L1 Instruction 32 KiB (x6)
     L2 Unified 1280 KiB (x6)
     L3 Unified 12288 KiB (x1)
   Load Average: 0.98, 1.00, 0.99
   
   ----------------------------------------------------------------------------------------
   Benchmark                              Time             CPU   Iterations UserCounters...
   ----------------------------------------------------------------------------------------
   arrow_float_version/1000            8195 ns         8194 ns        85350 bytes_per_second=931.117Mi/s items_per_second=122.043M/s
   arrow_float_version/10000         127369 ns       127346 ns         5439 bytes_per_second=599.106Mi/s items_per_second=78.526M/s
   arrow_float_version/100000       1349496 ns      1349150 ns          517 bytes_per_second=565.496Mi/s items_per_second=74.1207M/s
   arrow_float_version/1000000     13905260 ns     13901427 ns           51 bytes_per_second=548.821Mi/s items_per_second=71.9351M/s
   arrow_float_version/10000000   137240593 ns    137240576 ns            5 bytes_per_second=555.914Mi/s items_per_second=72.8647M/s
   arrow_double_version/1000           8865 ns         8863 ns        80202 bytes_per_second=1.68125Gi/s items_per_second=112.826M/s
   arrow_double_version/10000        131861 ns       131836 ns         5296 bytes_per_second=1.13028Gi/s items_per_second=75.8519M/s
   arrow_double_version/100000      1429648 ns      1429675 ns          483 bytes_per_second=1.04228Gi/s items_per_second=69.946M/s
   arrow_double_version/1000000    15052944 ns     15047883 ns           47 bytes_per_second=1014.02Mi/s items_per_second=66.4545M/s
   arrow_double_version/10000000  150475448 ns    150468935 ns            4 bytes_per_second=1014.08Mi/s items_per_second=66.4589M/s
   Comparing my.* to arrow.* (from ../my_version/Benchmark_ULP)
   Benchmark                            Time             CPU      Time Old      Time New       CPU Old       CPU New
   -----------------------------------------------------------------------------------------------------------------
   [my.* vs. arrow.*]                +3.1050         +3.1042          1996          8195          1996          8194
   [my.* vs. arrow.*]                +4.1066         +4.1059         24942        127369         24941        127346
   [my.* vs. arrow.*]                +2.2827         +2.2818        411094       1349496        411095       1349150
   [my.* vs. arrow.*]                +2.2451         +2.2445       4284970      13905260       4284597      13901427
   [my.* vs. arrow.*]                +2.0906         +2.0921      44405282     137240593      44384203     137240576
   [my.* vs. arrow.*]                +3.3472         +3.3466          2039          8865          2039          8863
   [my.* vs. arrow.*]                +3.9530         +3.9523         26622        131861         26621        131836
   [my.* vs. arrow.*]                +2.2179         +2.2184        444275       1429648        444225       1429675
   [my.* vs. arrow.*]                +2.2670         +2.2667       4607614      15052944       4606397      15047883
   [my.* vs. arrow.*]                +2.1974         +2.1973      47061217     150475448      47061698     150468935
   [my.* vs. arrow.*]_pvalue          0.4727          0.4727      U Test, Repetitions: 10 vs 10
   OVERALL_GEOMEAN                   +2.7141         +2.7139             0             0             0             0
   ```
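
A note on reading the comparison rows: as far as I understand Google Benchmark's comparison tooling, the `Time` and `CPU` columns report the relative change `(new - old) / old`, with `my.*` as "old" and `arrow.*` as "new". A value of +3.1050 therefore means the Arrow version took about 4.1 times as long, not 3.1 times. The sketch below spells out the arithmetic; `RelativeChange` and `TimeRatio` are hypothetical helper names, not part of any library.

```cpp
#include <cassert>
#include <cmath>

// Relative change between two timings, in the form the comparison table
// appears to use: 0 means equal, +1 means the new time is twice the old one.
inline double RelativeChange(double old_time, double new_time) {
  assert(old_time != 0.0);
  return (new_time - old_time) / std::fabs(old_time);
}

// Convert a relative change back into a plain "new / old" time ratio.
inline double TimeRatio(double relative_change) {
  return 1.0 + relative_change;
}
```

For the first row above, `RelativeChange(1996, 8195)` is about 3.106 (the table prints +3.1050 because it works from unrounded timings), and `TimeRatio(2.7141)` gives about 3.71 for the overall geometric mean.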
   
   The results with GCC 13.3 are:
   
   ```
   RUNNING: ../my_version/Benchmark_ULP --benchmark_filter=my.* --benchmark_out=/tmp/tmpripsdal3
   2025-11-21T10:42:42+03:30
   Running ../my_version/Benchmark_ULP
   Run on (12 X 2026.34 MHz CPU s)
   CPU Caches:
     L1 Data 48 KiB (x6)
     L1 Instruction 32 KiB (x6)
     L2 Unified 1280 KiB (x6)
     L3 Unified 12288 KiB (x1)
   Load Average: 0.82, 0.81, 1.02
   
   ------------------------------------------------------------------------------------
   Benchmark                          Time             CPU   Iterations UserCounters...
   ------------------------------------------------------------------------------------
   my_float_version/1000           1874 ns         1874 ns       361661 bytes_per_second=3.97566Gi/s items_per_second=533.605M/s
   my_float_version/10000         67258 ns        67253 ns        10104 bytes_per_second=1.10784Gi/s items_per_second=148.692M/s
   my_float_version/100000       832701 ns       832614 ns          790 bytes_per_second=916.319Mi/s items_per_second=120.104M/s
   my_float_version/1000000     8801636 ns      8799939 ns           82 bytes_per_second=866.983Mi/s items_per_second=113.637M/s
   my_float_version/10000000   87679986 ns     87640110 ns            8 bytes_per_second=870.537Mi/s items_per_second=114.103M/s
   my_double_verion/1000           2061 ns         2060 ns       337844 bytes_per_second=7.23212Gi/s items_per_second=485.339M/s
   my_double_verion/10000         57709 ns        57672 ns        11730 bytes_per_second=2.58376Gi/s items_per_second=173.393M/s
   my_double_verion/100000       829816 ns       829136 ns          853 bytes_per_second=1.79719Gi/s items_per_second=120.607M/s
   my_double_verion/1000000     8788251 ns      8783822 ns           80 bytes_per_second=1.69643Gi/s items_per_second=113.846M/s
   my_double_verion/10000000   87873772 ns     87843363 ns            8 bytes_per_second=1.69633Gi/s items_per_second=113.839M/s
   RUNNING: ../my_version/Benchmark_ULP --benchmark_filter=arrow.* --benchmark_out=/tmp/tmpb6xqe856
   2025-11-21T10:42:52+03:30
   Running ../my_version/Benchmark_ULP
   Run on (12 X 2198.99 MHz CPU s)
   CPU Caches:
     L1 Data 48 KiB (x6)
     L1 Instruction 32 KiB (x6)
     L2 Unified 1280 KiB (x6)
     L3 Unified 12288 KiB (x1)
   Load Average: 0.93, 0.83, 1.03
   
   ----------------------------------------------------------------------------------------
   Benchmark                              Time             CPU   Iterations UserCounters...
   ----------------------------------------------------------------------------------------
   arrow_float_version/1000            8104 ns         8104 ns        83413 bytes_per_second=941.477Mi/s items_per_second=123.401M/s
   arrow_float_version/10000         163596 ns       163530 ns         4382 bytes_per_second=466.543Mi/s items_per_second=61.1507M/s
   arrow_float_version/100000       1694072 ns      1693670 ns          407 bytes_per_second=450.465Mi/s items_per_second=59.0434M/s
   arrow_float_version/1000000     17004234 ns     17003336 ns           41 bytes_per_second=448.7Mi/s items_per_second=58.812M/s
   arrow_float_version/10000000   169239383 ns    169224726 ns            4 bytes_per_second=450.844Mi/s items_per_second=59.093M/s
   arrow_double_version/1000           9198 ns         9195 ns        75753 bytes_per_second=1.62055Gi/s items_per_second=108.753M/s
   arrow_double_version/10000        156480 ns       156431 ns         4554 bytes_per_second=975.434Mi/s items_per_second=63.926M/s
   arrow_double_version/100000      1686394 ns      1685713 ns          419 bytes_per_second=905.183Mi/s items_per_second=59.3221M/s
   arrow_double_version/1000000    17405253 ns     17403154 ns           40 bytes_per_second=876.783Mi/s items_per_second=57.4608M/s
   arrow_double_version/10000000  175229657 ns    175218073 ns            4 bytes_per_second=870.846Mi/s items_per_second=57.0717M/s
   Comparing my.* to arrow.* (from ../my_version/Benchmark_ULP)
   Benchmark                            Time             CPU      Time Old      Time New       CPU Old       CPU New
   -----------------------------------------------------------------------------------------------------------------
   [my.* vs. arrow.*]                +3.3242         +3.3241          1874          8104          1874          8104
   [my.* vs. arrow.*]                +1.4324         +1.4316         67258        163596         67253        163530
   [my.* vs. arrow.*]                +1.0344         +1.0342        832701       1694072        832614       1693670
   [my.* vs. arrow.*]                +0.9319         +0.9322       8801636      17004234       8799939      17003336
   [my.* vs. arrow.*]                +0.9302         +0.9309      87679986     169239383      87640110     169224726
   [my.* vs. arrow.*]                +3.4620         +3.4627          2061          9198          2060          9195
   [my.* vs. arrow.*]                +1.7115         +1.7124         57709        156480         57672        156431
   [my.* vs. arrow.*]                +1.0323         +1.0331        829816       1686394        829136       1685713
   [my.* vs. arrow.*]                +0.9805         +0.9813       8788251      17405253       8783822      17403154
   [my.* vs. arrow.*]                +0.9941         +0.9947      87873772     175229657      87843363     175218073
   [my.* vs. arrow.*]_pvalue          0.4727          0.4727      U Test, Repetitions: 10 vs 10
   OVERALL_GEOMEAN                   +1.4486         +1.4490             0             0             0             0
   ``` 
   3- Given these findings (the existing implementation has a bug and is more than twice as slow under Clang, although the slowdown may not be significant), should we keep the current implementation, or adopt the new one for float16 and enable it in Arrow's equality method?
   


