guyuqi opened a new pull request #7121:
URL: https://github.com/apache/arrow/pull/7121


   This patch implements the `ValidateAscii` function.
   Benchmarks and tests for it are added as well.
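
For readers unfamiliar with this kind of check, here is a minimal sketch of bulk ASCII validation. It is not the code added by this patch: the name `ValidateAsciiSketch` is hypothetical, and it uses a portable 8-bytes-at-a-time loop as a stand-in for real SIMD instructions. The idea is the same, assuming the SIMD path works on wide chunks: OR many bytes together and test the high bit once at the end, which would mainly pay off on long, fully ASCII inputs.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical illustration, not the function from this patch.
// Returns true if every byte in [data, data + len) is below 0x80.
inline bool ValidateAsciiSketch(const uint8_t* data, std::size_t len) {
  constexpr uint64_t kHighBits = 0x8080808080808080ULL;
  uint64_t acc = 0;
  // Bulk loop: OR eight bytes at a time into an accumulator.
  while (len >= sizeof(uint64_t)) {
    uint64_t word;
    std::memcpy(&word, data, sizeof(word));  // safe unaligned load
    acc |= word;
    data += sizeof(word);
    len -= sizeof(word);
  }
  // Tail loop: fold in the remaining (fewer than eight) bytes one by one.
  uint8_t tail = 0;
  while (len > 0) {
    tail |= *data++;
    --len;
  }
  // A set high bit anywhere means at least one non-ASCII byte was seen.
  return (acc & kHighBits) == 0 && (tail & 0x80) == 0;
}
```

Because the high-bit test happens only once, this sketch always scans the whole buffer and does not exit early on the first non-ASCII byte.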
   
   ### The benchmark on x86
   **Original:**
   ```
   Run on (20 X 3000 MHz CPU s)
   CPU Caches:
     L1 Data 32K (x10)
     L1 Instruction 32K (x10)
     L2 Unified 256K (x10)
     L3 Unified 25600K (x1)
   Load Average: 4.79, 5.63, 2.68
   
   -----------------------------------------------------------------------------------
   Benchmark                         Time             CPU   Iterations UserCounters...
   -----------------------------------------------------------------------------------
   ValidateTinyAscii              2.39 ns         2.39 ns    276349157 bytes_per_second=3.8964G/s
   ValidateTinyNonAscii           8.21 ns         8.21 ns     85421905 bytes_per_second=1.24781G/s
   ValidateSmallAscii             10.9 ns         10.9 ns     65497418 bytes_per_second=11.6721G/s
   ValidateSmallAlmostAscii       46.1 ns         46.1 ns     15204522 bytes_per_second=2.99142G/s
   ValidateSmallNonAscii          84.6 ns         84.6 ns      8303767 bytes_per_second=1.47429G/s
   ValidateLargeAscii             4997 ns         4997 ns       136960 bytes_per_second=18.6385G/s
   ValidateLargeAlmostAscii      30575 ns        30575 ns        22651 bytes_per_second=3.04752G/s
   ValidateLargeNonAscii         73714 ns        73713 ns         9385 bytes_per_second=1.26467G/s
   
   ```
   **With SIMD enabled:**
   ```
   Run on (20 X 3000 MHz CPU s)
   CPU Caches:
     L1 Data 32K (x10)
     L1 Instruction 32K (x10)
     L2 Unified 256K (x10)
     L3 Unified 25600K (x1)
   Load Average: 4.79, 5.63, 2.68
   
   -----------------------------------------------------------------------------------
   Benchmark                         Time             CPU   Iterations UserCounters...
   -----------------------------------------------------------------------------------
   ValidateTinyAscii              6.36 ns         6.36 ns    100259371 bytes_per_second=1.46438G/s
   ValidateTinyNonAscii           11.3 ns         11.3 ns     61604638 bytes_per_second=926.575M/s
   ValidateSmallAscii             9.74 ns         9.74 ns     71987411 bytes_per_second=13.0987G/s
   ValidateSmallAlmostAscii       51.1 ns         51.1 ns     13677942 bytes_per_second=2.69774G/s
   ValidateSmallNonAscii          84.5 ns         84.5 ns      8135065 bytes_per_second=1.47735G/s
   ValidateLargeAscii             2363 ns         2363 ns       298863 bytes_per_second=39.4107G/s
   ValidateLargeAlmostAscii      31006 ns        31006 ns        22642 bytes_per_second=3.00508G/s
   ValidateLargeNonAscii         76222 ns        76222 ns         8703 bytes_per_second=1.22305G/s
   ```
   
   
   ### The benchmark on Arm64
   **Original:**
   ```
   Run on (46 X 2600 MHz CPU s)
   Load Average: 0.19, 0.66, 1.73
   ***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
   -----------------------------------------------------------------------------------
   Benchmark                         Time             CPU   Iterations UserCounters...
   -----------------------------------------------------------------------------------
   ValidateTinyAscii              14.7 ns         14.7 ns     47461698 bytes_per_second=646.682M/s
   ValidateTinyNonAscii           48.0 ns         48.0 ns     14576733 bytes_per_second=218.46M/s
   ValidateSmallAscii              109 ns          109 ns      6404395 bytes_per_second=1.1667G/s
   ValidateSmallAlmostAscii        275 ns          275 ns      2544737 bytes_per_second=513.185M/s
   ValidateSmallNonAscii           555 ns          555 ns      1261830 bytes_per_second=230.408M/s
   ValidateLargeAscii            78511 ns        78502 ns         8915 bytes_per_second=1.18649G/s
   ValidateLargeAlmostAscii     179928 ns       179907 ns         3891 bytes_per_second=530.347M/s
   ValidateLargeNonAscii        415107 ns       415058 ns         1686 bytes_per_second=229.994M/s
   ```
   **With SIMD enabled:**
   ```
   Run on (46 X 2600 MHz CPU s)
   Load Average: 0.19, 0.66, 1.73
   ***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
   -----------------------------------------------------------------------------------
   Benchmark                         Time             CPU   Iterations UserCounters...
   -----------------------------------------------------------------------------------
   ValidateTinyAscii              65.9 ns         65.9 ns     10597823 bytes_per_second=144.749M/s
   ValidateTinyNonAscii           48.0 ns         48.0 ns     14584553 bytes_per_second=218.732M/s
   ValidateSmallAscii             83.7 ns         83.7 ns      8367346 bytes_per_second=1.52478G/s
   ValidateSmallAlmostAscii        275 ns          275 ns      2542186 bytes_per_second=512.498M/s
   ValidateSmallNonAscii           555 ns          555 ns      1261774 bytes_per_second=230.301M/s
   ValidateLargeAscii             3109 ns         3108 ns       225186 bytes_per_second=29.9637G/s
   ValidateLargeAlmostAscii     179998 ns       179974 ns         3889 bytes_per_second=530.149M/s
   ValidateLargeNonAscii        414228 ns       414181 ns         1691 bytes_per_second=230.481M/s
   ```
   
   With SIMD enabled, the `ValidateLargeAscii` case shows a clear throughput gain:
   
   - x86: `18.6385G/s -> 39.4107G/s` (about 2.1x)
   
   - Arm64: `1.18649G/s -> 29.9637G/s` (about 25x)
   
   



