guyuqi opened a new pull request #7121:
URL: https://github.com/apache/arrow/pull/7121
This patch implements the `ValidateAscii` function and adds benchmarks and tests for it.
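For orientation, an ASCII check of this kind can be sketched in portable C++ as below. This is a minimal illustration, not the code in this patch: the name `IsAsciiSketch` and the word-at-a-time approach are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not the function added by this PR): returns true when
// every byte in [data, data + size) is below 0x80, i.e. plain ASCII.
inline bool IsAsciiSketch(const uint8_t* data, size_t size) {
  assert(data != nullptr || size == 0);
  // OR whole 8-byte words together; a set high bit marks a non-ASCII byte.
  uint64_t word_or = 0;
  while (size >= sizeof(uint64_t)) {
    uint64_t word;
    std::memcpy(&word, data, sizeof(word));  // safe unaligned load
    word_or |= word;
    data += sizeof(word);
    size -= sizeof(word);
  }
  // Fold the remaining 0-7 bytes in one at a time.
  uint8_t tail_or = 0;
  for (; size > 0; --size) {
    tail_or |= *data++;
  }
  return (word_or & 0x8080808080808080ULL) == 0 && (tail_or & 0x80) == 0;
}
```

Accumulating with OR and testing once at the end keeps the loop free of per-byte branches, at the cost of not exiting early on the first non-ASCII byte.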
### Benchmark on x86
**Original:**
```
Run on (20 X 3000 MHz CPU s)
CPU Caches:
L1 Data 32K (x10)
L1 Instruction 32K (x10)
L2 Unified 256K (x10)
L3 Unified 25600K (x1)
Load Average: 4.79, 5.63, 2.68
----------------------------------------------------------------------------------------
Benchmark                       Time         CPU  Iterations UserCounters...
----------------------------------------------------------------------------------------
ValidateTinyAscii            2.39 ns     2.39 ns   276349157 bytes_per_second=3.8964G/s
ValidateTinyNonAscii         8.21 ns     8.21 ns    85421905 bytes_per_second=1.24781G/s
ValidateSmallAscii           10.9 ns     10.9 ns    65497418 bytes_per_second=11.6721G/s
ValidateSmallAlmostAscii     46.1 ns     46.1 ns    15204522 bytes_per_second=2.99142G/s
ValidateSmallNonAscii        84.6 ns     84.6 ns     8303767 bytes_per_second=1.47429G/s
ValidateLargeAscii           4997 ns     4997 ns      136960 bytes_per_second=18.6385G/s
ValidateLargeAlmostAscii    30575 ns    30575 ns       22651 bytes_per_second=3.04752G/s
ValidateLargeNonAscii       73714 ns    73713 ns        9385 bytes_per_second=1.26467G/s
```
**With SIMD enabled:**
```
Run on (20 X 3000 MHz CPU s)
CPU Caches:
L1 Data 32K (x10)
L1 Instruction 32K (x10)
L2 Unified 256K (x10)
L3 Unified 25600K (x1)
Load Average: 4.79, 5.63, 2.68
----------------------------------------------------------------------------------------
Benchmark                       Time         CPU  Iterations UserCounters...
----------------------------------------------------------------------------------------
ValidateTinyAscii            6.36 ns     6.36 ns   100259371 bytes_per_second=1.46438G/s
ValidateTinyNonAscii         11.3 ns     11.3 ns    61604638 bytes_per_second=926.575M/s
ValidateSmallAscii           9.74 ns     9.74 ns    71987411 bytes_per_second=13.0987G/s
ValidateSmallAlmostAscii     51.1 ns     51.1 ns    13677942 bytes_per_second=2.69774G/s
ValidateSmallNonAscii        84.5 ns     84.5 ns     8135065 bytes_per_second=1.47735G/s
ValidateLargeAscii           2363 ns     2363 ns      298863 bytes_per_second=39.4107G/s
ValidateLargeAlmostAscii    31006 ns    31006 ns       22642 bytes_per_second=3.00508G/s
ValidateLargeNonAscii       76222 ns    76222 ns        8703 bytes_per_second=1.22305G/s
```
### Benchmark on Arm64
**Original:**
```
Run on (46 X 2600 MHz CPU s)
Load Average: 0.19, 0.66, 1.73
***WARNING*** CPU scaling is enabled, the benchmark real time measurements
may be noisy and will incur extra overhead.
----------------------------------------------------------------------------------------
Benchmark                       Time         CPU  Iterations UserCounters...
----------------------------------------------------------------------------------------
ValidateTinyAscii            14.7 ns     14.7 ns    47461698 bytes_per_second=646.682M/s
ValidateTinyNonAscii         48.0 ns     48.0 ns    14576733 bytes_per_second=218.46M/s
ValidateSmallAscii            109 ns      109 ns     6404395 bytes_per_second=1.1667G/s
ValidateSmallAlmostAscii      275 ns      275 ns     2544737 bytes_per_second=513.185M/s
ValidateSmallNonAscii         555 ns      555 ns     1261830 bytes_per_second=230.408M/s
ValidateLargeAscii          78511 ns    78502 ns        8915 bytes_per_second=1.18649G/s
ValidateLargeAlmostAscii   179928 ns   179907 ns        3891 bytes_per_second=530.347M/s
ValidateLargeNonAscii      415107 ns   415058 ns        1686 bytes_per_second=229.994M/s
```
**With SIMD enabled:**
```
Run on (46 X 2600 MHz CPU s)
Load Average: 0.19, 0.66, 1.73
***WARNING*** CPU scaling is enabled, the benchmark real time measurements
may be noisy and will incur extra overhead.
----------------------------------------------------------------------------------------
Benchmark                       Time         CPU  Iterations UserCounters...
----------------------------------------------------------------------------------------
ValidateTinyAscii            65.9 ns     65.9 ns    10597823 bytes_per_second=144.749M/s
ValidateTinyNonAscii         48.0 ns     48.0 ns    14584553 bytes_per_second=218.732M/s
ValidateSmallAscii           83.7 ns     83.7 ns     8367346 bytes_per_second=1.52478G/s
ValidateSmallAlmostAscii      275 ns      275 ns     2542186 bytes_per_second=512.498M/s
ValidateSmallNonAscii         555 ns      555 ns     1261774 bytes_per_second=230.301M/s
ValidateLargeAscii           3109 ns     3108 ns      225186 bytes_per_second=29.9637G/s
ValidateLargeAlmostAscii   179998 ns   179974 ns        3889 bytes_per_second=530.149M/s
ValidateLargeNonAscii      414228 ns   414181 ns        1691 bytes_per_second=230.481M/s
```
The `ValidateLargeAscii` case gets a large throughput boost with SIMD enabled:
- x86: `18.6385G/s -> 39.4107G/s` (about 2.1x)
- Arm64: `1.18649G/s -> 29.9637G/s` (about 25x)
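One plausible reason the all-ASCII large case gains the most is that a SIMD check can OR together 16-byte chunks and test the high bits once, so long clean inputs pay almost no per-byte cost. The sketch below illustrates that idea with SSE2 intrinsics and a scalar fallback on other targets. `IsAsciiWideSketch` is a hypothetical name, and the SIMD code in this patch may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical illustration, not the patch's implementation.
inline bool IsAsciiWideSketch(const uint8_t* data, size_t size) {
  assert(data != nullptr || size == 0);
  size_t i = 0;
#if defined(__SSE2__)
  // Accumulate the OR of every full 16-byte chunk.
  __m128i acc = _mm_setzero_si128();
  for (; i + 16 <= size; i += 16) {
    const __m128i chunk =
        _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
    acc = _mm_or_si128(acc, chunk);
  }
  // movemask gathers the top bit of each byte; nonzero means non-ASCII.
  if (_mm_movemask_epi8(acc) != 0) return false;
#endif
  // Scalar pass over the tail (or the whole input without SSE2).
  uint8_t tail_or = 0;
  for (; i < size; ++i) tail_or |= data[i];
  return (tail_or & 0x80) == 0;
}
```

Inputs shorter than one 16-byte chunk fall through to the scalar loop in this sketch, so they gain nothing from the wide path.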