maartenbreddels opened a new pull request #7434:
URL: https://github.com/apache/arrow/pull/7434
Following up on #7418, I tried and benchmarked a different implementation approach for:
* ascii_lower
* ascii_upper
Before (lower is similar):
```
-------------------------------------------------------------
Benchmark                    Time             CPU  Iterations
-------------------------------------------------------------
AsciiUpper_median      4922843 ns      4918961 ns          10 bytes_per_second=3.1457G/s items_per_second=213.17M/s
```
After:
```
-------------------------------------------------------------
Benchmark                    Time             CPU  Iterations
-------------------------------------------------------------
AsciiUpper_median      1391272 ns      1390014 ns          10 bytes_per_second=11.132G/s items_per_second=754.363M/s
```
This is a 3.7x speedup (on an AMD machine).
Using http://quick-bench.com/JaDErmVCY23Z1tu6YZns_KBt0qU I found a 4.6x
speedup with clang 9 and a 6.4x speedup with GCC 9.2.
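Also, the test is expanded a bit to include a non-ASCII codepoint, to make
explicit that it is fine to ASCII upper- or lower-case a UTF-8 string. The
non-overlapping encoding of UTF-8 makes this safe: every byte of a
multi-byte sequence has its high bit set, so it can never be mistaken for
an ASCII letter (see section 2.5 of the Unicode Standard Core
Specification v13.0).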
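
To illustrate why byte-wise ASCII case conversion leaves UTF-8 text intact, here is a minimal sketch. It is not the kernel from this PR: the names `AsciiToUpperByte` and `AsciiUpperUtf8` are hypothetical and chosen only for this example. It assumes the input is valid UTF-8 and changes only bytes in the range `'a'` to `'z'`.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not Arrow's implementation): uppercase a single byte
// only if it is an ASCII lowercase letter. Bytes >= 0x80, which make up all
// UTF-8 multi-byte sequences, fall outside 'a'..'z' and pass through as-is.
inline uint8_t AsciiToUpperByte(uint8_t c) {
  return (c >= 'a' && c <= 'z') ? static_cast<uint8_t>(c - 32) : c;
}

// Hypothetical helper: apply the byte-wise conversion to a whole string.
// No UTF-8 decoding is needed because ASCII bytes never occur inside a
// multi-byte sequence.
inline std::string AsciiUpperUtf8(const std::string& input) {
  std::string out(input.size(), '\0');
  for (std::size_t i = 0; i < input.size(); ++i) {
    out[i] = static_cast<char>(
        AsciiToUpperByte(static_cast<uint8_t>(input[i])));
  }
  return out;
}
```

For example, calling `AsciiUpperUtf8` on the UTF-8 bytes of "héllo" uppercases the four ASCII letters and leaves the two bytes encoding "é" (0xC3 0xA9) unchanged.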