mwylde opened a new pull request, #7157: URL: https://github.com/apache/arrow-rs/pull/7157
# Which issue does this PR close?

Closes #7156

# Rationale for this change

Described in the issue

# What changes are included in this PR?

This PR is made up of three changes, as separate commits. The performance impact of each change is summarized here:

| Benchmark                | BufIter Change (%) | memchr2 Change (%) | simdutf8 Change (%) | BufIter Time (µs) | memchr2 Time (µs) | simdutf8 Time (µs) |
|--------------------------|--------------------|--------------------|---------------------|-------------------|-------------------|--------------------|
| logs_json                | -13.49%            | -12.61%            | -0.83%              | 582.91            | 516.77            | 510.75             |
| logs_pretty_json         | -20.63%            | -10.77%            | -1.22%              | 604.53            | 535.28            | 533.00             |
| nexmark_json             | -30.01%            | -16.01%            | -0.42%              | 730.18            | 613.23            | 600.53             |
| nexmark_pretty_json      | -21.35%            | -15.76%            | -1.09%              | 754.84            | 636.66            | 623.56             |
| nexmark_bids_json        | -26.64%            | -22.20%            | -3.01%              | 508.01            | 396.75            | 382.77             |
| nexmark_bids_pretty_json | -26.26%            | -22.16%            | -2.64%              | 531.44            | 415.03            | 402.55             |
| tweets_json              | -20.36%            | -18.97%            | -14.85%             | 688.13            | 594.75            | 515.75             |
| tweets_pretty_json       | -22.61%            | -15.10%            | -11.04%             | 749.34            | 642.19            | 563.66             |
| **Average**              | **-22.04%**        | **-16.57%**        | **-4.64%**          | **643.17**        | **543.21**        | **516.57**         |

* 80cd0b9 (BufIter): replaces the wrapped iterator in BufIter with a slice and an offset, allowing more efficient operations; this is a straightforward change that significantly improves performance
* 3245fff (memchr2): uses the [memchr library](https://github.com/BurntSushi/memchr) to speed up searches for string ends; this is also a significant improvement but adds a dependency (although one that is already used in arrow-string)
* 9789442 (simdutf8): a more modest improvement that reduces the cost of UTF-8 validation via the simdutf8 library; this also adds a dependency, although it's already used in the parquet crate and has been suggested for other uses in #7014

If the changes that add dependencies are not desired
they can be backed out of the PR.

# Are there any user-facing changes?

No
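
For reviewers who want a concrete picture of the BufIter and memchr2 changes described above, here is a minimal std-only sketch of the idea. It is not the code from the commits: `SliceCursor` and its method names are hypothetical, and the real `BufIter` may look different. `Iterator::position` stands in for the two-byte search that `memchr::memchr2` performs, and `std::str::from_utf8` stands in for the validation that simdutf8 would do.

```rust
/// Hypothetical cursor over a byte slice: a slice plus an offset,
/// rather than a wrapped iterator.
struct SliceCursor<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl<'a> SliceCursor<'a> {
    fn new(buf: &'a [u8]) -> Self {
        Self { buf, pos: 0 }
    }

    /// Look at the next byte without consuming it.
    fn peek(&self) -> Option<u8> {
        self.buf.get(self.pos).copied()
    }

    /// Skip `n` bytes in O(1), clamped to the end of the buffer.
    fn advance(&mut self, n: usize) {
        self.pos = (self.pos + n).min(self.buf.len());
    }

    /// The unconsumed remainder of the input.
    fn remaining(&self) -> &'a [u8] {
        &self.buf[self.pos..]
    }

    /// Consume and return the bytes before the next `"` or `\`.
    /// If neither byte occurs, consume and return everything that is left.
    fn take_until_quote_or_escape(&mut self) -> &'a [u8] {
        let rest = self.remaining();
        // Stand-in for a memchr2-style search for either of two bytes.
        let end = rest
            .iter()
            .position(|&b| b == b'"' || b == b'\\')
            .unwrap_or(rest.len());
        self.advance(end);
        &rest[..end]
    }
}

fn main() {
    let mut cur = SliceCursor::new(br#"hello" tail"#);
    let body = cur.take_until_quote_or_escape();
    // Standard-library validation, used here in place of simdutf8.
    let text = std::str::from_utf8(body).expect("input should be valid UTF-8");
    println!("{text} (next byte: {:?})", cur.peek().map(char::from));
}
```

Because the cursor exposes the remaining input as a contiguous slice, a bulk search can run over the whole remainder and the offset can then jump forward in one step, which a byte-at-a-time iterator cannot do.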
