sjvanrossum opened a new pull request, #29633:
URL: https://github.com/apache/beam/pull/29633
This change should improve varint encoding throughput by roughly 1.5-6x for
multi-byte values in the benchmarks below.
The single-byte case is handled by writing directly to the stream, which
can't be sped up further.
All multi-byte cases seem to benefit from encoding into an intermediate
buffer and writing the entire buffer to the stream in one call.
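To illustrate the two paths described above, here is a minimal sketch rather than the code in this PR. The class name `BufferedVarIntSketch` and the helper `toBytes` are hypothetical, and the buffer is a fixed 10 bytes (the maximum for a 64-bit value) as a simplification, whereas the PR sizes it from an approximated length.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical illustration; not the VarInt class changed in this PR.
final class BufferedVarIntSketch {

  /** Writes v as a varint: 7 payload bits per byte, high bit set when more bytes follow. */
  static void encode(long v, OutputStream out) throws IOException {
    // Single-byte case: write straight to the stream.
    if ((v & ~0x7FL) == 0) {
      out.write((int) v);
      return;
    }
    // Multi-byte case: fill a local buffer, then issue one bulk write.
    // Assumption: a fixed 10-byte buffer, enough for any 64-bit value.
    byte[] buf = new byte[10];
    int n = 0;
    while ((v & ~0x7FL) != 0) {
      buf[n++] = (byte) ((v & 0x7F) | 0x80);
      v >>>= 7; // unsigned shift so negative longs terminate after 10 bytes
    }
    buf[n++] = (byte) v;
    out.write(buf, 0, n);
  }

  /** Convenience wrapper for trying the sketch without handling IOException. */
  static byte[] toBytes(long v) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try {
      encode(v, out);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toByteArray();
  }
}
```

Calling `BufferedVarIntSketch.toBytes(300L)` takes the multi-byte path and results in a single `write(byte[], int, int)` call instead of one `write(int)` per byte.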
The buffer length is approximated, which improved size calculation
throughput by 1.1x and overall encoding throughput by 1.3x. I am aware that
the bit-shift loop in getLength is typically faster than constant-time tricks
(it was hard to beat), but this approximation avoids the divide by 7 that
typically makes those tricks slower.
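For context, the sketch below contrasts a bit-shift loop with one common way to size a varint without dividing by 7. It is an assumption for illustration and not necessarily the approximation used in this PR; the class and method names are hypothetical. It replaces `ceil(bits / 7)` with a multiply by 9 and a shift by 6, which gives the exact length for every 64-bit input.

```java
// Hypothetical illustration; not necessarily the formula used in this PR.
final class VarIntLengthSketch {

  /** Reference: count 7-bit groups with a shift loop. */
  static int lengthByLoop(long v) {
    int n = 1;
    while ((v & ~0x7FL) != 0) {
      n++;
      v >>>= 7;
    }
    return n;
  }

  /** Division-free variant: multiply-and-shift on the index of the highest set bit. */
  static int lengthByShift(long v) {
    // (v | 1) makes zero behave like one, so log2 is in the range 0..63.
    int log2 = 63 - Long.numberOfLeadingZeros(v | 1);
    // (log2 * 9 + 73) / 64 equals floor(log2 / 7) + 1 for log2 in 0..63.
    return (log2 * 9 + 73) >>> 6;
  }
}
```

Whether the constant-time form beats the loop depends on the input distribution and the JIT, which is why a benchmark is needed to decide.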
I've benchmarked this using JMH with 3 warmup iterations, 5 measurement
iterations, and 1 fork on an Apple M1 Pro machine.
The benchmark constructs 2048 random longs and sets bits to produce an even
distribution of small and large integers.
- "All" encodes the values as is to a pre-constructed ByteArrayOutputStream
- "ExtraLarge" sets a bit to force exactly 10 bytes of output before encoding
- "Large" sets a bit to force exactly 4 bytes of output before encoding
- "Medium" sets a bit to force exactly 2 bytes of output before encoding
- "Small" sets a bit to force exactly 1 byte of output before encoding
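The benchmark source is not included in this description, so the following is only a guess at how such inputs could be shaped. `VarIntInputSketch`, `forceLength`, and `encodedLength` are hypothetical names, and the masking step is an assumption about how a random long is limited to a given encoded size.

```java
// Hypothetical illustration of shaping benchmark inputs; not the PR's benchmark code.
final class VarIntInputSketch {

  /** Number of bytes the varint encoding of v occupies (shift-loop reference). */
  static int encodedLength(long v) {
    int n = 1;
    while ((v & ~0x7FL) != 0) {
      n++;
      v >>>= 7;
    }
    return n;
  }

  /** Returns a value derived from v whose varint encoding is exactly `bytes` long (1..10). */
  static long forceLength(long v, int bytes) {
    if (bytes == 10) {
      // Setting the top bit always requires the tenth byte.
      return v | Long.MIN_VALUE;
    }
    // Keep only the low 7 * bytes bits so the value cannot need more bytes.
    long result = v & ((1L << (7 * bytes)) - 1);
    if (bytes > 1) {
      // Set the lowest bit of the top 7-bit group so the value cannot need fewer bytes.
      result |= 1L << (7 * (bytes - 1));
    }
    return result;
  }
}
```

For example, `forceLength(random.nextLong(), 4)` would correspond to the "Large" case and `forceLength(random.nextLong(), 1)` to the "Small" case.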
```
Benchmark                                      Mode  Cnt       Score       Error  Units
VarIntBenchmark.benchmarkNewVarIntAll         thrpt    5  320828.193 ± 20567.050  ops/s
VarIntBenchmark.benchmarkNewVarIntExtraLarge  thrpt    5  243184.470 ± 31068.177  ops/s
VarIntBenchmark.benchmarkNewVarIntLarge       thrpt    5  373675.092 ±  6412.076  ops/s
VarIntBenchmark.benchmarkNewVarIntMedium      thrpt    5  338347.826 ± 18859.258  ops/s
VarIntBenchmark.benchmarkNewVarIntSmall       thrpt    5  415559.304 ±  8169.027  ops/s
VarIntBenchmark.benchmarkOldVarIntAll         thrpt    5  191424.762 ±  3954.473  ops/s
VarIntBenchmark.benchmarkOldVarIntExtraLarge  thrpt    5   41411.167 ±  1322.300  ops/s
VarIntBenchmark.benchmarkOldVarIntLarge       thrpt    5  102820.015 ±  5414.503  ops/s
VarIntBenchmark.benchmarkOldVarIntMedium      thrpt    5  204669.983 ± 24894.096  ops/s
VarIntBenchmark.benchmarkOldVarIntSmall       thrpt    5  415046.963 ±  9567.074  ops/s
```