lukecwik opened a new pull request, #22345:
URL: https://github.com/apache/beam/pull/22345
This leverages the fact that all encoding is done in a thread-safe manner,
allowing us to drop the synchronization that ByteString.Output adds. It
also optimizes the max chunk size based on performance measurements, along
with the ratio for how full a byte[] should be when making the final copy vs
concatenate decision.
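To make the idea concrete, here is a minimal sketch of an unsynchronized, chunked output stream with a copy vs concatenate decision at the end. It is not the class added by this pull request: the class name, the initial buffer size, and the `COPY_RATIO` value are hypothetical, and `java.nio.ByteBuffer` stands in for protobuf's `ByteString` so the example has no external dependencies. Only the 256k maximum chunk size is taken from the description.

```java
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; assumes a single writer thread, so no method is synchronized. */
final class UnsyncChunkedOutputStream extends OutputStream {
  static final int MAX_CHUNK_SIZE = 256 * 1024; // chunk size mentioned in the description
  static final double COPY_RATIO = 0.5;         // assumed threshold, for illustration only

  private final List<ByteBuffer> chunks = new ArrayList<>();
  private byte[] buffer = new byte[128];        // assumed initial size
  private int pos = 0;

  @Override
  public void write(int b) {
    if (pos == buffer.length) {
      rollBuffer(1);
    }
    buffer[pos++] = (byte) b;
  }

  @Override
  public void write(byte[] src, int offset, int length) {
    while (length > 0) {
      if (pos == buffer.length) {
        rollBuffer(length);
      }
      int n = Math.min(length, buffer.length - pos);
      System.arraycopy(src, offset, buffer, pos, n);
      pos += n;
      offset += n;
      length -= n;
    }
  }

  /** Hands off the full buffer without copying and allocates a larger one, capped at the max. */
  private void rollBuffer(int minNeeded) {
    chunks.add(ByteBuffer.wrap(buffer));
    int next = Math.min(MAX_CHUNK_SIZE, Math.max(buffer.length * 2, minNeeded));
    buffer = new byte[next];
    pos = 0;
  }

  /** Returns the accumulated chunks and resets the stream for reuse. */
  List<ByteBuffer> toChunksAndReset() {
    if (pos > 0) {
      if (pos < buffer.length * COPY_RATIO) {
        // Mostly empty: copy only the used prefix and keep the buffer for reuse.
        chunks.add(ByteBuffer.wrap(Arrays.copyOf(buffer, pos)));
      } else {
        // Mostly full: hand off the buffer itself and allocate a fresh one.
        chunks.add(ByteBuffer.wrap(buffer, 0, pos));
        buffer = new byte[buffer.length];
      }
      pos = 0;
    }
    List<ByteBuffer> result = new ArrayList<>(chunks);
    chunks.clear();
    return result;
  }
}
```

In this sketch a 300-byte write fills the initial 128-byte buffer, rolls over to a 256-byte buffer, and ends with that buffer 172 bytes full, so it is handed off rather than copied.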
Below are the results of several scenarios comparing the protobuf
implementation with the new one. The new one mostly shows a large perf
improvement for tiny writes and still noticeable improvements for larger
writes:
```
Benchmark                                                                                       Mode  Cnt         Score        Error  Units
ByteStringOutputStreamBenchmark.testProtobufByteStringOutputStreamFewLargeWrites               thrpt   25   1149267.797 ±  15366.677  ops/s
ByteStringOutputStreamBenchmark.testProtobufByteStringOutputStreamFewMixedWritesWithReuse      thrpt   25    832816.697 ±   4236.341  ops/s
ByteStringOutputStreamBenchmark.testProtobufByteStringOutputStreamFewMixedWritesWithoutReuse   thrpt   25    916629.194 ±   5669.323  ops/s
ByteStringOutputStreamBenchmark.testProtobufByteStringOutputStreamFewSmallWrites               thrpt   25  14175167.566 ±  88540.030  ops/s
ByteStringOutputStreamBenchmark.testProtobufByteStringOutputStreamFewTinyWrites                thrpt   25  22471597.238 ± 186098.311  ops/s
ByteStringOutputStreamBenchmark.testProtobufByteStringOutputStreamManyLargeWrites              thrpt   25       610.218 ±      5.019  ops/s
ByteStringOutputStreamBenchmark.testProtobufByteStringOutputStreamManyMixedWritesWithReuse     thrpt   25       484.413 ±     35.194  ops/s
ByteStringOutputStreamBenchmark.testProtobufByteStringOutputStreamManyMixedWritesWithoutReuse  thrpt   25       559.983 ±      6.228  ops/s
ByteStringOutputStreamBenchmark.testProtobufByteStringOutputStreamManySmallWrites              thrpt   25     10969.839 ±     88.199  ops/s
ByteStringOutputStreamBenchmark.testProtobufByteStringOutputStreamManyTinyWrites               thrpt   25     40822.925 ±    191.402  ops/s
ByteStringOutputStreamBenchmark.testSdkCoreByteStringOutputStreamFewLargeWrites                thrpt   25   1167673.532 ±   9747.507  ops/s
ByteStringOutputStreamBenchmark.testSdkCoreByteStringOutputStreamFewMixedWritesWithReuse       thrpt   25   1576528.242 ±  15883.083  ops/s
ByteStringOutputStreamBenchmark.testSdkCoreByteStringOutputStreamFewMixedWritesWithoutReuse    thrpt   25   1009766.655 ±   8700.273  ops/s
ByteStringOutputStreamBenchmark.testSdkCoreByteStringOutputStreamFewSmallWrites                thrpt   25  33293140.679 ± 233693.771  ops/s
ByteStringOutputStreamBenchmark.testSdkCoreByteStringOutputStreamFewTinyWrites                 thrpt   25  86841328.763 ± 729741.769  ops/s
ByteStringOutputStreamBenchmark.testSdkCoreByteStringOutputStreamManyLargeWrites               thrpt   25      1058.150 ±     15.192  ops/s
ByteStringOutputStreamBenchmark.testSdkCoreByteStringOutputStreamManyMixedWritesWithReuse      thrpt   25       937.249 ±      9.264  ops/s
ByteStringOutputStreamBenchmark.testSdkCoreByteStringOutputStreamManyMixedWritesWithoutReuse   thrpt   25       959.671 ±     13.989  ops/s
ByteStringOutputStreamBenchmark.testSdkCoreByteStringOutputStreamManySmallWrites               thrpt   25     12601.065 ±     92.375  ops/s
ByteStringOutputStreamBenchmark.testSdkCoreByteStringOutputStreamManyTinyWrites                thrpt   25     65277.229 ±   3795.676  ops/s
```
The copy vs concatenate numbers come from the results below, which show that
256k seems to be a pretty good chunk size, since larger chunks seem to cost
more per byte to allocate. They also show the threshold at which we should
currently copy the bytes vs concatenate a partially full buffer and allocate
a new one:
```
Benchmark                                                newSize       copyVsNew   Mode  Cnt         Score        Error  Units
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A        512/1024  thrpt   25  19744209.563 ± 148287.185  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A        640/1024  thrpt   25  15738981.338 ± 103684.000  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A        768/1024  thrpt   25  12778194.652 ± 202212.679  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A        896/1024  thrpt   25  11053602.109 ± 103120.446  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A       4096/8192  thrpt   25   2961435.128 ±  25895.802  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A       5120/8192  thrpt   25   2498594.030 ±  26051.674  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A       6144/8192  thrpt   25   2173161.031 ±  20014.569  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A       7168/8192  thrpt   25   1917545.913 ±  21470.719  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A     20480/65536  thrpt   25    537872.049 ±   5525.024  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A     24576/65536  thrpt   25    371312.042 ±   4450.715  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A     28672/65536  thrpt   25    306027.442 ±   2830.503  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A     32768/65536  thrpt   25    263933.096 ±   1833.603  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A   131072/262144  thrpt   25     80224.558 ±   1192.994  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A   163840/262144  thrpt   25     65311.283 ±    775.920  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A   196608/262144  thrpt   25     54510.877 ±    797.775  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A   229376/262144  thrpt   25     46808.185 ±    515.039  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A  524288/1048576  thrpt   25     17729.937 ±    301.199  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A  655360/1048576  thrpt   25     12996.953 ±    228.552  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A  786432/1048576  thrpt   25     11383.122 ±    384.086  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A  917504/1048576  thrpt   25      9915.318 ±    285.995  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testNewArray      1024             N/A  thrpt   25  10023631.563 ±  61317.055  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testNewArray      8192             N/A  thrpt   25   2109120.041 ±  17482.682  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testNewArray     65536             N/A  thrpt   25    318492.630 ±   3006.827  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testNewArray    262144             N/A  thrpt   25     79228.892 ±    725.230  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testNewArray   1048576             N/A  thrpt   25     13089.221 ±     73.535  ops/s
```
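One way to read a threshold out of these numbers is to find, for each buffer size, the largest copied size whose `testCopyArray` throughput still beats `testNewArray` for that buffer size. The sketch below does that arithmetic for the 8192 and 65536 rows. The class and method names are hypothetical, and the reading that `copyVsNew` means "bytes copied / buffer size" is an assumption; the scores are taken from the table above and error bars are ignored.

```java
/** Hypothetical helper for reading the NewVsCopy table; not part of this pull request. */
final class CopyVsNewCrossover {

  /**
   * Returns the largest copied size whose copy throughput is still higher than the
   * throughput of allocating a new array, or -1 if copying never wins.
   */
  static int largestWinningCopySize(int[] copySizes, double[] copyOps, double newOps) {
    int best = -1;
    for (int i = 0; i < copySizes.length; i++) {
      if (copyOps[i] > newOps) {
        best = Math.max(best, copySizes[i]);
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // Rows for an 8192 byte buffer (scores in ops/s).
    int[] sizes8k = {4096, 5120, 6144, 7168};
    double[] ops8k = {2961435.128, 2498594.030, 2173161.031, 1917545.913};
    System.out.println(largestWinningCopySize(sizes8k, ops8k, 2109120.041)); // prints 6144

    // Rows for a 65536 byte buffer (scores in ops/s).
    int[] sizes64k = {20480, 24576, 28672, 32768};
    double[] ops64k = {537872.049, 371312.042, 306027.442, 263933.096};
    System.out.println(largestWinningCopySize(sizes64k, ops64k, 318492.630)); // prints 24576
  }
}
```

Under that reading, copying wins up to 6144 of 8192 bytes (75% full) but only up to 24576 of 65536 bytes (37.5% full), so the break-even ratio depends on the buffer size.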
The difference is minor in the `ProcessBundleBenchmark` as there is not
enough data being passed around for it to make a major difference:
```
Before
Benchmark                                Mode  Cnt      Score     Error  Units
ProcessBundleBenchmark.testLargeBundle  thrpt   25   1156.159 ±   9.001  ops/s
ProcessBundleBenchmark.testTinyBundle   thrpt   25  29641.444 ± 125.041  ops/s

After
Benchmark                                Mode  Cnt      Score     Error  Units
ProcessBundleBenchmark.testLargeBundle  thrpt   25   1168.977 ±  25.848  ops/s
ProcessBundleBenchmark.testTinyBundle   thrpt   25  29664.783 ±  99.791  ops/s
```
------------------------
Thank you for your contribution! Follow this checklist to help us
incorporate your contribution quickly and easily:
- [ ] [**Choose
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and
mention them in a comment (`R: @username`).
- [ ] Mention the appropriate issue in your description (for example:
`addresses #123`), if applicable. This will automatically add a link to the
pull request in the issue. If you would like the issue to automatically close
on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
See the [Contributor Guide](https://beam.apache.org/contribute) for more
tips on [how to make review process
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
To check the build health, please visit
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more
information about GitHub Actions CI.