lukecwik opened a new pull request, #22345:
URL: https://github.com/apache/beam/pull/22345

   This leverages the fact that all encoding is done in a thread-safe manner,
   allowing us to drop the synchronization that ByteString.Output adds. It
   also optimizes the max chunk size based upon performance measurements, as
   well as the fill ratio of the byte[] that drives the final copy vs
   concatenate decision.
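
As a rough illustration of the idea (not the code in this PR), here is a minimal JDK-only sketch of an unsynchronized chunked output stream. `ChunkedByteOutput`, `drainAndReset`, `flatten`, the initial 128-byte buffer, and the 0.5 hand-off ratio are all hypothetical names and values chosen for the example; `ByteBuffer` stands in for `ByteString` so the block has no external dependencies. Only the 256 KiB chunk cap comes from the measurements in this description.

```java
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: no locking, so it is only safe when used from one thread at a time. */
final class ChunkedByteOutput extends OutputStream {
  // Chunk cap suggested by the benchmark discussion in this description.
  static final int MAX_CHUNK_SIZE = 256 * 1024;
  // Assumed threshold: at or above this fill ratio the partial buffer is handed off as-is.
  static final double HAND_OFF_RATIO = 0.5;

  private final List<ByteBuffer> chunks = new ArrayList<>();
  private byte[] buffer = new byte[128];
  private int pos = 0;

  @Override
  public void write(int b) {
    if (pos == buffer.length) {
      sealFullBuffer();
    }
    buffer[pos++] = (byte) b;
  }

  @Override
  public void write(byte[] src, int offset, int length) {
    while (length > 0) {
      if (pos == buffer.length) {
        sealFullBuffer();
      }
      int n = Math.min(length, buffer.length - pos);
      System.arraycopy(src, offset, buffer, pos, n);
      pos += n;
      offset += n;
      length -= n;
    }
  }

  // A full buffer is kept without copying; the next buffer doubles up to the cap.
  private void sealFullBuffer() {
    chunks.add(ByteBuffer.wrap(buffer));
    buffer = new byte[Math.min(buffer.length * 2, MAX_CHUNK_SIZE)];
    pos = 0;
  }

  /** Returns the chunks written so far and resets the stream for reuse. */
  List<ByteBuffer> drainAndReset() {
    if (pos > 0) {
      if (pos >= buffer.length * HAND_OFF_RATIO) {
        // Mostly full: hand off the array and pay for a fresh allocation.
        chunks.add(ByteBuffer.wrap(buffer, 0, pos));
        buffer = new byte[buffer.length];
      } else {
        // Mostly empty: copy the used prefix and keep the buffer for reuse.
        chunks.add(ByteBuffer.wrap(Arrays.copyOf(buffer, pos)));
      }
      pos = 0;
    }
    List<ByteBuffer> result = new ArrayList<>(chunks);
    chunks.clear();
    return result;
  }

  /** Concatenates chunks into one array; only used to inspect the result. */
  static byte[] flatten(List<ByteBuffer> parts) {
    int total = 0;
    for (ByteBuffer p : parts) {
      total += p.remaining();
    }
    byte[] out = new byte[total];
    int at = 0;
    for (ByteBuffer p : parts) {
      int n = p.remaining();
      p.duplicate().get(out, at, n);
      at += n;
    }
    return out;
  }
}
```

With these assumed values, writing 300 bytes and draining yields two chunks (a full 128-byte buffer plus a handed-off 172-byte slice), while a following 10-byte write is copied into a right-sized array and the 256-byte buffer is reused.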
   
   Below are the results of several scenarios comparing the protobuf
   implementation with the new solution. The new solution mostly shows a large
   perf improvement for tiny writes and still noticeable improvements for
   larger writes:
   ```
   Benchmark                                                                                       Mode  Cnt         Score        Error  Units
   ByteStringOutputStreamBenchmark.testProtobufByteStringOutputStreamFewLargeWrites               thrpt   25   1149267.797 ±  15366.677  ops/s
   ByteStringOutputStreamBenchmark.testProtobufByteStringOutputStreamFewMixedWritesWithReuse      thrpt   25    832816.697 ±   4236.341  ops/s
   ByteStringOutputStreamBenchmark.testProtobufByteStringOutputStreamFewMixedWritesWithoutReuse   thrpt   25    916629.194 ±   5669.323  ops/s
   ByteStringOutputStreamBenchmark.testProtobufByteStringOutputStreamFewSmallWrites               thrpt   25  14175167.566 ±  88540.030  ops/s
   ByteStringOutputStreamBenchmark.testProtobufByteStringOutputStreamFewTinyWrites                thrpt   25  22471597.238 ± 186098.311  ops/s
   ByteStringOutputStreamBenchmark.testProtobufByteStringOutputStreamManyLargeWrites              thrpt   25       610.218 ±      5.019  ops/s
   ByteStringOutputStreamBenchmark.testProtobufByteStringOutputStreamManyMixedWritesWithReuse     thrpt   25       484.413 ±     35.194  ops/s
   ByteStringOutputStreamBenchmark.testProtobufByteStringOutputStreamManyMixedWritesWithoutReuse  thrpt   25       559.983 ±      6.228  ops/s
   ByteStringOutputStreamBenchmark.testProtobufByteStringOutputStreamManySmallWrites              thrpt   25     10969.839 ±     88.199  ops/s
   ByteStringOutputStreamBenchmark.testProtobufByteStringOutputStreamManyTinyWrites               thrpt   25     40822.925 ±    191.402  ops/s
   ByteStringOutputStreamBenchmark.testSdkCoreByteStringOutputStreamFewLargeWrites                thrpt   25   1167673.532 ±   9747.507  ops/s
   ByteStringOutputStreamBenchmark.testSdkCoreByteStringOutputStreamFewMixedWritesWithReuse       thrpt   25   1576528.242 ±  15883.083  ops/s
   ByteStringOutputStreamBenchmark.testSdkCoreByteStringOutputStreamFewMixedWritesWithoutReuse    thrpt   25   1009766.655 ±   8700.273  ops/s
   ByteStringOutputStreamBenchmark.testSdkCoreByteStringOutputStreamFewSmallWrites                thrpt   25  33293140.679 ± 233693.771  ops/s
   ByteStringOutputStreamBenchmark.testSdkCoreByteStringOutputStreamFewTinyWrites                 thrpt   25  86841328.763 ± 729741.769  ops/s
   ByteStringOutputStreamBenchmark.testSdkCoreByteStringOutputStreamManyLargeWrites               thrpt   25      1058.150 ±     15.192  ops/s
   ByteStringOutputStreamBenchmark.testSdkCoreByteStringOutputStreamManyMixedWritesWithReuse      thrpt   25       937.249 ±      9.264  ops/s
   ByteStringOutputStreamBenchmark.testSdkCoreByteStringOutputStreamManyMixedWritesWithoutReuse   thrpt   25       959.671 ±     13.989  ops/s
   ByteStringOutputStreamBenchmark.testSdkCoreByteStringOutputStreamManySmallWrites               thrpt   25     12601.065 ±     92.375  ops/s
   ByteStringOutputStreamBenchmark.testSdkCoreByteStringOutputStreamManyTinyWrites                thrpt   25     65277.229 ±   3795.676  ops/s
   ```
   
   The copy vs concatenate numbers come from the results below, which show that
   256k seems to be a pretty good chunk size, since larger chunks seem to cost
   more per byte to allocate. They also show the threshold at which we should
   currently copy the bytes vs concatenate a partially full buffer and allocate
   a new one:
   ```
   Benchmark                                                newSize       copyVsNew   Mode  Cnt         Score        Error  Units
   ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A        512/1024  thrpt   25  19744209.563 ± 148287.185  ops/s
   ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A        640/1024  thrpt   25  15738981.338 ± 103684.000  ops/s
   ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A        768/1024  thrpt   25  12778194.652 ± 202212.679  ops/s
   ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A        896/1024  thrpt   25  11053602.109 ± 103120.446  ops/s
   ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A       4096/8192  thrpt   25   2961435.128 ±  25895.802  ops/s
   ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A       5120/8192  thrpt   25   2498594.030 ±  26051.674  ops/s
   ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A       6144/8192  thrpt   25   2173161.031 ±  20014.569  ops/s
   ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A       7168/8192  thrpt   25   1917545.913 ±  21470.719  ops/s
   ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A     20480/65536  thrpt   25    537872.049 ±   5525.024  ops/s
   ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A     24576/65536  thrpt   25    371312.042 ±   4450.715  ops/s
   ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A     28672/65536  thrpt   25    306027.442 ±   2830.503  ops/s
   ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A     32768/65536  thrpt   25    263933.096 ±   1833.603  ops/s
   ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A   131072/262144  thrpt   25     80224.558 ±   1192.994  ops/s
   ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A   163840/262144  thrpt   25     65311.283 ±    775.920  ops/s
   ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A   196608/262144  thrpt   25     54510.877 ±    797.775  ops/s
   ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A   229376/262144  thrpt   25     46808.185 ±    515.039  ops/s
   ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A  524288/1048576  thrpt   25     17729.937 ±    301.199  ops/s
   ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A  655360/1048576  thrpt   25     12996.953 ±    228.552  ops/s
   ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A  786432/1048576  thrpt   25     11383.122 ±    384.086  ops/s
   ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A  917504/1048576  thrpt   25      9915.318 ±    285.995  ops/s
   ByteStringOutputStreamBenchmark.NewVsCopy.testNewArray      1024             N/A  thrpt   25  10023631.563 ±  61317.055  ops/s
   ByteStringOutputStreamBenchmark.NewVsCopy.testNewArray      8192             N/A  thrpt   25   2109120.041 ±  17482.682  ops/s
   ByteStringOutputStreamBenchmark.NewVsCopy.testNewArray     65536             N/A  thrpt   25    318492.630 ±   3006.827  ops/s
   ByteStringOutputStreamBenchmark.NewVsCopy.testNewArray    262144             N/A  thrpt   25     79228.892 ±    725.230  ops/s
   ByteStringOutputStreamBenchmark.NewVsCopy.testNewArray   1048576             N/A  thrpt   25     13089.221 ±     73.535  ops/s
   ```
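
One way to read the NewVsCopy results above is to ask, for each buffer size, what the largest tested copy is whose throughput still beats allocating a fresh array of the full size. The sketch below does that arithmetic on the scores transcribed from the table. `CopyVsNewCrossover` and `largestCopyStillFaster` are hypothetical names, the comparison ignores the error bars, and this is only one possible interpretation of the data, not necessarily how the threshold in the PR was chosen.

```java
/** Hypothetical helper for reading the NewVsCopy results; not part of the PR. */
final class CopyVsNewCrossover {
  // Scores (ops/s) transcribed from the table above; error bars are ignored.
  static final int[] SIZES = {1024, 8192, 65536, 262144, 1048576};
  static final double[] NEW_OPS = {
    10023631.563, 2109120.041, 318492.630, 79228.892, 13089.221
  };
  static final int[][] COPIED = {
    {512, 640, 768, 896},
    {4096, 5120, 6144, 7168},
    {20480, 24576, 28672, 32768},
    {131072, 163840, 196608, 229376},
    {524288, 655360, 786432, 917504},
  };
  static final double[][] COPY_OPS = {
    {19744209.563, 15738981.338, 12778194.652, 11053602.109},
    {2961435.128, 2498594.030, 2173161.031, 1917545.913},
    {537872.049, 371312.042, 306027.442, 263933.096},
    {80224.558, 65311.283, 54510.877, 46808.185},
    {17729.937, 12996.953, 11383.122, 9915.318},
  };

  /**
   * Returns the largest tested copy size whose throughput still beats allocating
   * a new array, or -1 if copying never wins among the tested sizes.
   */
  static int largestCopyStillFaster(double newOps, int[] copiedBytes, double[] copyOps) {
    int best = -1;
    for (int i = 0; i < copiedBytes.length; i++) {
      if (copyOps[i] > newOps && copiedBytes[i] > best) {
        best = copiedBytes[i];
      }
    }
    return best;
  }

  public static void main(String[] args) {
    for (int i = 0; i < SIZES.length; i++) {
      int best = largestCopyStillFaster(NEW_OPS[i], COPIED[i], COPY_OPS[i]);
      System.out.printf(
          "%d: copy wins up to %d bytes (%.1f%% full)%n",
          SIZES[i], best, 100.0 * best / SIZES[i]);
    }
  }
}
```

On these numbers, copying still wins at 896/1024, 6144/8192, 24576/65536, 131072/262144 and 524288/1048576. Two caveats: for the 1024-byte buffer every tested copy size wins, so the real crossover lies somewhere above 896 bytes, and for 262144 the winning margin at 131072 is smaller than the reported error.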
   
   The difference is minor in the `ProcessBundleBenchmark` as there is not
   enough data being passed around for it to make a major difference:
   ```
   Before
   Benchmark                                        Mode  Cnt      Score     Error  Units
   ProcessBundleBenchmark.testLargeBundle          thrpt   25   1156.159 ±   9.001  ops/s
   ProcessBundleBenchmark.testTinyBundle           thrpt   25  29641.444 ± 125.041  ops/s
   
   After
   Benchmark                                        Mode  Cnt      Score     Error  Units
   ProcessBundleBenchmark.testLargeBundle          thrpt   25   1168.977 ±  25.848  ops/s
   ProcessBundleBenchmark.testTinyBundle           thrpt   25  29664.783 ±  99.791  ops/s
   ```
   

