lukemin89 commented on issue #11263: [BEAM-9325] Override proper write method in UnownedOutputStream
URL: https://github.com/apache/beam/pull/11263#issuecomment-606317443

Thanks for the review!

I was looking to speed up my 10+ TB GBK steps and happened to find this. I decided to fix it since it should be an easy fix.

I'm not sure what you mean by `if I've seen the performance`. In my pipeline it would be difficult to find the bottleneck even with PProf; in particular, Dataflow PProf does not show much about the GBK step.

In case you meant whether I ran a benchmark: I ran a short one using `jmh`, which shows

```
Benchmark                                                  Mode  Cnt       Score       Error   Units
Main.reuseWrite1kChunksByteArrayOutputStream              thrpt    5   25487.387 ±  2144.764  ops/ms
Main.reuseWrite1kChunksCustomNonSynchronousOutputStream   thrpt    5   53692.972 ± 11194.620  ops/ms
Main.reuseWrite1kChunksUnownedOutputStream                thrpt    5      56.419 ±     1.125  ops/ms
Main.reuseWrite1kChunksUnownedOutputStreamOverride        thrpt    5   16714.851 ±  3091.762  ops/ms

Benchmark                                                  Mode  Cnt       Score       Error   Units
Main.reuseWrite256ChunksByteArrayOutputStream             thrpt    5   43725.940 ±  6729.958  ops/ms
Main.reuseWrite256ChunksCustomNonSynchronousOutputStream  thrpt    5   70449.452 ± 11155.332  ops/ms
Main.reuseWrite256ChunksUnownedOutputStream               thrpt    5     221.629 ±     2.325  ops/ms
Main.reuseWrite256ChunksUnownedOutputStreamOverride       thrpt    5   24123.512 ±  5081.709  ops/ms

Benchmark                                                  Mode  Cnt       Score       Error   Units
Main.reuseWrite32ChunksByteArrayOutputStream              thrpt    5   48755.580 ±  2890.195  ops/ms
Main.reuseWrite32ChunksCustomNonSynchronousOutputStream   thrpt    5  231230.842 ±  8340.905  ops/ms
Main.reuseWrite32ChunksUnownedOutputStream                thrpt    5    1780.829 ±    47.172  ops/ms
Main.reuseWrite32ChunksUnownedOutputStreamOverride        thrpt    5   26909.471 ±   381.218  ops/ms
```

The numbers are noisy because I ran this on my laptop, but the trend is clear: the `UnownedOutputStreamOverride` variant is roughly 15x faster than the current `UnownedOutputStream` in the 32-chunk case and roughly 300x faster in the 1k-chunk case.
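For readers without the PR diff at hand, here is a minimal sketch of the kind of effect the benchmark above measures. It assumes the wrapper extends `java.io.FilterOutputStream`, whose inherited `write(byte[], int, int)` forwards the data one byte at a time through `write(int)`. The classes `CountingOutputStream`, `DefaultWrapper`, `OverridingWrapper`, and `BulkWriteDemo` are hypothetical names made up for illustration; this is not Beam's actual `UnownedOutputStream` code and not the benchmark I ran.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical sink that counts how often each write method is invoked on it. */
class CountingOutputStream extends OutputStream {
  int singleByteWrites = 0;
  int bulkWrites = 0;
  private final ByteArrayOutputStream sink = new ByteArrayOutputStream();

  @Override
  public void write(int b) {
    singleByteWrites++;
    sink.write(b);
  }

  @Override
  public void write(byte[] b, int off, int len) {
    bulkWrites++;
    sink.write(b, off, len);
  }

  int size() {
    return sink.size();
  }
}

/** Hypothetical wrapper that keeps FilterOutputStream's inherited bulk write. */
class DefaultWrapper extends FilterOutputStream {
  DefaultWrapper(OutputStream out) {
    super(out);
  }
}

/** Hypothetical wrapper that forwards the whole array to the wrapped stream in one call. */
class OverridingWrapper extends FilterOutputStream {
  OverridingWrapper(OutputStream out) {
    super(out);
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    out.write(b, off, len);
  }
}

class BulkWriteDemo {
  /** Writes one chunk through the chosen wrapper and returns the counting sink. */
  static CountingOutputStream writeChunk(boolean overrideBulkWrite, int chunkSize) {
    CountingOutputStream sink = new CountingOutputStream();
    OutputStream wrapper =
        overrideBulkWrite ? new OverridingWrapper(sink) : new DefaultWrapper(sink);
    try {
      wrapper.write(new byte[chunkSize], 0, chunkSize);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink;
  }

  public static void main(String[] args) {
    for (boolean override : new boolean[] {false, true}) {
      CountingOutputStream sink = writeChunk(override, 1024);
      System.out.println(
          (override ? "override: " : "default:  ")
              + sink.singleByteWrites + " single-byte writes, "
              + sink.bulkWrites + " bulk writes, "
              + sink.size() + " bytes");
    }
  }
}
```

With a 1024-byte chunk, the default wrapper should reach the sink as 1024 single-byte writes, while the overriding wrapper reaches it as one bulk write. That per-byte call overhead is my best explanation for the gap between the `UnownedOutputStream` and `UnownedOutputStreamOverride` rows above.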
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
