GitHub user jkff opened a pull request: https://github.com/apache/beam/pull/3890
Introduces Reshuffle.viaRandomKey()

Pairing elements with random keys and reshuffling is a commonly used pattern for breaking fusion (https://cloud.google.com/dataflow/service/dataflow-service-desc#fusion-optimization).

viaRandomKey() only abstracts away the current commonly used pattern. It has the same caveats as using Reshuffle.of() directly: the semantics are technically not guaranteed by the Beam model, but it works in practice, and this is the pattern we keep recommending to users.

The naming is deliberately operational rather than semantic, to emphasize that we don't have the semantics figured out; the transform promises only that it expands into exactly the sequence "pair with random key, reshuffle, drop key". The goal of this change is just to reduce copy-paste.

See prior discussion at https://lists.apache.org/thread.html/ac34c9ac665a8d9f67b0254015e44c59ea65ecc1360d4014b95d3b2e@%3Cdev.beam.apache.org%3E

This change also converts several existing usages to use it, and adds another one in Match.

R: @bjchambers

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkff/incubator-beam match-fusion-break

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/3890.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3890

----
commit 0b4d801e4afb0be1463b196f419e1293265b68c1
Author: Eugene Kirpichov <kirpic...@google.com>
Date:   2017-09-22T22:24:36Z

    Introduces Reshuffle.viaRandomKey()

----
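The PR says the transform only promises the expansion "pair with random key, reshuffle, drop key". That sequence can be modeled outside Beam to show what it changes and what it leaves alone. This sketch is plain Java with no Beam SDK. A HashMap grouping stands in for the shuffle that Reshuffle.of() performs on a runner. The class name, the reshuffleViaRandomKey method, the KeyedElement pair type and the numKeys parameter are hypothetical illustrations. They are not Beam's API and not the code in this pull request.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/**
 * Plain-Java model of "pair with random key, reshuffle, drop key".
 * No Beam SDK is used: a HashMap grouping stands in for the shuffle that
 * Reshuffle.of() would perform on a runner. All names here are hypothetical.
 */
class RandomKeyReshuffleSketch {

  /** Hypothetical stand-in for a Beam KV<Integer, T> pair. */
  static final class KeyedElement<T> {
    final int key;
    final T value;

    KeyedElement(int key, T value) {
      this.key = key;
      this.value = value;
    }
  }

  static <T> List<T> reshuffleViaRandomKey(List<T> input, int numKeys, Random random) {
    if (numKeys <= 0) {
      throw new IllegalArgumentException("numKeys must be positive");
    }

    // Step 1: pair each element with a random key in [0, numKeys).
    List<KeyedElement<T>> keyed = new ArrayList<>();
    for (T element : input) {
      keyed.add(new KeyedElement<>(random.nextInt(numKeys), element));
    }

    // Step 2: "reshuffle". Group the pairs by key, the way a shuffle routes all
    // pairs with the same key together. On a runner such as Dataflow the real
    // shuffle is what acts as a fusion barrier; this map only models the grouping.
    Map<Integer, List<T>> shuffled = new HashMap<>();
    for (KeyedElement<T> pair : keyed) {
      shuffled.computeIfAbsent(pair.key, k -> new ArrayList<>()).add(pair.value);
    }

    // Step 3: drop the keys and flatten the groups back into individual elements,
    // so each input element is emitted exactly once.
    List<T> output = new ArrayList<>();
    for (List<T> group : shuffled.values()) {
      output.addAll(group);
    }
    return output;
  }

  public static void main(String[] args) {
    List<String> input = List.of("a", "b", "c", "d", "e");
    List<String> output = reshuffleViaRandomKey(input, 3, new Random(42));
    // Same elements as the input; only the order may differ.
    System.out.println(output.size() + " elements: " + output);
  }
}
```

In this model every input element comes out exactly once, possibly in a different order, so the data itself is unchanged. Spreading elements over many random keys, rather than a single fixed key, is what lets a real shuffle distribute the work across workers. Here the key only decides which group an element lands in.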