GitHub user jkff opened a pull request:

    https://github.com/apache/beam/pull/3890

    Introduces Reshuffle.viaRandomKey()

    It wraps a commonly used pattern for breaking fusion:
    https://cloud.google.com/dataflow/service/dataflow-service-desc#fusion-optimization
    
    viaRandomKey() only abstracts away the current commonly used pattern. It
    has the same caveats as using Reshuffle.of() directly: the semantics are
    technically not guaranteed by the Beam model, but it works in practice,
    and this is the pattern we keep recommending to users.
    
    The naming is deliberately operational rather than semantic, to emphasize
    that we don't have the semantics figured out: the transform promises only
    that it expands into exactly the sequence "pair with random key,
    reshuffle, drop key". The goal of this change is simply to reduce
    copy-paste.
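    For illustration, here is a plain-Java model of that three-step expansion
    over an in-memory list. It is not Beam code and not the transform's actual
    implementation: the class RandomKeyReshuffleModel, the method
    reshuffleViaRandomKey and its numKeys parameter are hypothetical names, and
    the "reshuffle" step is modeled as a simple group-by-key, whereas a real
    runner performs a distributed shuffle (which is what breaks fusion). In a
    pipeline the transform itself would be applied as
    input.apply(Reshuffle.viaRandomKey()).

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

/** Hypothetical plain-Java model of "pair with random key, reshuffle, drop key". */
public final class RandomKeyReshuffleModel {

  private RandomKeyReshuffleModel() {}

  /**
   * Models the expansion over an in-memory list. numKeys must be positive;
   * it is an assumption of this model, not a parameter of the Beam transform.
   */
  public static <T> List<T> reshuffleViaRandomKey(List<T> input, int numKeys, Random random) {
    // Step 1: pair each element with a random key in [0, numKeys).
    List<Map.Entry<Integer, T>> keyed = new ArrayList<>();
    for (T element : input) {
      keyed.add(new AbstractMap.SimpleImmutableEntry<>(random.nextInt(numKeys), element));
    }

    // Step 2: "reshuffle", modeled here as grouping by key. A real runner
    // would redistribute the keyed elements across workers at this point.
    Map<Integer, List<T>> groups = new TreeMap<>();
    for (Map.Entry<Integer, T> kv : keyed) {
      groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
    }

    // Step 3: drop the keys and flatten the groups back into plain elements.
    List<T> output = new ArrayList<>();
    for (List<T> group : groups.values()) {
      output.addAll(group);
    }
    return output;
  }

  public static void main(String[] args) {
    List<String> input = Arrays.asList("a", "b", "c", "d", "e");
    List<String> output = reshuffleViaRandomKey(input, 3, new Random(42));
    // The order may change, but the multiset of elements is preserved.
    List<String> sorted = new ArrayList<>(output);
    Collections.sort(sorted);
    System.out.println(sorted); // prints [a, b, c, d, e]
  }
}
```

    The model only shows that the expansion preserves the input elements
    while possibly changing their grouping and order; it says nothing about
    the fusion or checkpointing behavior that motivates the pattern, which
    depends on the runner.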
    
    See prior discussion at
    https://lists.apache.org/thread.html/ac34c9ac665a8d9f67b0254015e44c59ea65ecc1360d4014b95d3b2e@%3Cdev.beam.apache.org%3E
    
    This change also converts several existing usages to it, and adds
    another one in Match.
    
    R: @bjchambers 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkff/incubator-beam match-fusion-break

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/3890.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3890
    
----
commit 0b4d801e4afb0be1463b196f419e1293265b68c1
Author: Eugene Kirpichov <kirpic...@google.com>
Date:   2017-09-22T22:24:36Z

    Introduces Reshuffle.viaRandomKey()
    
    It's a commonly used pattern for breaking fusion
    https://cloud.google.com/dataflow/service/dataflow-service-desc#fusion-optimization
    
    viaRandomKey() only abstracts away the current commonly used pattern.
    It has the same caveats as using Reshuffle.of() directly - the semantics
    are technically not guaranteed by the Beam model, but it works in
    practice, and this is the pattern we keep recommending to users.
    
    The naming is deliberately operational rather than semantic, to
    emphasize that we don't have the semantics figured out, and the
    transform promises only that it expands into exactly the sequence
    "pair with random key, reshuffle, drop key".
    The goal of this change is just to reduce copy-paste.
    
    See prior discussion at
    https://lists.apache.org/thread.html/ac34c9ac665a8d9f67b0254015e44c59ea65ecc1360d4014b95d3b2e@%3Cdev.beam.apache.org%3E
    
    This change also converts several existing usages to use it, and adds
    another one in Match.

----

