GitHub user jkff opened a pull request:

    https://github.com/apache/beam/pull/4124

    [BEAM-3169] Fixes a data loss bug in WriteFiles when used with fire-once triggers

    https://issues.apache.org/jira/browse/BEAM-3169
    
    This also required some changes to the shard assignment logic. The gist
    of the change is replacing the pre-finalize GroupByKey (GBK) with a
    Reshuffle. I audited all other usages of GBK in the SDK, and only this
    one appears to be buggy: the others either explicitly set a repeated
    trigger before applying the GBK, or are applied directly to the user's
    input, where the user's trigger firing behavior is working as intended
    (WAI).
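
To illustrate why a fire-once trigger loses data at a grouping step, here is a minimal toy model in plain Java. It is not Beam code and not the code in this pull request: the class `FireOnceSketch`, the methods `group` and `countElements`, and the "bundle" inputs are all hypothetical names made up for this sketch. It assumes a simplified view in which a finished trigger drops everything that arrives afterwards, while a trigger that keeps firing passes every element through.

```java
import java.util.ArrayList;
import java.util.List;

public class FireOnceSketch {
    /**
     * Toy stand-in for a grouping step (hypothetical, not the Beam API).
     * Each inner list is a batch of file results arriving at the step.
     * With fireOnce = true the trigger finishes after the first pane,
     * so every later batch is silently dropped.
     */
    static List<List<String>> group(List<List<String>> bundles, boolean fireOnce) {
        List<List<String>> panes = new ArrayList<>();
        boolean finished = false;
        for (List<String> bundle : bundles) {
            if (finished) {
                continue; // trigger already finished: this batch is lost
            }
            panes.add(new ArrayList<>(bundle)); // emit one pane per batch
            finished = fireOnce;                // a fire-once trigger closes here
        }
        return panes;
    }

    /** Counts the elements that made it through the grouping step. */
    static int countElements(List<List<String>> panes) {
        int total = 0;
        for (List<String> pane : panes) {
            total += pane.size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> bundles =
            List.of(List.of("file-0", "file-1"), List.of("file-2"));
        // Fire-once trigger: only the first batch survives.
        System.out.println(countElements(group(bundles, true)));  // prints 2
        // Trigger that keeps firing: all three results survive.
        System.out.println(countElements(group(bundles, false))); // prints 3
    }
}
```

In this model the second call corresponds to a grouping step whose trigger never finishes, which is the behavior the change relies on when it materializes file results via Reshuffle instead of GBK.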

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkff/incubator-beam write-files-data-loss

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/4124.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4124
    
----
commit 17c4af7c5a3a9475a284afd4fe413cb57016e7e5
Author: Eugene Kirpichov <[email protected]>
Date:   2017-11-13T19:09:01Z

    Slightly simplifies fixed sharding

commit 3926abf84c681f3b6918a3d691ca2cb00a2e40e6
Author: Eugene Kirpichov <[email protected]>
Date:   2017-11-13T19:58:11Z

    More clear and consistent shard number assignment logic

commit e52d3d8d49d6ff2b2b767f8069e1f71f8c42c9ae
Author: Eugene Kirpichov <[email protected]>
Date:   2017-11-13T20:28:34Z

    Materializes file results via Reshuffle rather than GBK

commit 1a625ecf313b6ff03311464d40a5515736cbbdd7
Author: Eugene Kirpichov <[email protected]>
Date:   2017-11-13T23:55:44Z

    Adds test for WriteFiles with a fire-once trigger

commit cba9ca163c621c1b187965226748596e2f5f8600
Author: Eugene Kirpichov <[email protected]>
Date:   2017-11-14T00:04:03Z

    makes checkstyle happy

----

