GitHub user jkff opened a pull request:
https://github.com/apache/beam/pull/4124
[BEAM-3169] Fixes a data loss bug in WriteFiles when used with fire-once
triggers
https://issues.apache.org/jira/browse/BEAM-3169
This required a bit of adjustment to the shard assignment logic too. The gist
of the change is replacing the pre-finalize GroupByKey (GBK) with a Reshuffle.
I audited all other usages of GBK in the SDK, and it appears that only this one
is buggy: the others either explicitly set a repeated trigger before applying
the GBK, or are applied directly to the user's input, in which case the user's
trigger firing behavior is working as intended (WAI).
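To illustrate why a grouping step behind a fire-once trigger can lose data, here is a toy model in plain Java. It does not use Beam and is not the code in this PR: the class name FireOnceSketch, both method names, and the "shard-N" strings are hypothetical, and it assumes the simplified behavior that a trigger which fires once and then finishes drops anything arriving afterwards, while a Reshuffle-style step forwards every element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only; none of these names are Beam APIs. */
public class FireOnceSketch {

    /**
     * Models a grouping step whose trigger fires once and then finishes:
     * the first batch to arrive is emitted, every later batch is dropped.
     */
    static List<String> groupWithFireOnceTrigger(List<List<String>> arrivals) {
        List<String> emitted = new ArrayList<>();
        boolean finished = false;
        for (List<String> batch : arrivals) {
            if (finished) {
                continue; // trigger already finished: this batch is lost
            }
            emitted.addAll(batch);
            finished = true; // fire once, then stop accepting input
        }
        return emitted;
    }

    /**
     * Models a Reshuffle-like step (assumption: it forwards every element
     * regardless of how many batches have already been seen).
     */
    static List<String> passThrough(List<List<String>> arrivals) {
        List<String> emitted = new ArrayList<>();
        for (List<String> batch : arrivals) {
            emitted.addAll(batch);
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Hypothetical file results arriving in two separate batches.
        List<List<String>> arrivals = Arrays.asList(
                Arrays.asList("shard-0"),
                Arrays.asList("shard-1", "shard-2"));

        System.out.println("fire-once grouping: " + groupWithFireOnceTrigger(arrivals));
        System.out.println("pass-through:       " + passThrough(arrivals));
    }
}
```

In this model the fire-once grouping keeps only the first batch, which mirrors the kind of loss described above; the pass-through variant keeps all three entries.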
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jkff/incubator-beam write-files-data-loss
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/beam/pull/4124.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #4124
----
commit 17c4af7c5a3a9475a284afd4fe413cb57016e7e5
Author: Eugene Kirpichov <[email protected]>
Date: 2017-11-13T19:09:01Z
Slightly simplifies fixed sharding
commit 3926abf84c681f3b6918a3d691ca2cb00a2e40e6
Author: Eugene Kirpichov <[email protected]>
Date: 2017-11-13T19:58:11Z
More clear and consistent shard number assignment logic
commit e52d3d8d49d6ff2b2b767f8069e1f71f8c42c9ae
Author: Eugene Kirpichov <[email protected]>
Date: 2017-11-13T20:28:34Z
Materializes file results via Reshuffle rather than GBK
commit 1a625ecf313b6ff03311464d40a5515736cbbdd7
Author: Eugene Kirpichov <[email protected]>
Date: 2017-11-13T23:55:44Z
Adds test for WriteFiles with a fire-once trigger
commit cba9ca163c621c1b187965226748596e2f5f8600
Author: Eugene Kirpichov <[email protected]>
Date: 2017-11-14T00:04:03Z
makes checkstyle happy
----