[ https://issues.apache.org/jira/browse/BEAM-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17179820#comment-17179820 ]
Beam JIRA Bot commented on BEAM-3353:
-------------------------------------
This issue is P2 but has been unassigned without any comment for 60 days so it
has been labeled "stale-P2". If this issue is still affecting you, we care!
Please comment and remove the label. Otherwise, in 14 days the issue will be
moved to P3.
Please see https://beam.apache.org/contribute/jira-priorities/ for a detailed
explanation of what these priorities mean.
> Prohibit stacked GBKs with accumulating mode
> --------------------------------------------
>
> Key: BEAM-3353
> URL: https://issues.apache.org/jira/browse/BEAM-3353
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core, sdk-py-core
> Reporter: Eugene Kirpichov
> Priority: P2
> Labels: stale-P2
>
> The test in https://github.com/apache/beam/pull/4239 demonstrates that
> stacked GBKs with accumulating mode are unsafe, in the same way that stacked
> GBKs with merging windows are unsafe.
> In particular, consider the pipeline: input -> (gbk onto N keys) -> ungroup
> -> (gbk onto 1 key) -> ungroup. Suppose the first gbk receives "a" and then
> "b": it will emit "a" and then "a","b". The second gbk will then emit "a"
> and then "a","a","b", which is meaningless. With Combine instead of GBK, the
> same pattern leads to double-counting.
> There are cases where accumulation propagated through stacked aggregation can
> be desirable, but having it propagate by default is definitely the wrong
> thing to do. Silently changing it to discarding is likely also the wrong
> thing to do. So we should reset the windowing strategy and force users to
> specify accumulating mode explicitly if they want it.
> All pipelines that currently rely on this behavior compute meaningless
> results, so rejecting them should not be considered a breaking change.
> However, we should still find out whether there are many such pipelines.
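
The behavior described in the issue can be illustrated without running Beam.
The following is a minimal plain-Python sketch, not Beam code: the helper names
(accumulating_gbk, ungroup, accumulating_sum) and the key names "k1" and "all"
are hypothetical, and it assumes each stage fires one pane per arriving bundle.

```python
from collections import defaultdict


def accumulating_gbk(bundles):
    """Simulate a group-by-key in accumulating mode (assumed: one firing per bundle).

    `bundles` is a list of lists of (key, value) pairs. Each fired pane holds
    everything seen so far for the keys touched by that bundle.
    """
    state = defaultdict(list)
    panes = []
    for bundle in bundles:
        touched = []
        for key, value in bundle:
            state[key].append(value)
            if key not in touched:
                touched.append(key)
        # Accumulating mode: re-emit the full state, not only the new values.
        panes.append([(key, list(state[key])) for key in touched])
    return panes


def ungroup(panes, new_key):
    """Flatten each pane back into single elements, re-keyed onto `new_key`."""
    return [[(new_key, v) for _, values in pane for v in values] for pane in panes]


def accumulating_sum(values_per_firing):
    """Simulate a sum in accumulating mode: emit the running total at each firing."""
    total = 0
    emitted = []
    for value in values_per_firing:
        total += value
        emitted.append(total)
    return emitted


# Stacked group-by-key: "a" arrives first, then "b", both on the same key.
first = accumulating_gbk([[("k1", "a")], [("k1", "b")]])
second = accumulating_gbk(ungroup(first, "all"))
second_values = [pane[0][1] for pane in second]
print(second_values)  # [['a'], ['a', 'a', 'b']]

# Stacked sums: inputs 1 then 2 have a true total of 3.
first_sums = accumulating_sum([1, 2])
second_sums = accumulating_sum(first_sums)
print(first_sums, second_sums)  # [1, 3] [1, 4]
```

The second stage treats every re-emitted pane from the first stage as new
input, so "a" appears twice and the stacked sum reports 4 instead of 3.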