[
https://issues.apache.org/jira/browse/BEAM-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Beam JIRA Bot updated BEAM-3353:
--------------------------------
Labels: stale-P2 (was: )
> Prohibit stacked GBKs with accumulating mode
> --------------------------------------------
>
> Key: BEAM-3353
> URL: https://issues.apache.org/jira/browse/BEAM-3353
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core, sdk-py-core
> Reporter: Eugene Kirpichov
> Priority: P2
> Labels: stale-P2
>
> The test in https://github.com/apache/beam/pull/4239 demonstrates that
> stacked GBKs with accumulating mode are unsafe, in the same way that stacked
> GBKs with merging windows are unsafe.
> In particular, consider the pipeline: input -> (gbk onto N keys) -> ungroup ->
> (gbk onto 1 key) -> ungroup. Suppose the first gbk receives "a" and then "b".
> It will emit "a" and then "a","b"; the second gbk will then emit "a" and
> then "a","a","b", which is meaningless. With combine instead of GBK, this
> leads to double-counting.
> There are cases where propagating accumulation through stacked aggregations
> is desirable, but propagating it by default is definitely the wrong thing to
> do. Silently changing it to discarding is likely also wrong. So, we should
> reset the windowing strategy and force users to specify accumulating mode
> explicitly if they want it.
> All pipelines that currently rely on this behavior compute meaningless
> results, so rejecting them should not be considered a breaking change.
> However, we should still find out how many such pipelines exist.
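
The "a", "b" example in the description can be reproduced with a small pure-Python simulation. This is only a sketch and does not use the Beam API: the helper name `accumulating_gbk` is hypothetical, all elements are assumed to land on a single key, and the trigger is assumed to fire once per incoming batch, with each fired pane containing everything seen so far (accumulating mode).

```python
def accumulating_gbk(batches):
    """Simulate one single-key GBK in accumulating mode.

    `batches` is a list of lists; each inner list is a group of elements
    that arrive together. The simulated trigger fires once after each
    batch, and every pane holds all elements seen so far.
    (Hypothetical helper for illustration, not a Beam API.)
    """
    seen = []
    panes = []
    for batch in batches:
        seen.extend(batch)
        panes.append(list(seen))  # copy, so later batches do not mutate it
    return panes


# First GBK receives "a" and then "b".
first_panes = accumulating_gbk([["a"], ["b"]])

# Ungrouping each pane and feeding it to a second accumulating GBK
# re-accumulates elements that the first GBK had already accumulated.
second_panes = accumulating_gbk(first_panes)

# The same effect with a combine (here: count) instead of a GBK.
first_counts = [len(pane) for pane in first_panes]
second_counts = []
running_total = 0
for count in first_counts:
    running_total += count  # accumulating sum over already-accumulated counts
    second_counts.append(running_total)

print(first_panes)
print(second_panes)
print(second_counts)
```

In this simulation the second stage ends with "a","a","b" and a count of 3 for a two-element input, which is the duplication and double-counting described above.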