Eugene Kirpichov created BEAM-4242:
--------------------------------------
Summary: Wait.on() is O(n)
Key: BEAM-4242
URL: https://issues.apache.org/jira/browse/BEAM-4242
Project: Beam
Issue Type: Bug
Components: sdk-java-core
Reporter: Eugene Kirpichov
Assignee: Eugene Kirpichov
Wait.on() uses a NeverTrigger and Sample.any(1) as implementation details.
Unfortunately, Sample.any() relies on combiner lifting for performance:
without it, all values are grouped onto a single worker, which is not
acceptable when the signal PCollection is large.
Not all runners support combiner lifting, and even those that do (e.g.
Dataflow) don't guarantee it. In one user's very large pipeline,
combiner lifting was not performed because it is only supported for
DefaultTrigger, not for NeverTrigger.
This should be fixed by modifying Wait so that it does not rely on combiner
lifting for performance, e.g. by a "manual" precombine (emit 1 value per bundle).
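
To illustrate why a per-bundle precombine helps, here is a rough sketch in plain Java. It does not use the Beam API: a "bundle" is modeled as a plain list, and the class and method names (BundlePrecombine, precombine, elementsWithoutPrecombine) are hypothetical, chosen only for this illustration. The point is that the final single-worker step sees one value per non-empty bundle instead of every element of the signal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not Beam code: each inner list stands in for one bundle,
// i.e. the batch of elements a runner hands to a worker at once.
public class BundlePrecombine {

    // "Manual" precombine: emit at most one value per bundle.
    // Empty bundles emit nothing.
    public static <T> List<T> precombine(List<List<T>> bundles) {
        List<T> emitted = new ArrayList<>();
        for (List<T> bundle : bundles) {
            if (!bundle.isEmpty()) {
                emitted.add(bundle.get(0));
            }
        }
        return emitted;
    }

    // Without any precombine (and without combiner lifting), every element
    // is shipped to the single worker that does the final grouping.
    public static <T> int elementsWithoutPrecombine(List<List<T>> bundles) {
        int total = 0;
        for (List<T> bundle : bundles) {
            total += bundle.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Build 3 bundles of 1000 elements each.
        List<List<Integer>> bundles = new ArrayList<>();
        for (int b = 0; b < 3; b++) {
            List<Integer> bundle = new ArrayList<>();
            for (int i = 0; i < 1000; i++) {
                bundle.add(b * 1000 + i);
            }
            bundles.add(bundle);
        }
        System.out.println("without precombine: " + elementsWithoutPrecombine(bundles));
        System.out.println("with precombine:    " + precombine(bundles).size());
    }
}
```

In this model the load on the final step scales with the number of bundles rather than the number of elements. A real fix inside Wait would additionally have to respect windowing, which this sketch ignores.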
CC: [~mkhadikov]