Antony Mayi created BEAM-2398:
---------------------------------
Summary: Increasing latency within DirectRunner caused by
cumulated TransformWatermarks
Key: BEAM-2398
URL: https://issues.apache.org/jira/browse/BEAM-2398
Project: Beam
Issue Type: Bug
Components: runner-direct
Affects Versions: 2.0.0
Reporter: Antony Mayi
Assignee: Thomas Groh
Attachments: LatencyTest.java
Over the time the end-to-end latency of a pipeline running on DirectRunner is
significantly increasing.
This is caused by ever growing sets of:
* {{WatermarkManager.TransformWatermarks.inputWatermark.pendingElements}}
*
{{WatermarkManager.TransformWatermarks.synchronizedProcessingInputWatermark.pendingBundles}}
That means calls to {{WatermarkManager.TransformWatermarks.refresh()}} which
need to iterate through that collections take longer and longer and the latency
is growing.
I believe it is the line {{WaterMark.updatePending()}} line:
{quote}
if (input != null) {
// Add the unprocessed inputs
completedTransform.addPending(result.getUnprocessedInputs());
{quote}
that's adding the items that are never removed.
See attached demo code showing the increasing latency.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)