Antony Mayi created BEAM-2398:
---------------------------------

             Summary: Increasing latency within DirectRunner caused by 
cumulated TransformWatermarks
                 Key: BEAM-2398
                 URL: https://issues.apache.org/jira/browse/BEAM-2398
             Project: Beam
          Issue Type: Bug
          Components: runner-direct
    Affects Versions: 2.0.0
            Reporter: Antony Mayi
            Assignee: Thomas Groh
         Attachments: LatencyTest.java

Over the time the end-to-end latency of a pipeline running on DirectRunner is 
significantly increasing.

This is caused by ever growing sets of:
* {{WatermarkManager.TransformWatermarks.inputWatermark.pendingElements}}
* 
{{WatermarkManager.TransformWatermarks.synchronizedProcessingInputWatermark.pendingBundles}}

That means calls to {{WatermarkManager.TransformWatermarks.refresh()}} which 
need to iterate through that collections take longer and longer and the latency 
is growing.

I believe it is the line {{WaterMark.updatePending()}} line:

{quote}
    if (input != null) {
      // Add the unprocessed inputs
      completedTransform.addPending(result.getUnprocessedInputs());
{quote}
that's adding the items that are never removed.

See attached demo code showing the increasing latency.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to