[ https://issues.apache.org/jira/browse/BEAM-690?focusedWorklogId=141928&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-141928 ]
ASF GitHub Bot logged work on BEAM-690:
---------------------------------------
Author: ASF GitHub Bot
Created on: 06/Sep/18 20:19
Start Date: 06/Sep/18 20:19
Worklog Time Spent: 10m
Work Description: janotav commented on issue #6303: [BEAM-690] Backoff in
the DirectRunner if no work is available
URL: https://github.com/apache/beam/pull/6303#issuecomment-419227607
Thanks for the feedback, guys. To be honest, I'm no longer convinced this is
the right thing to do. It does decrease CPU consumption significantly, but
at least in our case that is not enough. It turns out that even if the
pipeline is completely empty, the driver goes
THROTTLE, THROTTLE, CONTINUE, THROTTLE, THROTTLE, CONTINUE, ... and so on.
So effectively the active loop becomes a loop with a 15 ms sleep (the average
of 10 and 20 ms). Because the code executed in the active phase is itself
non-trivial, this still puts an easily measurable load on the CPU. I was able
to achieve some further minor improvements with low-level changes to how
the driver works with collections, but it became obvious that (at least in my
quite specific use case) this leads nowhere.
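The arithmetic behind the 15 ms figure can be sketched as follows. This is only an illustration of the numbers quoted above, not code from the PR: the class and method names are hypothetical, and the estimate ignores the time spent in the active phase itself.

```java
// Hypothetical helper, not part of Beam: models the repeating
// THROTTLE, THROTTLE, CONTINUE cycle described in the comment above.
final class ThrottleCycleSketch {

    /** Average sleep over the THROTTLE steps of one cycle, in milliseconds. */
    static double averageSleepMillis(long... sleepsMillis) {
        if (sleepsMillis.length == 0) {
            throw new IllegalArgumentException("at least one sleep is required");
        }
        long total = 0;
        for (long sleep : sleepsMillis) {
            total += sleep;
        }
        return (double) total / sleepsMillis.length;
    }

    /**
     * Upper bound on wake-ups per second for an idle pipeline, assuming the
     * active phase takes no time at all (an optimistic simplification).
     */
    static double maxWakeUpsPerSecond(long... sleepsMillis) {
        return 1000.0 / averageSleepMillis(sleepsMillis);
    }

    public static void main(String[] args) {
        // The two THROTTLE sleeps mentioned in the comment: 10 ms and 20 ms.
        System.out.println(averageSleepMillis(10, 20));
        System.out.println(maxWakeUpsPerSecond(10, 20));
    }
}
```

With sleeps of 10 and 20 ms the average is 15 ms, so an idle pipeline still wakes the driver up to roughly 67 times per second, each time running the non-trivial active phase.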
I was able to come up with an alternative, application-level solution that
simply blocks the DirectRunner threads when the pipeline is empty and only
resumes the DirectRunner loop when new data enters the pipeline.
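One way such a blocking approach might look is sketched below. This is an assumption about the general technique (a lock plus condition variable), not the actual solution referred to above and not a Beam API; `WorkGate`, `elementArrived` and `awaitWork` are hypothetical names.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical gate, not part of Beam: worker threads park here instead of
// polling, and are woken only when a producer reports a new element.
final class WorkGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition workArrived = lock.newCondition();
    private int pending = 0;

    /** Called by the producer whenever a new element enters the pipeline. */
    void elementArrived() {
        lock.lock();
        try {
            pending++;
            workArrived.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /**
     * Blocks without spinning until an element is pending or the timeout
     * elapses. Returns true if an element was claimed, false on timeout or
     * if the thread was interrupted while waiting.
     */
    boolean awaitWork(long timeout, TimeUnit unit) {
        long nanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (pending == 0) {
                if (nanos <= 0L) {
                    return false;
                }
                nanos = workArrived.awaitNanos(nanos);
            }
            pending--;
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        } finally {
            lock.unlock();
        }
    }
}
```

A waiting thread consumes no CPU while parked, which is the difference from a sleep-and-retry loop: the cost of an idle pipeline no longer depends on how often the loop wakes up.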
I'll keep thinking about this for a while and then probably close this PR
unless I figure out how to make it really useful...
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 141928)
Time Spent: 0.5h (was: 20m)
> Backoff in the DirectRunner Monitor if no work is Available
> -----------------------------------------------------------
>
> Key: BEAM-690
> URL: https://issues.apache.org/jira/browse/BEAM-690
> Project: Beam
> Issue Type: Bug
> Components: runner-direct
> Reporter: Thomas Groh
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> When a Pipeline has no elements available to process, the Monitor Runnable
> is rescheduled repeatedly. Each run loops over the steps in the transform
> looking for timers and prompts the sources to perform additional work, even
> though there is no work to be done. This consumes the entirety of a single
> core.
> Add a bounded backoff to rescheduling the monitor runnable if no work has
> been done since it last ran. This will reduce resource consumption on
> low-throughput Pipelines.
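The bounded backoff proposed in the issue description could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the DirectRunner implementation: the class name, the doubling policy and the parameters are all hypothetical.

```java
// Hypothetical policy object, not part of Beam: decides how long to wait
// before rescheduling a monitor task, based on whether the last run did work.
final class BoundedBackoff {
    private final long initialMillis;
    private final long maxMillis;
    private long currentMillis = 0;

    BoundedBackoff(long initialMillis, long maxMillis) {
        if (initialMillis <= 0 || maxMillis < initialMillis) {
            throw new IllegalArgumentException("need 0 < initialMillis <= maxMillis");
        }
        this.initialMillis = initialMillis;
        this.maxMillis = maxMillis;
    }

    /**
     * Returns the delay before the next run. Any run that did work resets
     * the delay to zero; each idle run doubles it, capped at maxMillis.
     */
    long nextDelayMillis(boolean workWasDone) {
        if (workWasDone) {
            currentMillis = 0;
        } else if (currentMillis == 0) {
            currentMillis = initialMillis;
        } else {
            currentMillis = Math.min(maxMillis, currentMillis * 2);
        }
        return currentMillis;
    }
}
```

The cap keeps the latency for newly arriving elements bounded, while the reset on any completed work means a busy Pipeline is never delayed.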