[GitHub] [beam] janotav commented on issue #6303: [BEAM-690] Backoff in the DirectRunner if no work is available

GitHub Thu, 06 Sep 2018 13:19:38 -0700

Thanks for the feedback, guys. To be honest, I'm no longer convinced this is the 
right thing to do. It does indeed decrease CPU consumption significantly; 
however, at least in our case, it is not enough. It turns out that even if the 
pipeline is completely empty, the driver cycles through:


THROTTLE, THROTTLE, CONTINUE, THROTTLE, THROTTLE, CONTINUE, ... and so on ...

So effectively the active loop becomes a loop with a 15 ms sleep (the average of 
10 and 20 ms). Because the code executed in the active phase is itself 
non-trivial, this still puts an easily measurable load on the CPU. I was able to 
achieve some further minor improvements by making low-level changes to how the 
driver works with collections, but it became obvious that (at least in my quite 
specific use case) this leads nowhere.
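
To put rough numbers on the cycle above, here is a back-of-the-envelope sketch. 
It assumes (this is my reading of the pattern, not something taken from the 
driver code) that the two THROTTLE steps sleep 10 ms and 20 ms and that CONTINUE 
does not sleep at all. The class and method names are hypothetical, and the time 
spent in the active phase itself is ignored.

```java
final class IdleLoopEstimate {
    // Assumed sleep durations of the two THROTTLE steps in one idle cycle.
    static final long[] THROTTLE_SLEEPS_MS = {10L, 20L};
    // Assumed number of CONTINUE steps (no sleep) in one idle cycle.
    static final int CONTINUES_PER_CYCLE = 1;

    // Average sleep of a throttled iteration.
    static double averageThrottleSleepMs() {
        long total = 0;
        for (long sleep : THROTTLE_SLEEPS_MS) {
            total += sleep;
        }
        return (double) total / THROTTLE_SLEEPS_MS.length;
    }

    // Loop iterations per second on an empty pipeline, ignoring the time
    // the active phase takes to run.
    static double iterationsPerSecond() {
        long cycleMs = 0;
        for (long sleep : THROTTLE_SLEEPS_MS) {
            cycleMs += sleep;
        }
        int iterations = THROTTLE_SLEEPS_MS.length + CONTINUES_PER_CYCLE;
        return iterations * 1000.0 / cycleMs;
    }
}
```

Under these assumptions an idle pipeline still runs the active phase on the 
order of a hundred times per second, which would be consistent with the load 
remaining measurable.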

I was able to come up with an alternative (application-level) solution that 
simply blocks the DirectRunner threads when the pipeline is empty and only 
resumes the DirectRunner loop when new data enters the pipeline. 
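
The general idea of blocking until data arrives can be sketched roughly as 
follows. This is not the actual code from my application and it uses no Beam 
API; `WorkGate` and its methods are hypothetical names, and it only assumes the 
application knows when elements enter and leave the pipeline.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical gate: worker threads park here while nothing is pending.
final class WorkGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition workAvailable = lock.newCondition();
    private long pending = 0;

    // Called by the application when a new element enters the pipeline.
    void elementAdded() {
        lock.lock();
        try {
            pending++;
            workAvailable.signalAll(); // wake any blocked worker threads
        } finally {
            lock.unlock();
        }
    }

    // Called by the application when an element has been fully processed.
    void elementDone() {
        lock.lock();
        try {
            if (pending > 0) {
                pending--;
            }
        } finally {
            lock.unlock();
        }
    }

    // Blocks until work is pending or the timeout elapses.
    // Returns true if work is pending, false on timeout or interruption.
    boolean awaitWork(long timeout, TimeUnit unit) {
        long nanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (pending == 0) {
                if (nanos <= 0L) {
                    return false;
                }
                try {
                    nanos = workAvailable.awaitNanos(nanos);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
            }
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```

A worker would call `awaitWork(...)` before each pass of its loop, so an empty 
pipeline costs no CPU between arrivals instead of waking up every few 
milliseconds. The timeout is only there so a thread can still notice shutdown.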

I'll keep on thinking about this for a while yet and then probably close this 
PR unless I figure out how to make it really useful...




[ Full content available at: https://github.com/apache/beam/pull/6303 ]
This message was relayed via gitbox.apache.org for [email protected]
