damccorm opened a new issue, #20496:
URL: https://github.com/apache/beam/issues/20496

   We have implemented some custom IO classes based on 
UnboundedReader/UnboundedSource. These work as expected, but while doing this I 
noticed a few things that didn't seem to be well documented and I'm not sure if 
they behave as would be anticipated.
   
   With the direct runner, when advance() returns false repeatedly, the runner 
appears to apply an increasing backoff to subsequent calls to advance() until 
it returns true, at which point the backoff is reset. This is the behavior I 
would expect.
   
   However, when the same code runs on Dataflow, advance() is called 
continuously, multiple times a second, for a given UnboundedSource instance, 
with no backoff. With more than one instance/worker this can start to 
produce additional CPU load.
   
   I'm unclear on the right way to handle this. For example, should you 
sleep in advance()? I assume not, but it would be great to have 
documentation for this interface, especially covering how the behavior 
differs between runners and how to implement advance() so that resources 
are used efficiently when no events are available from the underlying 
source.
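
   To make the question concrete, here is a minimal sketch of one option: 
keep the backoff inside the reader and return false from advance() right 
away, without sleeping and without touching the underlying source, until the 
backoff has elapsed. PollBackoff and its methods are hypothetical names, not 
part of Beam, and whether this is the pattern runners expect is exactly what 
is unclear here.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper (not a Beam class): decides when the source may be polled again. */
final class PollBackoff {
    private final long initialMillis;
    private final long maxMillis;
    private final LongSupplier clock; // injected clock, e.g. System::currentTimeMillis
    private long currentMillis;       // delay to apply after the next empty poll
    private long nextPollAtMillis;    // earliest time the next real poll is allowed

    PollBackoff(long initialMillis, long maxMillis, LongSupplier clock) {
        this.initialMillis = initialMillis;
        this.maxMillis = maxMillis;
        this.clock = clock;
        this.currentMillis = initialMillis;
        this.nextPollAtMillis = 0L;
    }

    /** True once the current backoff has elapsed; cheap enough to call on every advance(). */
    boolean shouldPoll() {
        return clock.getAsLong() >= nextPollAtMillis;
    }

    /** Call after a poll that found nothing: schedule the next poll and double the delay, up to the cap. */
    void onEmpty() {
        nextPollAtMillis = clock.getAsLong() + currentMillis;
        currentMillis = Math.min(currentMillis * 2, maxMillis);
    }

    /** Call after a poll that produced a record: reset the backoff. */
    void onRecord() {
        currentMillis = initialMillis;
        nextPollAtMillis = 0L;
    }
}
```

   In a custom UnboundedReader, advance() would return false immediately when 
shouldPoll() is false, and otherwise poll the source and call onRecord() or 
onEmpty() depending on the result. This only avoids repeated work against the 
underlying source; it does not reduce how often the runner calls advance().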
   
    
   
   Imported from Jira 
[BEAM-10503](https://issues.apache.org/jira/browse/BEAM-10503). Original Jira 
may contain additional context.
   Reported by: ameihm.

