pabloem commented on PR #24713:
URL: https://github.com/apache/beam/pull/24713#issuecomment-1360564970

   r: @lukecwik 
   
   Instead of re-creating the connector task on every call to the process
method, we cache it in a per-JVM cache manager and recover it the next time we
run on the same worker.
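   
   For what it's worth, here is a minimal sketch of the per-JVM caching idea. The names (`ConnectorTask`, `ConnectorTaskCache`, `getOrCreate`) are illustrative placeholders, not the actual classes in this PR:
   
   ```java
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.function.Supplier;
   
   // Placeholder for the connector task wrapper; the real type is whatever this PR caches.
   interface ConnectorTask extends AutoCloseable {}
   
   /** Per-JVM cache: every DoFn instance on the same worker shares this static map. */
   class ConnectorTaskCache {
     private static final ConcurrentHashMap<String, ConnectorTask> CACHE = new ConcurrentHashMap<>();
   
     /** Returns the cached task for this key, creating it only the first time we run on this worker. */
     static ConnectorTask getOrCreate(String cacheKey, Supplier<ConnectorTask> factory) {
       return CACHE.computeIfAbsent(cacheKey, k -> factory.get());
     }
   }
   ```
   
   This bounds open connections by the number of distinct cache keys per worker rather than by the number of process-method invocations, which is consistent with the connection counts below.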
   
   On the DirectRunner, in a 5-minute test:
   - without this caching, we reach 108 connections to MySQL and then fail.
   - with caching, we reach 21 connections to MySQL and succeed.
   
   I've not tested this on Dataflow yet.
   
   I just thought of a potential failure scenario:
   - the task is scheduled on worker A and consumes the range 0 to 10
   - it is then scheduled on worker B and consumes the range 10 to 20
   - it is scheduled on worker A again, recovers the stale task from the cache, and
re-consumes the range 10 to 20 (bad :))
   
   TODO (@pabloem): the offset tracker does not currently validate that offsets
advance sequentially. I need to add that validation so that already-consumed
offsets are discarded (see the sketch below).
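   
   Very roughly, the validation I have in mind looks like the sketch below. The class and method names are made up for illustration; in practice the "last consumed offset" would need to come from the restriction/offset state the runner hands to the DoFn rather than from worker-local state, otherwise the cached task on worker A has no way of knowing what worker B already consumed:
   
   ```java
   /** Illustrative sketch only: reject any offset at or below what was already consumed. */
   class SequentialOffsetValidator {
     private long lastConsumedOffset;
   
     /** Seeded from the restriction handed to us by the runner, not from worker-local state. */
     SequentialOffsetValidator(long lastConsumedOffset) {
       this.lastConsumedOffset = lastConsumedOffset;
     }
   
     /** Claims an offset only if it advances past everything already consumed. */
     synchronized boolean tryClaim(long offset) {
       if (offset <= lastConsumedOffset) {
         return false; // already consumed on a previous run; discard instead of re-emitting
       }
       lastConsumedOffset = offset;
       return true;
     }
   }
   ```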

