dpcollins-google commented on a change in pull request #16901:
URL: https://github.com/apache/beam/pull/16901#discussion_r812324231
##########
File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorker.java
##########
@@ -195,6 +195,8 @@
// retrieving extra work from Windmill without working on it, leading to better
// prioritization / utilization.
static final int MAX_WORK_UNITS_QUEUED = 100;
+ // Maximum bytes of WorkItems being processed in the work queue at a time.
+ static final int MAX_WORK_UNITS_BYTES = 500 << 20; // 500MB
Review comment:
This would only throttle pipelines that are successfully operating with
>500 MB outstanding at a time and that recover from that state faster than
Windmill can deliver new data to the worker. I'd expect that to be a very,
very small percentage of pipelines: effectively only those whose work items
are consistently about (NON_OOM_USABLE_MEMORY / 400) in size. It would be
rare for a pipeline to be tuned that precisely, and rarer still for it to be
able to process 500 MB of data before Windmill could deliver more.
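
For illustration, here is a minimal sketch (assumed, not the actual
StreamingDataflowWorker implementation) of how a byte budget like
MAX_WORK_UNITS_BYTES can gate admission to a work queue alongside the
existing item-count cap. Only the two constants mirror the diff above; the
class and method names are hypothetical:

import java.util.ArrayDeque;
import java.util.Queue;

final class BoundedWorkQueue {
  // Constants mirroring the diff above.
  static final int MAX_WORK_UNITS_QUEUED = 100;
  static final long MAX_WORK_UNITS_BYTES = 500L << 20; // 500 MB

  private final Queue<byte[]> queue = new ArrayDeque<>();
  private long queuedBytes = 0;

  // Accepts new work only while both the item-count and byte budgets have
  // headroom; a false return signals the caller to stop requesting more
  // work from upstream until the queue drains.
  synchronized boolean offer(byte[] workItem) {
    if (queue.size() >= MAX_WORK_UNITS_QUEUED
        || queuedBytes + workItem.length > MAX_WORK_UNITS_BYTES) {
      return false;
    }
    queue.add(workItem);
    queuedBytes += workItem.length;
    return true;
  }

  // Removes the next work item and releases its bytes back to the budget.
  synchronized byte[] poll() {
    byte[] item = queue.poll();
    if (item != null) {
      queuedBytes -= item.length;
    }
    return item;
  }
}

Note that offer() only rejects work once the byte budget is already
exhausted, which matches the argument above: a pipeline that drains its
queue faster than Windmill delivers new data never hits the byte gate.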