dpcollins-google commented on a change in pull request #16901:
URL: https://github.com/apache/beam/pull/16901#discussion_r813149643



##########
File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorker.java
##########
@@ -195,6 +195,8 @@
   // retrieving extra work from Windmill without working on it, leading to better
   // prioritization / utilization.
   static final int MAX_WORK_UNITS_QUEUED = 100;
+  // Maximum bytes of WorkItems being processed in the work queue at a time.
+  static final int MAX_WORK_UNITS_BYTES = 500 << 20; // 500MB

Review comment:
       What would your preferred default limit be? The default memory limit for Dataflow Streaming Engine workers is 8G (n1-standard-2); would you prefer a 2G or 4G default for such machines?
   
   If you'd prefer this to be scaled on available JVM memory, what would the preferred default fraction be?
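   For concreteness, a rough sketch of what a heap-scaled default could look like (the 1/2 fraction and the 500MB floor below are only illustrative placeholders, not a proposal):

```java
// Illustrative only: derive the byte budget from the JVM heap rather than a
// fixed constant. The fraction (1/2) and the floor (500MB) are placeholder values.
static final long MAX_WORK_UNITS_BYTES =
    Math.max(500L << 20, Runtime.getRuntime().maxMemory() / 2);
```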
   
   > For elements we have separate limits for the queue and for active (since there are a limited # of active threads each processing one work item).
   
   We don't really. We have a number of threads and a number of queue slots, so threads + queue slots = outstanding. With the default of 100 work threads, if we didn't limit active memory, we would still have this problem.
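   Back-of-the-envelope illustration of that point (the per-item size is a made-up example value):

```java
// Illustrative arithmetic only: element-count limits bound how many work items
// can be outstanding (active threads + queue slots), but not how many bytes.
int workThreads = 100;                                    // default of 100 work threads
int outstanding = workThreads + MAX_WORK_UNITS_QUEUED;    // 100 + 100 = 200 items
long perItemBytes = 64L << 20;                            // hypothetical 64MB work items
long worstCaseBytes = (long) outstanding * perItemBytes;  // ~12.5GB, well over an 8G worker
```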




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

