dpcollins-google commented on a change in pull request #16901:
URL: https://github.com/apache/beam/pull/16901#discussion_r812324231
##########
File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorker.java
##########
@@ -195,6 +195,8 @@
// retrieving extra work from Windmill without working on it, leading to better
// prioritization / utilization.
static final int MAX_WORK_UNITS_QUEUED = 100;
+ // Maximum bytes of WorkItems being processed in the work queue at a time.
+ static final int MAX_WORK_UNITS_BYTES = 500 << 20; // 500MB
Review comment:
This would only throttle pipelines that are successfully operating with
>500 MB outstanding at a time and that recover from that state faster than
Windmill can deliver new data to the worker. I'd expect that to be a very,
very small percentage of pipelines: effectively only those whose work items
are consistently about (NON_OOM_USABLE_MEMORY / 400) in size. It would be
rare for a pipeline to be tuned that precisely, and rarer still for it to be
able to process 500 MB of data before Windmill could deliver more.
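
For illustration, here is a minimal sketch (assumed, not the actual
StreamingDataflowWorker implementation) of how a byte budget like
MAX_WORK_UNITS_BYTES can gate admission to a work queue alongside the
existing item-count cap. Only the two constants mirror the diff above; the
class and method names are hypothetical:

import java.util.ArrayDeque;
import java.util.Queue;

final class BoundedWorkQueue {
  // Constants mirroring the diff above.
  static final int MAX_WORK_UNITS_QUEUED = 100;
  static final long MAX_WORK_UNITS_BYTES = 500L << 20; // 500 MB

  private final Queue<byte[]> queue = new ArrayDeque<>();
  private long queuedBytes = 0;

  // Accepts new work only while both the item-count and byte budgets have
  // headroom; a false return signals the caller to stop requesting more
  // work from upstream until the queue drains.
  synchronized boolean offer(byte[] workItem) {
    if (queue.size() >= MAX_WORK_UNITS_QUEUED
        || queuedBytes + workItem.length > MAX_WORK_UNITS_BYTES) {
      return false;
    }
    queue.add(workItem);
    queuedBytes += workItem.length;
    return true;
  }

  // Removes the next work item and releases its bytes back to the budget.
  synchronized byte[] poll() {
    byte[] item = queue.poll();
    if (item != null) {
      queuedBytes -= item.length;
    }
    return item;
  }
}

Note that offer() only rejects work once the byte budget is already
exhausted, which matches the argument above: a pipeline that drains its
queue faster than Windmill delivers new data never hits the byte gate.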