[GitHub] [beam] psobot commented on pull request #13475: Do not add unnecessary experiment use_multiple_sdk_containers.

GitBox Fri, 18 Dec 2020 13:02:08 -0800


psobot commented on pull request #13475:
URL: https://github.com/apache/beam/pull/13475#issuecomment-748317475



   Hi @tvalentyn and @chamikaramj!
   
   > Dataflow service may still recognize
   > --experiment=no_use_multiple_sdk_containers for some time but it is NOT
   > RECOMMENDED to use this knob: in the future Dataflow may have better
   > algorithms for deciding how many SDK containers to start, and specifying
   > this knob may interfere with these algorithms.
   >
   > Users can control the number of cores on the VMs by setting an appropriate
   > --machine_type. Note that there are custom machine types, where users can
   > select number of cores and number of memory GBs, such as
   > --machine_type=custom-1-13312-ext which will have 1 core and 13GB memory.
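   
   As a rough illustration of how the custom machine type name in the quote 
maps to "1 core and 13GB memory", here is a minimal sketch. It assumes the name 
follows the pattern `custom-<cores>-<memory in MB>` with an optional `-ext` 
suffix; the helper `parse_custom_machine_type` is hypothetical and not part of 
Beam or Dataflow.

```python
import re

# Assumed naming pattern: custom-<cores>-<memory in MB>, optionally ending in -ext.
_CUSTOM_TYPE = re.compile(r"^custom-(\d+)-(\d+)(-ext)?$")


def parse_custom_machine_type(name):
    """Hypothetical helper: split a custom machine type name into its parts."""
    match = _CUSTOM_TYPE.match(name)
    if match is None:
        raise ValueError("not a custom machine type name: %r" % name)
    memory_mb = int(match.group(2))
    return {
        "cores": int(match.group(1)),
        "memory_mb": memory_mb,
        "memory_gb": memory_mb / 1024,  # 13312 MB / 1024 = 13 GB
        "extended_memory": match.group(3) is not None,
    }


if __name__ == "__main__":
    print(parse_custom_machine_type("custom-1-13312-ext"))
```

   Under that assumption, `custom-1-13312-ext` parses to 1 core and 13312 MB, 
which is the 13 GB mentioned in the quote.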
   
   While it is true that the number of cores can be controlled with 
`--machine_type`, there are many situations in which it is desirable, for 
performance reasons, for a job to use multiple cores while processing a single 
element (e.g., running ML inference within Dataflow). Is there a proposed 
alternative for workloads that benefit from multi-core parallelism without 
multiple SDK workers?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
