[GitHub] [beam] chamikaramj commented on a change in pull request #14723: [BEAM-12272] Python - Backport Firestore connector's ramp-up throttling to Datastore connector

GitBox Thu, 20 May 2021 16:46:10 -0700


chamikaramj commented on a change in pull request #14723:
URL: https://github.com/apache/beam/pull/14723#discussion_r636544377




##########
File path: sdks/python/apache_beam/io/gcp/datastore/v1new/datastoreio.py
##########
@@ -276,15 +277,33 @@ class _Mutate(PTransform):
   Only idempotent Datastore mutation operations (upsert and delete) are
   supported, as the commits are retried when failures occur.
   """
-  def __init__(self, mutate_fn):
+
+  # Default hint for the expected number of workers in the ramp-up throttling
+  # step for write or delete operations.
+  _DEFAULT_HINT_NUM_WORKERS = 500

Review comment:
       Do you know how this behaves when Dataflow autoscaling is enabled?
Users can set two parameters, num_workers and max_num_workers. The job starts
with num_workers, and Dataflow may scale it up to max_num_workers.
   
   Also, the sink reports its throttled time to the Dataflow service, which
adjusts autoscaling decisions accordingly (i.e., it does not scale up if the
sink is significantly throttled).
   
   I'm wondering if statically setting the expected number of workers could
result in a significant regression for existing users.
   
   cc: @nehsyc from the autoscaling team.
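
   To make the concern concrete, here is a rough sketch of how a fixed worker
hint could interact with a changing worker count. It assumes the ramp-up
budget follows the Datastore/Firestore "500/50/5" guidance (start at 500
ops/sec, grow 50% every 5 minutes) and that each worker receives an equal
share based on the hint. The function names and the exact formula are
illustrative assumptions, not the connector's actual code.

```python
# Illustrative sketch only: constants follow the "500/50/5" ramp-up guidance;
# the budget formula and function names are assumptions for discussion.
BASE_BUDGET = 500            # total ops/sec allowed at the start
RAMP_UP_INTERVAL_SEC = 300   # budget grows once per 5-minute interval
GROWTH_FACTOR = 1.5          # 50% growth per interval


def per_worker_budget(elapsed_sec, hint_num_workers):
    """Ops/sec one worker may issue, given a static hint of the worker count."""
    growth = max(
        0.0, (elapsed_sec - RAMP_UP_INTERVAL_SEC) / RAMP_UP_INTERVAL_SEC)
    budget = int(BASE_BUDGET / hint_num_workers * GROWTH_FACTOR ** growth)
    return max(1, budget)


def aggregate_budget(elapsed_sec, hint_num_workers, actual_num_workers):
    """Total ops/sec across the job when the real worker count differs."""
    return actual_num_workers * per_worker_budget(elapsed_sec, hint_num_workers)


if __name__ == "__main__":
    # Hint of 500 workers, but the job starts with only 10 (num_workers=10).
    print(aggregate_budget(0, hint_num_workers=500, actual_num_workers=10))
    # Hint matches the real worker count.
    print(aggregate_budget(0, hint_num_workers=500, actual_num_workers=500))
```

   Under these assumptions, a job that starts with far fewer workers than the
hint is held well below the intended total budget. The time spent throttled is
reported to Dataflow, which may in turn hold back the scale-up that would have
closed the gap.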




