satybald commented on code in PR #28820:
URL: https://github.com/apache/beam/pull/28820#discussion_r1347688387


##########
sdks/python/apache_beam/utils/retry.py:
##########
@@ -42,9 +43,11 @@
 try:
   from apitools.base.py.exceptions import HttpError
   from google.api_core.exceptions import GoogleAPICallError
+  from google.api_core import exceptions

Review Comment:
   to make code more readable, would it be better to use import alias to 
clearly distinguish between `import requests.exceptions`



##########
sdks/python/apache_beam/utils/retry.py:
##########
@@ -168,15 +183,29 @@ def retry_on_server_errors_and_timeout_filter(exception):
 def retry_on_server_errors_timeout_or_quota_issues_filter(exception):
   """Retry on server, timeout and 403 errors.
 
-  403 errors can be accessDenied, billingNotEnabled, and also quotaExceeded,
-  rateLimitExceeded."""
+  403 errors include both transient (accessDenied, billingNotEnabled) and
+  non-transient errors (quotaExceeded, rateLimitExceeded). Only retry transient
+  errors."""
   if HttpError is not None and isinstance(exception, HttpError):
     if exception.status_code == 403:
+      try:
+        # attempt to extract the reason and check if it's retryable
+        return exception.content["error"]["errors"][0][
+            "reason"] in _RETRYABLE_REASONS
+      except Exception:
+        _LOGGER.debug(
+            "Could not determine if HttpError is non-transient. Will retry: 
%s",
+            exception)
       return True

Review Comment:
   we shouldn't blindly retry on all `403` exceptions. Forbidden exceptions 
belongs to permanent exceptions. 
   
   Here's a guide from GCP about healthy retry policies:
   
   https://cloud.google.com/storage/docs/retry-strategy
   
    



##########
sdks/python/apache_beam/utils/retry.py:
##########
@@ -57,6 +60,18 @@
 # pylint: enable=wrong-import-order, wrong-import-position
 
 _LOGGER = logging.getLogger(__name__)
+_RETRYABLE_REASONS = [
+    "rateLimitExceeded", "quotaExceeded", "internalError", "backendError"

Review Comment:
   why do we need to retry on `quotaExceeded`? Once the quota exceeded it needs 
a manual intervention or back-off of substantial amount of time. 



##########
sdks/python/apache_beam/utils/retry.py:
##########
@@ -168,15 +183,29 @@ def retry_on_server_errors_and_timeout_filter(exception):
 def retry_on_server_errors_timeout_or_quota_issues_filter(exception):
   """Retry on server, timeout and 403 errors.
 
-  403 errors can be accessDenied, billingNotEnabled, and also quotaExceeded,
-  rateLimitExceeded."""
+  403 errors include both transient (accessDenied, billingNotEnabled) and
+  non-transient errors (quotaExceeded, rateLimitExceeded). Only retry transient
+  errors."""
   if HttpError is not None and isinstance(exception, HttpError):
     if exception.status_code == 403:
+      try:
+        # attempt to extract the reason and check if it's retryable
+        return exception.content["error"]["errors"][0][
+            "reason"] in _RETRYABLE_REASONS
+      except Exception:

Review Comment:
   Instead of generic `Exception` shoud it be `KeyError`? 



##########
sdks/python/apache_beam/utils/retry.py:
##########
@@ -57,6 +60,18 @@
 # pylint: enable=wrong-import-order, wrong-import-position
 
 _LOGGER = logging.getLogger(__name__)
+_RETRYABLE_REASONS = [
+    "rateLimitExceeded", "quotaExceeded", "internalError", "backendError"
+]
+_RETRYABLE_TYPES = (

Review Comment:
   Should we be DRY here and avoid `if exceptions`? i.e.
   ```
   _RETRYABLE_EXCEPTIONS_FROM_MODULE = [
       exceptions.TooManyRequests,
       exceptions.InternalServerError,
       exceptions.BadGateway,
       exceptions.ServiceUnavailable,
       exceptions.DeadlineExceeded
   ] if exceptions else []
   
   _RETRYABLE_TYPES = tuple([
       *_RETRYABLE_EXCEPTIONS_FROM_MODULE,
       requests.exceptions.ConnectionError,
       requests.exceptions.Timeout,
       ConnectionError
   ])
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to