RahulPrakash96 commented on code in PR #3113:
URL: https://github.com/apache/polaris/pull/3113#discussion_r2565225179


##########
polaris-core/src/main/java/org/apache/polaris/core/storage/azure/AzureCredentialsStorageIntegration.java:
##########
@@ -312,16 +318,103 @@ private void validateAccountAndContainer(
         });
   }
 
-  private AccessToken getAccessToken(String tenantId) {
+  /**
+   * Fetches an Azure AD access token with timeout and retry logic to handle 
transient failures.
+   *
+   * <p>This access token is used internally to obtain a user delegation key 
from Azure Storage,
+   * which is then used to generate SAS tokens for client credential vending.
+   *
+   * <p>This method implements a defensive strategy against slow or failing 
cloud provider requests:
+   *
+   * <ul>
+   *   <li>Per-attempt timeout (configurable via CLOUD_API_TIMEOUT_MILLIS, 
default 15000ms)
+   *   <li>Exponential backoff retry (configurable count and initial delay via 
CLOUD_API_RETRY_COUNT
+   *       and CLOUD_API_RETRY_DELAY_MILLIS, defaults: 3 attempts starting at 
2000ms)
+   *   <li>Jitter to prevent thundering herd (configurable via 
CLOUD_API_RETRY_JITTER_MILLIS, default 500ms)
+   * </ul>
+   *
+   * @param realmConfig the realm configuration to get timeout and retry 
settings
+   * @param tenantId the Azure tenant ID
+   * @return the access token
+   * @throws RuntimeException if token fetch fails after all retries or times 
out
+   */
+  private AccessToken getAccessToken(RealmConfig realmConfig, String tenantId) 
{
+    int timeoutMillis = realmConfig.getConfig(CLOUD_API_TIMEOUT_MILLIS);
+    int retryCount = realmConfig.getConfig(CLOUD_API_RETRY_COUNT);
+    int initialDelayMillis = 
realmConfig.getConfig(CLOUD_API_RETRY_DELAY_MILLIS);
+    int jitterMillis = realmConfig.getConfig(CLOUD_API_RETRY_JITTER_MILLIS);
+    double jitter = jitterMillis / 1000.0; // Convert millis to fraction for 
jitter factor

Review Comment:
   Ah, I went with millis initially to keep it consistent with the other 
timeout/delay configs (which are all in milliseconds), thinking it'd be more 
straightforward to work with absolute values.
   
   But you're right - that doesn't really make sense here since the jitter 
factor gets applied to the computed delay, which changes with each retry (2s → 
4s → 8s). So 750 would mean something different on each attempt, which is 
pretty confusing.
   
   Changed it to CLOUD_API_RETRY_JITTER_FACTOR using the 0.0-1.0 range



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to