J-HowHuang commented on code in PR #16956:
URL: https://github.com/apache/pinot/pull/16956#discussion_r2427012491


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/rebalance/tenant/ZkBasedTenantRebalanceObserver.java:
##########
@@ -72,8 +73,17 @@ private ZkBasedTenantRebalanceObserver(String jobId, String tenantName, TenantRe
       TenantRebalanceContext tenantRebalanceContext, PinotHelixResourceManager pinotHelixResourceManager,
       int zkUpdateMaxRetries) {
     this(jobId, tenantName, pinotHelixResourceManager, zkUpdateMaxRetries);
-    _pinotHelixResourceManager.addControllerJobToZK(_jobId, makeJobMetadata(tenantRebalanceContext, progressStats),
-        ControllerJobTypes.TENANT_REBALANCE);
+    RetryPolicy retry = RetryPolicies.randomDelayRetryPolicy(_zkUpdateMaxRetries, MIN_ZK_UPDATE_RETRY_DELAY_MS,

Review Comment:
   > Does this mean we don't retry before this PR?
   > I feel it is likely a race condition where one previous test can still touch
   > this ZNode, while the new job is trying to update it
   
   Yes, that was a miss: before this PR the initial write was not retried, and I
   suspect that is the culprit. But is that race actually possible? Aren't the
   tests run sequentially?
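For reference, the retry behaviour under discussion can be sketched roughly as below. This is a minimal, self-contained illustration and not Pinot's actual `RetryPolicy` implementation: the class `RandomDelayRetrySketch`, the method `attemptWithRandomDelay`, and the delay values are all hypothetical names and numbers chosen for the example. The operation stands in for a ZNode write that can be rejected while another writer is still touching the same node.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of a random-delay retry loop; not Pinot code.
public class RandomDelayRetrySketch {

  /**
   * Calls the operation up to maxAttempts times. After each failed attempt
   * (operation returned false) it sleeps for a random delay in
   * [minDelayMs, maxDelayMs) before trying again.
   * Returns the number of attempts used on success and throws
   * IllegalStateException if every attempt fails. An exception thrown by the
   * operation itself is not retried and propagates to the caller.
   */
  public static int attemptWithRandomDelay(Callable<Boolean> operation, int maxAttempts,
      long minDelayMs, long maxDelayMs) throws Exception {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (Boolean.TRUE.equals(operation.call())) {
        return attempt;
      }
      if (attempt < maxAttempts) {
        // Randomising the delay makes it less likely that two competing
        // writers retry at the same moment and collide again.
        Thread.sleep(ThreadLocalRandom.current().nextLong(minDelayMs, maxDelayMs));
      }
    }
    throw new IllegalStateException("Operation failed after " + maxAttempts + " attempts");
  }

  public static void main(String[] args) throws Exception {
    // Simulated write that is rejected twice (as if another writer had
    // modified the node in between) and succeeds on the third call.
    int[] calls = {0};
    int used = attemptWithRandomDelay(() -> ++calls[0] >= 3, 5, 10L, 50L);
    System.out.println("Succeeded after " + used + " attempts");
  }
}
```

Without such a loop, a single rejected write fails the whole operation, which would be consistent with the flaky behaviour described above if a concurrent writer really exists.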


