mcvsubbu commented on code in PR #12045:
URL: https://github.com/apache/pinot/pull/12045#discussion_r1406907002


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java:
##########
@@ -821,6 +821,19 @@ public void segmentStoppedConsuming(LLCSegmentName 
llcSegmentName, String instan
       _controllerMetrics.addMeteredTableValue(realtimeTableName, 
ControllerMeter.LLC_ZOOKEEPER_UPDATE_FAILURES, 1L);
       throw e;
     }
+    // We know that we have successfully set the idealstate to be OFFLINE.
+    // We can now do a best effort to reset the externalview to be OFFLINE if 
it is in ERROR state.
+    // If the externalview is not in error state, then this reset will be 
ignored by the helix participant
+    // in the server when it receives the ERROR to OFFLINE state transition.
+    // Helix throws an exception if we try to reset state of a partition that 
is NOT in ERROR state in EV,
+    // So, if any exceptions are thrown, ignore it here.
+    try {
+      _helixAdmin.resetPartition(_helixManager.getClusterName(), instanceName,
+          
TableNameBuilder.REALTIME.tableNameWithType(llcSegmentName.getTableName()),
+          Collections.singletonList(segmentName));
+    } catch (Exception e) {
+      // Ignore

Review Comment:
   @sajjad-moradi  and I discussed this, and we think the ideal solution is to:
   - Create specific reason codes (not strings) that we can exchange between 
the controller and server when the server posts any event to the controller 
(not just to indicate stop consumption)
   - Use the reason code to decide whether the reset API should be invoked 
(e.g. a reason code that indicates that the consuming segment was never 
created, will end up invoking the helix reset API). I will file an issue for 
this as a cleanup task.
   
   Another alternative can be to look at the externalview in this same method 
before invoking the API. I had considered that alternative, and discarded it 
since it involves another Helix API call that may come in with its own errors 
and exceptions to be handled. It seemed to me to better to just call the reset 
API once and for all. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to