mcvsubbu commented on code in PR #12045:
URL: https://github.com/apache/pinot/pull/12045#discussion_r1406864843


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java:
##########
@@ -821,6 +821,19 @@ public void segmentStoppedConsuming(LLCSegmentName llcSegmentName, String instan
       _controllerMetrics.addMeteredTableValue(realtimeTableName, ControllerMeter.LLC_ZOOKEEPER_UPDATE_FAILURES, 1L);
       throw e;
     }
+    // We know that we have successfully set the idealstate to be OFFLINE.
+    // We can now do a best effort to reset the externalview to be OFFLINE if it is in ERROR state.
+    // If the externalview is not in error state, then this reset will be ignored by the helix participant
+    // in the server when it receives the ERROR to OFFLINE state transition.
+    // Helix throws an exception if we try to reset state of a partition that is NOT in ERROR state in EV,
+    // So, if any exceptions are thrown, ignore it here.
+    try {
+      _helixAdmin.resetPartition(_helixManager.getClusterName(), instanceName,
+          TableNameBuilder.REALTIME.tableNameWithType(llcSegmentName.getTableName()),
+          Collections.singletonList(segmentName));
+    } catch (Exception e) {
+      // Ignore

Review Comment:
   As I described in the comments, this call from the server can arrive in 
one of two situations:
   - The server could not start consumption at all (it could not construct the 
consumer), so the state transition from OFFLINE to CONSUMING failed. Helix 
sets the externalview to ERROR. 
   - The server completed the OFFLINE to CONSUMING state transition, but could 
not consume rows from the stream at some point after the state transition. 
It may already have consumed zero or more rows. In this case, the 
externalview is in CONSUMING state.
   
   If we call the reset API in the second case, Helix throws an exception. If 
we call it in the first case, Helix simply does its best to reset the state and 
sends an ERROR to OFFLINE state transition to the server.
   
   Logging an error when the reset throws would make the operator think that 
we did not register the offline call from the server correctly. I am therefore 
not in favor of logging an error. 
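
   The two cases above can be sketched with a small stand-alone model. This is 
only an illustration under stated assumptions: `ResetSketch`, its `State` enum, 
`resetPartition`, and `bestEffortReset` are hypothetical names and are not 
Helix or Pinot APIs. The model also collapses the ERROR to OFFLINE transition 
into a direct map update, and uses `IllegalStateException` as a placeholder for 
whatever exception Helix actually throws.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model; none of these names are Helix or Pinot APIs.
class ResetSketch {
  enum State { OFFLINE, CONSUMING, ERROR }

  // Stand-in for the externalview: segment name -> state reported by the server.
  final Map<String, State> externalView = new HashMap<>();

  // Stand-in for the reset call: only legal when the partition is in ERROR.
  void resetPartition(String segment) {
    State current = externalView.get(segment);
    if (current != State.ERROR) {
      throw new IllegalStateException(segment + " is " + current + ", not ERROR");
    }
    // Models the ERROR to OFFLINE state transition completing on the server.
    externalView.put(segment, State.OFFLINE);
  }

  // Best-effort wrapper: true if the reset was accepted, false if it was rejected.
  boolean bestEffortReset(String segment) {
    try {
      resetPartition(segment);
      return true;
    } catch (IllegalStateException e) {
      // Expected when the segment was CONSUMING; not reported as an error.
      return false;
    }
  }

  public static void main(String[] args) {
    ResetSketch sketch = new ResetSketch();
    sketch.externalView.put("seg_failed_to_start", State.ERROR);        // first case
    sketch.externalView.put("seg_failed_mid_stream", State.CONSUMING);  // second case
    System.out.println(sketch.bestEffortReset("seg_failed_to_start"));
    System.out.println(sketch.bestEffortReset("seg_failed_mid_stream"));
  }
}
```

   In this model the rejected reset in the second case is the normal outcome 
rather than a failure, which is why the wrapper swallows it instead of logging 
at error level.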



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

