mcvsubbu commented on code in PR #12045:
URL: https://github.com/apache/pinot/pull/12045#discussion_r1406864843
##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java:
##########
@@ -821,6 +821,19 @@ public void segmentStoppedConsuming(LLCSegmentName llcSegmentName, String instan
_controllerMetrics.addMeteredTableValue(realtimeTableName,
ControllerMeter.LLC_ZOOKEEPER_UPDATE_FAILURES, 1L);
throw e;
}
+ // We know that we have successfully set the idealstate to be OFFLINE.
+ // We can now do a best effort to reset the externalview to be OFFLINE if it is in ERROR state.
+ // If the externalview is not in error state, then this reset will be ignored by the helix participant
+ // in the server when it receives the ERROR to OFFLINE state transition.
+ // Helix throws an exception if we try to reset state of a partition that is NOT in ERROR state in EV,
+ // So, if any exceptions are thrown, ignore it here.
+ try {
+ _helixAdmin.resetPartition(_helixManager.getClusterName(), instanceName,
+ TableNameBuilder.REALTIME.tableNameWithType(llcSegmentName.getTableName()),
+ Collections.singletonList(segmentName));
+ } catch (Exception e) {
+ // Ignore
Review Comment:
So, as I described in the comments, this call from the server can arrive in
one of two situations:
- The server could not start consumption at all (it could not construct the
consumer), so the state transition from OFFLINE to CONSUMING failed. Helix
sets the externalview to ERROR.
- The server completed the OFFLINE to CONSUMING state transition, but could
not consume rows from the stream at some point after that transition. Zero or
more rows may already have been consumed. In this case, the externalview is
in CONSUMING state.
If we call the reset API in the second case, Helix throws an exception. If
we call it in the first case, Helix simply does its best to reset the state
and sends an ERROR to OFFLINE state transition to the server.
Logging an error here would make the operator think that we did not
register the offline call from the server correctly. I am therefore not in
favor of logging an error.
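To make the two cases concrete, here is a minimal, self-contained sketch of
the best-effort pattern. It is not the PR's code. `PartitionResetter` and
`BestEffortReset` are hypothetical names, and the interface only mirrors the
shape of the `resetPartition` call in the diff so that the example runs
without Helix on the classpath. The exception type thrown in the rejected
case is also an assumption.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the single HelixAdmin call used in the diff. */
interface PartitionResetter {
  void resetPartition(String clusterName, String instanceName, String resourceName,
      List<String> partitionNames);
}

final class BestEffortReset {
  private BestEffortReset() {
  }

  /**
   * Attempts to reset one segment on one instance.
   * Returns true if the reset request was accepted (case 1: externalview in ERROR),
   * false if it was rejected (case 2: externalview not in ERROR, e.g. CONSUMING).
   * A rejection is an expected outcome, so it is not surfaced as an error.
   */
  static boolean tryReset(PartitionResetter admin, String clusterName, String instanceName,
      String realtimeTableName, String segmentName) {
    try {
      admin.resetPartition(clusterName, instanceName, realtimeTableName,
          Collections.singletonList(segmentName));
      return true;
    } catch (Exception e) {
      // Expected when the partition is not in ERROR state; deliberately not
      // logged at error level, so operators are not misled.
      return false;
    }
  }
}
```

A caller that wants some visibility could log the `false` result at debug
level, which keeps the signal available without suggesting that the offline
call from the server was mishandled.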
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]