mcvsubbu opened a new pull request, #12045:
URL: https://github.com/apache/pinot/pull/12045
The current handling of exceptions during creation of a consumer is incorrect. The ExternalView for the segment remains in ERROR state while we set the IdealState to OFFLINE. This happens for one replica, while the other replicas may consume fine and eventually reach ONLINE state. At that point, however, the replica that had trouble consuming cannot transition to ONLINE, because Helix does not support a transition from ERROR to ONLINE.

ERROR is a special partition state in Helix: a partition cannot leave it through the normal state transitions. Instead, Helix provides an admin API to reset the state of a partition from ERROR to its starting state (which in our case is OFFLINE). When this reset API is invoked, a state transition message from ERROR to the starting state is sent to the specific instance that hosts the partition in question. If the participant's current state is not ERROR, the message is discarded automatically (and Pinot never sees it). Otherwise, it is passed on to Pinot and we get a transition from ERROR to OFFLINE.

Tested by manually inserting an exception in the consumer creation code and observing that the ExternalView changes to ERROR, and then later to OFFLINE.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
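To make the reset behavior described above concrete, here is a minimal Java sketch that simulates one replica's partition state under the rules stated in this PR. It is not Helix or Pinot code. The class `PartitionStateSketch`, its methods, and the use of plain strings for states are hypothetical. The assumption that the consumer is created during an OFFLINE to CONSUMING transition is also made here for illustration only. In a real cluster the reset would be issued through Helix's admin API (for example `HelixAdmin.resetPartition`), not by calling a method on the participant.

```java
/**
 * Hypothetical, simplified model of one replica's partition state.
 * NOT the Helix API; it only mimics the rules described in the PR:
 *  - a partition cannot leave ERROR through an ordinary transition;
 *  - a reset delivers ERROR -> starting state (OFFLINE) only if the
 *    current state is actually ERROR, otherwise the message is dropped.
 */
public class PartitionStateSketch {
    public static final String ERROR = "ERROR";
    public static final String STARTING_STATE = "OFFLINE"; // starting state in Pinot's case

    private String currentState = STARTING_STATE;

    public String currentState() {
        return currentState;
    }

    /** Ordinary transition driven by the IdealState; returns true if it was applied. */
    public boolean transition(String from, String to) {
        if (ERROR.equals(from) || !currentState.equals(from)) {
            return false; // no ordinary way out of ERROR, and "from" must match
        }
        currentState = to;
        return true;
    }

    /** Simulates an exception while handling a transition (e.g. consumer creation failing). */
    public void fail() {
        currentState = ERROR;
    }

    /** Simulates the admin reset; returns true only if the message reached the participant. */
    public boolean reset() {
        if (!ERROR.equals(currentState)) {
            return false; // discarded automatically; participant code never sees it
        }
        currentState = STARTING_STATE; // participant handles ERROR -> OFFLINE
        return true;
    }

    public static void main(String[] args) {
        PartitionStateSketch replica = new PartitionStateSketch();
        replica.transition("OFFLINE", "CONSUMING"); // assumed: consumer is created here
        replica.fail();                             // consumer creation threw
        System.out.println(replica.transition("ERROR", "ONLINE") + " " + replica.currentState()); // false ERROR
        System.out.println(replica.reset() + " " + replica.currentState());                       // true OFFLINE
        System.out.println(replica.reset() + " " + replica.currentState());                       // false OFFLINE
        System.out.println(replica.transition("OFFLINE", "ONLINE") + " " + replica.currentState()); // true ONLINE
    }
}
```

The sketch shows why the reset is the only way forward: the direct ERROR to ONLINE request is rejected, the first reset brings the replica back to OFFLINE, a second reset is silently dropped because the replica is no longer in ERROR, and from OFFLINE the replica can again follow the IdealState.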
