mcvsubbu opened a new pull request, #12045:
URL: https://github.com/apache/pinot/pull/12045

   The current handling of exceptions during consumer creation is incorrect: 
the ExternalView for the segment remains in ERROR state while we set the 
IdealState to OFFLINE. This happens on one replica, while other replicas may 
consume fine and eventually reach ONLINE state. At that point, the segment 
that failed to consume cannot transition to ONLINE, since a transition from 
ERROR to ONLINE is not supported by Helix.
   
   A partition state of ERROR is a special state in Helix. Helix does not 
handle transitions out of ERROR the way it handles transitions between other 
states. Instead, Helix provides an admin API to reset the state of a partition 
from ERROR to its starting state (which in our case is OFFLINE). When this 
reset API is invoked, a state transition message from ERROR to the starting 
state is sent to the specific instance that hosts the partition in question. 
If the participant's current state is not ERROR, the message is discarded 
automatically (and Pinot never sees it). Otherwise, it is passed on to Pinot 
and we get a transition from ERROR to OFFLINE.
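
   The reset semantics described above can be sketched as a small 
self-contained model. This is only an illustration of the behavior, not the 
Helix or Pinot code: the class `ErrorResetSketch`, the `Replica` type, and the 
`handleReset` method are hypothetical names, and the set of states is assumed 
for the example.

```java
// Illustrative model only; none of these types exist in Helix or Pinot.
final class ErrorResetSketch {
    // Assumed set of partition states for this sketch.
    enum State { OFFLINE, CONSUMING, ONLINE, ERROR }

    // The starting state that a reset returns the partition to.
    static final State INITIAL_STATE = State.OFFLINE;

    // Hypothetical stand-in for one replica of a segment on a participant.
    static final class Replica {
        private State currentState;

        Replica(State initial) {
            this.currentState = initial;
        }

        State currentState() {
            return currentState;
        }

        // Models delivery of the ERROR -> starting-state message.
        // Returns true if the message was applied, false if it was discarded
        // because the replica was not in ERROR.
        boolean handleReset() {
            if (currentState != State.ERROR) {
                return false; // discarded; the application never sees it
            }
            currentState = INITIAL_STATE;
            return true;
        }
    }

    public static void main(String[] args) {
        Replica failed = new Replica(State.ERROR);   // consumer creation threw
        Replica healthy = new Replica(State.ONLINE); // another replica, unaffected

        System.out.println(failed.handleReset() + " " + failed.currentState());   // true OFFLINE
        System.out.println(healthy.handleReset() + " " + healthy.currentState()); // false ONLINE
    }
}
```

   Once the failed replica is back in OFFLINE, the normal transitions toward 
ONLINE are available to it again, which is the point of going through reset 
rather than attempting ERROR to ONLINE directly.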
   
   Tested by manually inserting an exception in the consumer creation code and 
observing that the ExternalView changes to ERROR, and then later to OFFLINE.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

