Shanthoosh Venkataraman created SAMZA-1695:
----------------------------------------------
Summary: Clear events in debounce queue on session expiration
Key: SAMZA-1695
URL: https://issues.apache.org/jira/browse/SAMZA-1695
Project: Samza
Issue Type: Bug
Reporter: Shanthoosh Venkataraman
*Scenario:*
Let's assume there're three processors in the group [P1, P2, P3] and P1 is the
leader.
1. Leader processor(P1) loses connectivity with a zookeeper server in the
ensemble and it's ephemeral processor node is deleted(due to session
expiration).
2. Immediate successor(P2) to the leader(P1) finds out that the leader is dead
and declares itself as leader. Processor P2 Schedules onProcessorChange to
publish JobModel.
3. ZkClient connection retry logic helps the Leader to reconnect to another
zkServer in the ensemble and it joins as follower.
4. Processor P1 acts on the stale buffered event in the debounce queue(which it
received when it's a leader) and acts as leader. At this point, there're two
processors acting as leader(P1 & P2). If P1 proceeds to execute leader actions
before P2, P2 will fail(and in worst case can cause state corruption).
*Sample exception logs:*
https://gist.github.com/shanthoosh/55410fe4ebf3cfb65281b35f16397cad
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)