[
https://issues.apache.org/jira/browse/SAMZA-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shanthoosh Venkataraman resolved SAMZA-1695.
--------------------------------------------
Resolution: Fixed
Fix Version/s: 0.15.0
> Clear events in debounce queue on session expiration
> ----------------------------------------------------
>
> Key: SAMZA-1695
> URL: https://issues.apache.org/jira/browse/SAMZA-1695
> Project: Samza
> Issue Type: Bug
> Reporter: Shanthoosh Venkataraman
> Assignee: Shanthoosh Venkataraman
> Priority: Major
> Fix For: 0.15.0
>
>
> *Scenario:*
> Let's assume there're three processors in the group [P1, P2, P3] and P1 is
> the leader.
> 1. Leader processor(P1) loses connectivity with a zookeeper server in the
> ensemble and it's ephemeral processor node is deleted(due to session
> expiration).
> 2. Immediate successor(P2) to the leader(P1) finds out that the leader is
> dead and declares itself as leader. Processor P2 Schedules onProcessorChange
> to publish JobModel.
> 3. ZkClient connection retry logic helps the Leader(P1) to reconnect to
> another zkServer in the ensemble and it joins as follower.
> 4. Processor P1 acts on the stale buffered event in the debounce queue(which
> it received when it's a leader) and acts as leader. At this point, there're
> two processors acting as leader(P1 & P2). If P1 proceeds to execute leader
> actions before P2, P2 will fail(and in worst case can cause state corruption).
> *Sample exception logs:*
> [https://gist.github.com/shanthoosh/55410fe4ebf3cfb65281b35f16397cad]
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)