Hi Greg,

Would you mind creating a JIRA for this and attaching the thread dump? (I
don't see it attached to your message.)

Thanks,
Damian

On Fri, 7 Jul 2017 at 10:36 Greg Fodor <gfo...@gmail.com> wrote:

> I'm running a 10.2 job across 5 nodes with 32 stream threads on each node
> and find that when we gracefully shut down all of them at once via an
> Ansible script, some of the nodes end up freezing -- at a glance the
> attached thread dump implies a deadlock between stream threads trying to
> update their state via setState. We haven't had this problem before, but it
> may or may not be related to changes in 10.2 (we are upgrading from 10.0 to
> 10.2).
>
> When we gracefully shut down all nodes simultaneously, what typically
> happens is that some subset of the nodes do not shut down completely but
> go through a rebalance first. It seems this deadlock requires the
> rebalance to occur at the same time as the graceful shutdown. If we
> happen to shut them down and no rebalance happens, I don't believe this
> deadlock is triggered.
>
> The deadlock appears related to the state change handlers being subscribed
> across threads and the fact that StreamThread#setState and
> StreamStateListener#onChange are both synchronized methods.
>
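
For reference, here is a minimal sketch of the kind of lock-ordering cycle
described above. It is not the Kafka Streams code: Worker, SharedListener
and deadlockDetected are hypothetical names, and the assumption that the
shared listener calls back into a synchronized method on another thread's
object is mine, made only to close the cycle. The thread dump would show
which monitors are actually involved.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class SetStateDeadlockSketch {

    // Hypothetical listener shared by all workers (assumption, not Kafka code).
    static class SharedListener {
        synchronized void onChange(Worker source, Worker other) {
            // Holds the listener monitor, then needs the other worker's monitor.
            other.currentState();
        }
    }

    // Hypothetical stand-in for a stream thread with a synchronized setState.
    static class Worker {
        private final SharedListener listener;
        private final CountDownLatch bothInside;
        private String state = "RUNNING";

        Worker(SharedListener listener, CountDownLatch bothInside) {
            this.listener = listener;
            this.bothInside = bothInside;
        }

        synchronized void setState(String newState, Worker other) {
            state = newState;
            bothInside.countDown();
            try {
                // Wait until each worker holds its own monitor, so the
                // interleaving that deadlocks is forced rather than racy.
                bothInside.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            // Still holding this worker's monitor while calling the listener.
            listener.onChange(this, other);
        }

        synchronized String currentState() {
            return state;
        }
    }

    // Starts two workers that change state at once and reports whether the
    // JVM sees a monitor deadlock within roughly five seconds.
    static boolean deadlockDetected() {
        SharedListener listener = new SharedListener();
        CountDownLatch bothInside = new CountDownLatch(2);
        Worker a = new Worker(listener, bothInside);
        Worker b = new Worker(listener, bothInside);

        Thread ta = new Thread(() -> a.setState("PENDING_SHUTDOWN", b), "worker-a");
        Thread tb = new Thread(() -> b.setState("PARTITIONS_REVOKED", a), "worker-b");
        ta.setDaemon(true); // daemon threads let the JVM exit despite the deadlock
        tb.setDaemon(true);
        ta.start();
        tb.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {
            if (mx.findMonitorDeadlockedThreads() != null) {
                return true;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + deadlockDetected());
    }
}
```

In this sketch one thread holds its own monitor plus the listener's and
waits for the other worker's monitor, while the other thread holds its own
monitor and waits for the listener. Invoking the listener outside the
synchronized block would break the cycle here; whether that applies to the
real code depends on what the thread dump shows.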
