Hi Greg,

Would you mind creating a JIRA for this with the thread dump? (I don't see it attached to your message.)
Thanks,
Damian

On Fri, 7 Jul 2017 at 10:36 Greg Fodor <gfo...@gmail.com> wrote:
> I'm running a 0.10.2 job across 5 nodes with 32 stream threads on each node
> and find that when we gracefully shut down all of them at once via an Ansible
> script, some of the nodes end up freezing. At a glance, the attached thread
> dump implies a deadlock between stream threads trying to update their state
> via setState. We haven't had this problem before, but it may or may not be
> related to changes in 0.10.2 (we are upgrading from 0.10.0 to 0.10.2).
>
> When we gracefully shut down all nodes simultaneously, what typically happens
> is that some subset of the nodes don't shut down completely and instead go
> through a rebalance first. It seems this deadlock requires the rebalance to
> occur simultaneously with the graceful shutdown. If we happen to shut them
> down and no rebalance happens, I don't believe this deadlock is triggered.
>
> The deadlock appears related to the state change handlers being subscribed
> across threads and the fact that StreamThread#setState and
> StreamStateListener#onChange are both synchronized methods.
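To make the suspected lock-order problem concrete, here is a minimal, self-contained Java sketch. It does not use Kafka's classes: `Worker`, `Listener`, and `countDeadlockedThreads` are hypothetical stand-ins, assuming (as the thread dump description suggests) that each stream thread's synchronized `setState` calls a shared, synchronized listener, and that the listener in turn reads other threads' synchronized state. The two monitors are then taken in opposite orders on different threads.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderDeadlockSketch {

    // Hypothetical stand-in for a stream thread whose state setter is synchronized.
    static class Worker {
        private final String name;
        private String state = "RUNNING";
        private Listener listener;

        Worker(String name) { this.name = name; }

        void setListener(Listener listener) { this.listener = listener; }

        // Takes this Worker's monitor, then calls the shared listener (a second monitor).
        synchronized void setState(String newState, CountDownLatch bothHoldOwnLock)
                throws InterruptedException {
            state = newState;
            // Test hook (hypothetical): wait until every worker holds its own
            // monitor, so the lock-order inversion is hit deterministically.
            bothHoldOwnLock.countDown();
            bothHoldOwnLock.await();
            listener.onChange(this);
        }

        synchronized String state() { return state; }
    }

    // Hypothetical stand-in for a single listener subscribed to every worker.
    static class Listener {
        private final Worker[] workers;
        private String lastSnapshot = "";

        Listener(Worker... workers) { this.workers = workers; }

        // Holds the listener's monitor while taking each worker's monitor in turn:
        // the opposite order from Worker.setState, which is what allows the cycle.
        synchronized void onChange(Worker source) {
            StringBuilder snapshot = new StringBuilder(source.name).append(" changed;");
            for (Worker w : workers) {
                snapshot.append(' ').append(w.name).append('=').append(w.state());
            }
            lastSnapshot = snapshot.toString();
        }
    }

    // Starts two workers that change state concurrently and returns how many
    // threads the JVM reports as deadlocked (0 if none is found within ~5 s).
    static int countDeadlockedThreads() {
        Worker a = new Worker("stream-thread-1");
        Worker b = new Worker("stream-thread-2");
        Listener shared = new Listener(a, b);
        a.setListener(shared);
        b.setListener(shared);

        CountDownLatch bothHoldOwnLock = new CountDownLatch(2);
        for (Worker w : new Worker[] {a, b}) {
            Thread t = new Thread(() -> {
                try {
                    w.setState("PENDING_SHUTDOWN", bothHoldOwnLock);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, w.name);
            t.setDaemon(true); // deadlocked threads must not keep the JVM alive
            t.start();
        }

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 50; i++) {
            long[] ids = mx.findDeadlockedThreads(); // null when no cycle exists
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + countDeadlockedThreads());
    }
}
```

Whichever worker wins the listener's monitor blocks on the other worker's monitor, while that worker is itself blocked waiting for the listener, so the JVM should report both threads as deadlocked. A common way out of this general pattern is to notify listeners only after releasing the thread's own monitor; whether that matches how Kafka Streams addressed it is not established here.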