+1

Best,
Taewoo

On Tue, May 1, 2018 at 8:49 AM, Akshay Manchale Sridhar <[email protected]> wrote:

> I was running into the same issue while running replication experiments. A
> quick fix is to increase the default value of HEARTBEAT_MAX_MISSES. There
> are times in a loaded cluster when some nodes become unresponsive for a few
> seconds and the CC marks them as dead because the defaults are too low.
>
> On Tue, May 1, 2018 at 1:23 AM, Murtadha Hubail <[email protected]> wrote:
>
> > Indeed :-)
> >
> > On 05/01/2018, 11:03 AM, "Mike Carey" <[email protected]> wrote:
> >
> > > (And several sleep cycles and network changes were involved in my case
> > > between runs. Typical enterprise use case, right? :-))
> > >
> > > On 5/1/18 12:31 AM, Murtadha Hubail wrote:
> > >
> > > > This is most likely caused by missing heartbeat from the NC to the
> > > > CC. Some macOS versions had issues with reestablishing connected
> > > > sockets after waking up from sleep.
> > > > But it could also be some unexpected exception that caused the NC to
> > > > shut down. If you could share the logs with me, I can tell you for sure.
> > > >
> > > > Cheers,
> > > > Murtadha
> > > >
> > > > On 05/01/2018, 9:06 AM, "Michael Carey" <[email protected]> wrote:
> > > >
> > > > > Q: Do we maybe have a stability regression in recent versions (e.g.,
> > > > > the one leading to the UW snapshot)? They have occasionally seen
> > > > > things like this and I just did too. (The system had been running
> > > > > for awhile in the background on my Mac - e.g., for a day or so.)
> > > > >
> > > > > Error: Cluster is in UNUSABLE state.
> > > > > One or more Node Controllers have left or haven't joined yet.
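
For anyone reading this thread later, here is a rough sketch of why raising HEARTBEAT_MAX_MISSES helps when a node goes quiet for a few seconds. This is a toy model, not the actual CC code: the class, the function names, and the "dead after N consecutive missed periods" semantics are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Toy controller-side tracker; NOT the actual AsterixDB/Hyracks code."""

    max_misses: int  # plays the role of HEARTBEAT_MAX_MISSES (assumed semantics)
    misses: dict = field(default_factory=dict)  # node id -> consecutive misses
    seen: set = field(default_factory=set)      # nodes heard from this period

    def register(self, node_id: str) -> None:
        self.misses[node_id] = 0

    def heartbeat(self, node_id: str) -> None:
        # Record that the node reported in during the current period.
        if node_id in self.misses:
            self.seen.add(node_id)

    def sweep(self) -> list:
        """Call once per heartbeat period; returns nodes declared dead now."""
        dead = []
        for node_id in list(self.misses):
            if node_id in self.seen:
                self.misses[node_id] = 0   # any heartbeat resets the counter
            else:
                self.misses[node_id] += 1  # one more silent period
            if self.misses[node_id] >= self.max_misses:
                dead.append(node_id)
                del self.misses[node_id]
        self.seen.clear()
        return dead


def survives_pause(max_misses: int, silent_periods: int) -> bool:
    """True if a node that is silent for `silent_periods` sweeps stays alive."""
    monitor = HeartbeatMonitor(max_misses)
    monitor.register("nc1")  # hypothetical node id
    for _ in range(silent_periods):
        if monitor.sweep():
            return False
    return True
```

Under these assumptions, survives_pause(5, 5) is False while survives_pause(10, 5) is True: the same pause that gets a node marked dead at the lower threshold is tolerated at the higher one, at the cost of noticing a node that really has died more slowly.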
