Re: Stability Q

Taewoo Kim Tue, 01 May 2018 09:05:43 -0700

+1

Best,
Taewoo


On Tue, May 1, 2018 at 8:49 AM, Akshay Manchale Sridhar <[email protected]>
wrote:

> I was running into the same issue while running replication experiments. A
> quick fix is to increase the default value of HEARTBEAT_MAX_MISSES. There
> are times in a loaded cluster when some nodes become unresponsive for a few
> seconds and the CC marks them as dead because the defaults are too low.
>
> On Tue, May 1, 2018 at 1:23 AM, Murtadha Hubail <[email protected]>
> wrote:
>
> > Indeed :-)
> >
> > On 05/01/2018, 11:03 AM, "Mike Carey" <[email protected]> wrote:
> >
> >     (And several sleep cycles and network changes were involved in my case
> >     between runs.  Typical enterprise use case, right? :-))
> >
> >
> >     On 5/1/18 12:31 AM, Murtadha Hubail wrote:
> >     > This is most likely caused by missing heartbeat from the NC to the
> >     > CC. Some macOS versions had issues with reestablishing connected sockets
> >     > after waking up from sleep.
> >     > But it could also be some unexpected exception that caused the NC to
> >     > shut down. If you could share the logs with me, I can tell you for sure.
> >     >
> >     > Cheers,
> >     > Murtadha
> >     >
> >     > On 05/01/2018, 9:06 AM, "Michael Carey" <[email protected]>
> > wrote:
> >     >
> >     >      Q:  Do we maybe have a stability regression in recent versions (e.g.,
> >     >      the one leading to the UW snapshot)?  They have occasionally seen things
> >     >      like this and I just did too.  (The system had been running for awhile
> >     >      in the background on my Mac - e.g., for a day or so.)
> >     >
> >     >      Error: Cluster is in UNUSABLE state.
> >     >        One or more Node Controllers have left or haven't joined yet.
> >     >
> >     >
> >     >
> >     >
> >
> >
> >
> >
> >
>
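
To illustrate the point above about HEARTBEAT_MAX_MISSES, here is a minimal sketch of how a miss-count threshold turns a few seconds of silence into a "dead node" decision. This is not the actual CC code: the class name, the method names, and the heartbeat period and threshold values are all hypothetical, chosen only to show why raising the threshold tolerates longer pauses.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Toy model of a controller tracking one node's heartbeats.

    Hypothetical names and values; not taken from the real implementation.
    """
    period_s: float   # assumed interval between expected heartbeats
    max_misses: int   # plays the role of HEARTBEAT_MAX_MISSES

    def tolerated_silence_s(self) -> float:
        # Longest pause that stays below the miss threshold.
        return self.period_s * self.max_misses

    def is_dead(self, last_heartbeat_s: float, now_s: float) -> bool:
        # Count whole heartbeat periods that elapsed with no heartbeat.
        missed = int((now_s - last_heartbeat_s) // self.period_s)
        return missed >= self.max_misses


# A node that stalls for 8 seconds under load (made-up numbers).
strict = HeartbeatMonitor(period_s=1.0, max_misses=5)
lenient = HeartbeatMonitor(period_s=1.0, max_misses=10)

strict_verdict = strict.is_dead(last_heartbeat_s=0.0, now_s=8.0)
lenient_verdict = lenient.is_dead(last_heartbeat_s=0.0, now_s=8.0)
```

With the lower threshold the stalled node is declared dead, while the higher threshold rides out the same pause. The trade-off is that a node that really has failed also takes longer to be detected.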
