Re: Stability Q

Akshay Manchale Sridhar Tue, 01 May 2018 08:50:16 -0700

I was running into the same issue while running replication experiments. A
quick fix is to increase the default value of HEARTBEAT_MAX_MISSES. There
are times in a loaded cluster when some nodes become unresponsive for a few
seconds and the CC marks them as dead because the defaults are too low.
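
To make the reasoning concrete, here is a minimal sketch of how a miss-count
threshold turns into a tolerated silence window. It is a toy model, not the
actual CC code: the class, the field names, the "dead once misses reach the
limit" rule, and the numbers are all assumptions for illustration, and the
values are not the real defaults.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Toy model of a controller that counts missed heartbeats for one node."""

    period_s: float   # hypothetical interval between expected heartbeats
    max_misses: int   # plays the role of HEARTBEAT_MAX_MISSES

    def tolerated_silence_s(self) -> float:
        # Silence at or beyond this length gets the node marked dead.
        return self.period_s * self.max_misses

    def is_dead(self, silence_s: float) -> bool:
        # Assumption: one miss is counted per full period without a heartbeat,
        # and the node is declared dead once the count reaches max_misses.
        missed = int(silence_s // self.period_s)
        return missed >= self.max_misses


# Illustrative numbers only: a node that stalls for 4 seconds under load.
low = HeartbeatMonitor(period_s=1.0, max_misses=3)
high = HeartbeatMonitor(period_s=1.0, max_misses=10)
stall_s = 4.0
print(low.is_dead(stall_s), high.is_dead(stall_s))
```

With the lower limit the 4-second stall is enough to declare the node dead;
with the higher limit the same stall is ridden out, at the cost of taking
longer to notice a node that really is gone.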


On Tue, May 1, 2018 at 1:23 AM, Murtadha Hubail <[email protected]> wrote:

> Indeed :-)
>
> On 05/01/2018, 11:03 AM, "Mike Carey" <[email protected]> wrote:
>
>     (And several sleep cycles and network changes were involved in my case
>     between runs.  Typical enterprise use case, right? :-))
>
>
>     On 5/1/18 12:31 AM, Murtadha Hubail wrote:
>     > This is most likely caused by missing heartbeat from the NC to the
>     > CC. Some macOS versions had issues with reestablishing connected
>     > sockets after waking up from sleep.
>     > But it could also be some unexpected exception that caused the NC to
>     > shut down. If you could share the logs with me, I can tell you for sure.
>     >
>     > Cheers,
>     > Murtadha
>     >
>     > On 05/01/2018, 9:06 AM, "Michael Carey" <[email protected]> wrote:
>     >
>     >      Q:  Do we maybe have a stability regression in recent versions
>     >      (e.g., the one leading to the UW snapshot)?  They have occasionally
>     >      seen things like this and I just did too.  (The system had been
>     >      running for a while in the background on my Mac - e.g., for a day
>     >      or so.)
>     >
>     >      Error: Cluster is in UNUSABLE state.
>     >        One or more Node Controllers have left or haven't joined yet.
>     >
