Re: Stability Q

Mike Carey Tue, 01 May 2018 09:06:04 -0700

Should our default OOTB parameters be looser?


On 5/1/18 8:49 AM, Akshay Manchale Sridhar wrote:

I was running into the same issue while running replication experiments. A
quick fix is to increase the default value of HEARTBEAT_MAX_MISSES. There
are times in a loaded cluster when some nodes become unresponsive for a few
seconds and the CC marks them as dead because the defaults are too low.
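
To make the threshold argument concrete, here is a minimal Python sketch of miss-count failure detection. It is not AsterixDB code: the class `HeartbeatMonitor`, the helper `survives_pause`, the one-second heartbeat period, and the `>=` comparison are all assumptions for illustration. Only the idea of a maximum-misses setting (HEARTBEAT_MAX_MISSES) comes from this thread.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Toy model of a controller tracking node heartbeats (hypothetical)."""
    heartbeat_period_s: float  # assumed interval between expected heartbeats
    max_misses: int            # stands in for HEARTBEAT_MAX_MISSES
    last_seen: dict = field(default_factory=dict)

    def record_heartbeat(self, node_id: str, now: float) -> None:
        # Remember when this node last reported in.
        self.last_seen[node_id] = now

    def missed(self, node_id: str, now: float) -> int:
        # Whole heartbeat periods elapsed since the last heartbeat.
        elapsed = now - self.last_seen[node_id]
        return int(elapsed // self.heartbeat_period_s)

    def is_dead(self, node_id: str, now: float) -> bool:
        # Assumed rule: dead once the miss count reaches the threshold.
        return self.missed(node_id, now) >= self.max_misses


def survives_pause(max_misses: int, pause_s: float, period_s: float = 1.0) -> bool:
    """Return True if a node that goes silent for pause_s is still considered alive."""
    monitor = HeartbeatMonitor(heartbeat_period_s=period_s, max_misses=max_misses)
    monitor.record_heartbeat("nc1", now=0.0)
    return not monitor.is_dead("nc1", now=pause_s)
```

Under these assumptions, a node that stalls for six seconds is declared dead with a threshold of 5 but survives with a threshold of 10, which is the trade-off behind raising the default: fewer false positives at the cost of slower detection of real failures.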

On Tue, May 1, 2018 at 1:23 AM, Murtadha Hubail <[email protected]> wrote:

Indeed :-)

On 05/01/2018, 11:03 AM, "Mike Carey" <[email protected]> wrote:

     (And several sleep cycles and network changes were involved in my case
     between runs.  Typical enterprise use case, right? :-))


     On 5/1/18 12:31 AM, Murtadha Hubail wrote:
     > This is most likely caused by missing heartbeats from the NC to the
     > CC. Some macOS versions had issues with reestablishing connected
     > sockets after waking up from sleep.
     > But it could also be some unexpected exception that caused the NC to
     > shut down. If you could share the logs with me, I can tell you for
     > sure.
     >
     > Cheers,
     > Murtadha
     >
     > On 05/01/2018, 9:06 AM, "Michael Carey" <[email protected]> wrote:
     >
     >      Q:  Do we maybe have a stability regression in recent versions
     >      (e.g., the one leading to the UW snapshot)?  They have
     >      occasionally seen things like this and I just did too.  (The
     >      system had been running for a while in the background on my
     >      Mac - e.g., for a day or so.)
     >
     >      Error: Cluster is in UNUSABLE state.
     >        One or more Node Controllers have left or haven't joined yet.
     >
     >
     >
     >
