Should our default OOTB parameters be looser?
On 5/1/18 8:49 AM, Akshay Manchale Sridhar wrote:
I was running into the same issue while running replication experiments. A
quick fix is to increase the default value of HEARTBEAT_MAX_MISSES. There
are times in a loaded cluster when some nodes become unresponsive for a few
seconds and the CC marks them as dead because the defaults are too low.
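To illustrate why a low miss threshold hurts on a loaded cluster, here is a minimal sketch of miss-count failure detection. It is not the actual CC code: the HeartbeatMonitor class, the period_s parameter, and the numbers used are hypothetical, and only the idea of a maximum-misses setting (HEARTBEAT_MAX_MISSES) comes from this thread.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Hypothetical miss-count failure detector (illustration only)."""

    period_s: float   # assumed interval between expected heartbeats
    max_misses: int   # plays the role of HEARTBEAT_MAX_MISSES

    def tolerated_silence_s(self) -> float:
        # Longest silence a node can have before it is declared dead.
        return self.period_s * self.max_misses

    def is_dead(self, last_heartbeat_s: float, now_s: float) -> bool:
        # Count how many whole heartbeat periods passed without a heartbeat.
        missed = int((now_s - last_heartbeat_s) // self.period_s)
        return missed >= self.max_misses


# Made-up values: a node stalls for 5 seconds under load.
strict = HeartbeatMonitor(period_s=1.0, max_misses=3)
loose = HeartbeatMonitor(period_s=1.0, max_misses=10)
strict_verdict = strict.is_dead(last_heartbeat_s=0.0, now_s=5.0)
loose_verdict = loose.is_dead(last_heartbeat_s=0.0, now_s=5.0)
```

With these made-up values the strict monitor declares the stalled node dead while the looser one rides out the pause, which is the trade-off in question: a larger threshold tolerates short stalls but takes longer to notice a node that is really gone.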
On Tue, May 1, 2018 at 1:23 AM, Murtadha Hubail <[email protected]> wrote:
Indeed :-)
On 05/01/2018, 11:03 AM, "Mike Carey" <[email protected]> wrote:
(And several sleep cycles and network changes were involved in my case
between runs. Typical enterprise use case, right? :-))
On 5/1/18 12:31 AM, Murtadha Hubail wrote:
> This is most likely caused by missed heartbeats from the NC to the
> CC. Some macOS versions had issues with reestablishing connected sockets
> after waking up from sleep.
> But it could also be some unexpected exception that caused the NC to
> shut down. If you could share the logs with me, I can tell you for sure.
>
> Cheers,
> Murtadha
>
> On 05/01/2018, 9:06 AM, "Michael Carey" <[email protected]> wrote:
>
> Q: Do we maybe have a stability regression in recent versions (e.g.,
> the one leading to the UW snapshot)? They have occasionally seen things
> like this and I just did too. (The system had been running for a while
> in the background on my Mac - e.g., for a day or so.)
>
> Error: Cluster is in UNUSABLE state.
> One or more Node Controllers have left or haven't joined yet.