On Mon, 10 Sep 2018 at 08:10, Kevin Hrpcek <[email protected]> wrote:

> Update for the list archive.
>
> I went ahead and finished the mimic upgrade with the osds in a fluctuating
> state of up and down. The cluster did start to normalize a lot easier after
> everything was on mimic since the random mass OSD heartbeat failures
> stopped and the constant mon election problem went away. I'm still battling
> with the cluster reacting poorly to host reboots or small map changes, but
> I feel like my current pg:osd ratio may be a factor in that since
> we are 2x normal pg count while migrating data to new EC pools.
>

We found a setting that helped us when we had constant re-elections. Ours
were far more frequent and not related in the least to Mimic, but bumping
the time between elections allowed our cluster to at least start. The mons
voted and decided on a leader, the leader started (re)playing transactions
and got so busy that the others called for a new election. The same mon won
again, restarted the job, and the cycle repeated. Bumping the interval to
30s instead of the default (5?) allowed the mon to finish working through
its backlog and start replying to heartbeats as expected, and then it went
smoother from there.

mon_lease = 30 for future reference.
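
As a minimal sketch of how that could look in ceph.conf: the placement
under [mon] and the need to restart the mons afterwards are assumptions
here, not something stated above, so check them against your own setup.

```ini
[mon]
# Raise the monitor lease from its default (believed to be 5 seconds)
# to 30 seconds, so a busy leader has time to catch up before the
# other mons call a new election.
# Assumption: set on every mon host, then restart the mons.
mon_lease = 30
```

This is a workaround to get a struggling cluster started rather than a
tuning recommendation, so it is probably worth revisiting once the mons
are stable again.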


-- 
May the most significant bit of your life be positive.