The issue, Sage, is that we have to deal with the cluster being re-expanded. If we start with 5 monitors and scale back to 3 by stopping each monitor to be removed and then running "ceph mon remove N" for it, without restarting the surviving monitors, we cannot re-add the same monitors that were previously removed. They suicide at startup.
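For reference, here is a minimal sketch of the quorum arithmetic behind the "5 -> 3 works, 5 -> 2 is problematic" point in the quoted thread. It assumes the usual strict-majority rule for monitor quorum and that the monitors being dropped are all stopped before any 'ceph mon rm' is issued. The function names are made up for illustration and are not part of any Ceph tool.

```python
def quorum_size(monmap_size: int) -> int:
    """Smallest strict majority of the monitors listed in the monmap."""
    if monmap_size < 1:
        raise ValueError("monmap must contain at least one monitor")
    return monmap_size // 2 + 1


def can_shrink_without_restart(monmap_size: int, monitors_up: int) -> bool:
    """True if the monitors still running are a majority of the current
    monmap, so they can agree on the removal without being restarted.

    Assumption: every monitor that is going away has already been stopped,
    so only `monitors_up` monitors can vote.
    """
    if not 0 <= monitors_up <= monmap_size:
        raise ValueError("monitors_up must be between 0 and monmap_size")
    return monitors_up >= quorum_size(monmap_size)


if __name__ == "__main__":
    # 5 monitors in the monmap, 3 still running: a majority is available.
    print(can_shrink_without_restart(5, 3))
    # 5 monitors in the monmap, only 2 still running: no majority.
    print(can_shrink_without_restart(5, 2))
```

This only models whether the removal itself can be agreed on. It says nothing about why a re-added monitor suicides at startup, which is the open question here.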
On Mon, Jun 24, 2013 at 4:22 PM, Sage Weil <[email protected]> wrote:
> On Mon, 24 Jun 2013, Mandell Degerness wrote:
>> Hmm. This is a bit ugly from our perspective, but not fatal to your
>> design (just our implementation). At the time we run the rm, the
>> cluster is smaller and so the restart of each monitor is not fatal to
>> the cluster. The problem is on our side in terms of guaranteeing
>> order of behaviors.
>
> Sorry, I'm still confused about where the monitor gets restarted. It
> doesn't matter if the removed monitor is stopped or failed/gone; 'ceph mon
> rm ...' will remove it from the monmap and quorum. It sounds like you're
> suggesting that the surviving monitors need to be restarted, but they do
> not, as long as enough of them are alive to form a quorum and pass the
> decree that the mon cluster is smaller. So 5 -> 2 would be problematic,
> but 5 -> 3 (assuming there are 3 currently up) will work without
> restarts...
>
> sage
>
>
>>
>> On Mon, Jun 24, 2013 at 1:54 PM, Sage Weil <[email protected]> wrote:
>> > On Mon, 24 Jun 2013, Mandell Degerness wrote:
>> >> I'm testing the change (actually re-starting the monitors after the
>> >> monitor removal), but this brings up the issue with why we didn't want
>> >> to do this in the first place: When reducing the number of monitors
>> >> from 5 to 3, we are guaranteed to have a service outage for the time
>> >> it takes to restart at least one of the monitors (and, possibly, for
>> >> two of the restarts, now that I think on it). In theory, the
>> >> stop/start cycle is very short and should complete in a reasonable
>> >> time. What I'm concerned about, however, is the case that something
>> >> is wrong with our re-written config file. In that case, the outage is
>> >> immediate and will last until the problem is corrected on the first
>> >> server to have the monitor restarted.
>> >
>> > I'm jumping into this thread late, but: why would you follow the second
>> > removal procedure for broken clusters? To go from 5->3 mons, you should
>> > just stop 2 of the mons and do 'ceph mon rm <addr1>' 'ceph mon rm
>> > <addr2>'.
>> >
>> > sage
>> >
>> >>
>> >> On Mon, Jun 24, 2013 at 10:07 AM, John Nielsen <[email protected]> wrote:
>> >> > On Jun 21, 2013, at 5:00 PM, Mandell Degerness
>> >> > <[email protected]> wrote:
>> >> >
>> >> >> There is a scenario where we would want to remove a monitor and, at a
>> >> >> later date, re-add the monitor (using the same IP address). Is there
>> >> >> a supported way to do this? I tried deleting the monitor directory
>> >> >> and rebuilding from scratch following the add monitor procedures from
>> >> >> the web, but the monitor still suicides when started.
>> >> >
>> >> >
>> >> > I assume you're already referencing this:
>> >> > http://ceph.com/docs/master/rados/operations/add-or-rm-mons/
>> >> >
>> >> > I have done what you describe before. There were a couple hiccups,
>> >> > let's see if I remember the specifics:
>> >> >
>> >> > Remove:
>> >> > Follow the first two steps under "removing a monitor (manual)" at the
>> >> > link above:
>> >> > service ceph stop mon.N
>> >> > ceph mon remove N
>> >> > Comment out the monitor entry in ceph.conf on ALL mon, osd and client
>> >> > hosts.
>> >> > Restart services as required to make everyone happy with the smaller
>> >> > set of monitors
>> >> >
>> >> > Re-add:
>> >> > Wipe the old monitor's directory and re-create it
>> >> > Follow the steps for "adding a monitor (manual)" at the link above.
>> >> > Instead of adding a new entry you can just un-comment your old ones in
>> >> > ceph.conf. You can also start the monitor with "service ceph start mon
>> >> > N" on the appropriate host instead of running it yourself (step 8). Note
>> >> > that you DO need to run ceph-mon as specified in step 5. I was
>> >> > initially confused about the '--mkfs' flag there--it doesn't refer to
>> >> > the OS's filesystem, you should use a directory or mountpoint that is
>> >> > already prepared/mounted.
>> >> >
>> >> > HTH. If you run into trouble post exactly the steps you followed and
>> >> > additional details about your setup.
>> >> >
>> >> > JN
>> >> >
>> >> _______________________________________________
>> >> ceph-users mailing list
>> >> [email protected]
>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
