I posted about some of this a year ago [1] and got some really helpful 
answers. I know a lot more now and feel much more comfortable with the 
scenario. Because I didn't understand the architecture very well back then, I 
held off on distributing monitors and MDS daemons over a WAN. I want to try 
that now.

The production side of the WAN is limited to two machines, and with a single 
monitor/MDS it's impossible to upgrade that machine without taking the cluster 
down. The cluster only has a few hundred PGs, 8 OSDs and a mostly static CRUSH 
map. WAN latency is 4 ms and there's a 10GbE link between the two production 
machines, so the two production monitors (a majority of the three) will 
maintain quorum on the production side in all cases except during an upgrade.
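A rough sketch of the majority rule behind this: Ceph monitors need strictly more than half of the monitors in the monmap to form a quorum. The function names below are hypothetical, and the three-monitor layout (two production mons plus one across the WAN) is an assumption read from the planned setup described in this post.

```python
# Sketch of the monitor majority rule: a quorum needs strictly more
# than half of the monitors in the monmap.

def quorum_size(num_mons: int) -> int:
    """Smallest number of monitors that forms a majority."""
    return num_mons // 2 + 1


def has_quorum(mons_up: int, num_mons: int) -> bool:
    """True if the running monitors can form a majority."""
    return mons_up >= quorum_size(num_mons)


# Current layout: one monitor, taken down for an upgrade.
print(has_quorum(0, 1))  # False

# Assumed layout: two production monitors plus one across the WAN.
print(quorum_size(3))    # 2
print(has_quorum(2, 3))  # True: the two production monitors alone suffice
print(has_quorum(1, 3))  # False: a single surviving monitor is not enough
```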

Most importantly, all the OSDs will remain on the production side of the WAN 
link. 

It seems like the worst that could happen in the normal state is that the 
mon/MDS on the non-prod side of the WAN lags slightly behind the quorum on 
production. In the upgrade state, one of the two production machines is taken 
down and quorum exists across the WAN. Cluster performance might suffer as a 
result, but everything will remain stable as long as the link is stable. After 
the upgrade, of course, quorum returns to the production side and the normal 
state resumes.
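The normal and upgrade states can be modelled as a toy partition check. This is a sketch under stated assumptions: three monitors with hypothetical names (`prod-a`, `prod-b`, `remote`), the WAN treated as the only link that can fail, and only monitor quorum modelled (MDS failover is not covered).

```python
# Hypothetical monitor names: "prod-a" and "prod-b" share the 10GbE link,
# "remote" sits on the far side of the WAN.
MONS = {"prod-a": "prod", "prod-b": "prod", "remote": "wan"}


def reachable_quorum(down: set[str], wan_up: bool) -> set[str]:
    """Return the largest group of running monitors that can talk to each
    other, or an empty set if no group reaches a majority."""
    up = [m for m in MONS if m not in down]
    prod = {m for m in up if MONS[m] == "prod"}
    wan = {m for m in up if MONS[m] == "wan"}
    # With the WAN link up every running monitor is connected; otherwise
    # each side of the link forms its own partition.
    partitions = [prod | wan] if wan_up else [prod, wan]
    majority = len(MONS) // 2 + 1
    best = max(partitions, key=len)
    return best if len(best) >= majority else set()


scenarios = {
    "normal": (set(), True),
    "normal, WAN down": (set(), False),
    "upgrade": ({"prod-b"}, True),
    "upgrade, WAN down": ({"prod-b"}, False),
}
for name, (down, wan_up) in scenarios.items():
    members = sorted(reachable_quorum(down, wan_up))
    print(f"{name:18} quorum={members or 'NONE'}")
```

The last case is the one the upgrade window depends on: with one production monitor down, a WAN outage leaves no majority on either side.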

This seems like a reasonable working model to me. Do others see holes in my 
logic?

Thanks! Brian

[1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-January/032271.html
_______________________________________________
ceph-users mailing list