[ceph-users] Re: cephfs ha mount expectations
Hi,

> Thanks for the interesting discussion. Actually it's a bit disappointing to see that cephfs with multiple MDS servers is also not as HA as we would like.

It really depends on what you're trying to achieve, since there are many different scenarios for how to set up and configure one or more Ceph filesystems. Without testing your desired scenario you can't really say that it's disappointing. ;-)

> I also read that failover time depends on the number of clients. We will only have three, and they will not do heavy IO. So that should perhaps help a bit.

In that case it's more likely that your clients won't notice a failover, but again: test it.

> Is there any difference between an 'uncontrolled' ceph server (accidental) reboot and a controlled reboot, where we (for example) first fail over the MDS in a controlled, gentle way?

I haven't noticed a difference, but I'm still working with older clusters (mostly Nautilus); maybe in newer versions the failover is smoother, I can't tell yet.

Regards,
Eugen

___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
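The "controlled, gentle" failover discussed above can be done with standard Ceph admin commands before rebooting a host. A minimal sketch; the filesystem name "cephfs" is an assumption, substitute your own:

```shell
# Check which MDS daemon is active and which are standby
# (output format varies by Ceph release)
ceph fs status

# Gently fail the active MDS for rank 0 of filesystem "cephfs",
# so a standby takes over before you reboot the host running it
ceph mds fail cephfs:0

# Watch the standby take over and the cluster return to healthy
ceph mds stat
```

Whether this is noticeably smoother for clients than an uncontrolled reboot is, as noted above, something to test on your own cluster.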
[ceph-users] Re: cephfs ha mount expectations
Hi all,

Thanks for the interesting discussion. Actually it's a bit disappointing to see that cephfs with multiple MDS servers is also not as HA as we would like.

I also read that failover time depends on the number of clients. We will only have three, and they will not do heavy IO. So that should perhaps help a bit.

Is there any difference between an 'uncontrolled' ceph server (accidental) reboot and a controlled reboot, where we (for example) first fail over the MDS in a controlled, gentle way?

MJ

On 26-10-2022 at 14:40, Eugen Block wrote:
> Just one comment on the standby-replay setting: it really depends on the use case; it can make things worse during failover. Just recently we had a customer where disabling standby-replay made failovers even faster and cleaner in a heavily used cluster. With standby-replay they had to manually clean things up in the mounted directory. So I would recommend testing both options.
[ceph-users] Re: cephfs ha mount expectations
Just one comment on the standby-replay setting: it really depends on the use case; it can make things worse during failover. Just recently we had a customer where disabling standby-replay made failovers even faster and cleaner in a heavily used cluster. With standby-replay enabled they had to manually clean things up in the mounted directory. So I would recommend testing both options.

Quoting William Edwards:
> Monitor failovers don't cause a noticeable disruption IIRC.
>
> MDS failovers do. The MDS needs to replay. You can minimise the effect with mds_standby_replay.
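Testing both options, as suggested above, is a per-filesystem toggle. A sketch, again assuming the filesystem is named "cephfs":

```shell
# Enable standby-replay: a standby MDS continuously tails the active
# MDS's journal, so it can take over with less replay work
ceph fs set cephfs allow_standby_replay true

# Disable it again to compare failover behaviour under your workload
ceph fs set cephfs allow_standby_replay false

# Inspect the filesystem map and daemon states to confirm the setting
ceph fs get cephfs
ceph fs status
```

Measure failover time (and check for cleanup issues in the mounted directories) in both configurations before settling on one.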
[ceph-users] Re: cephfs ha mount expectations
I use this very example, with a few more servers. I have no outage windows for my Ceph deployments, as they support several production environments.

The MDS is your focus: there are many knobs, but the MDS is the key to the client experience. In my environment, MDS failover takes 30-180 seconds, depending on how much replay and rejoin needs to take place. During this failover, I/O on the client is paused, but not broken. If you were to run an ls at the time of failover, it may not return for a couple of minutes in the worst case. If a file transfer is ongoing, it will stop writing for the duration of the failover, but it will complete afterwards.

If I have MDS issues and the failover for whatever reason takes more than 5 minutes, my clients are lost. I must reboot all clients tied to that MDS to recover, due to thousands of open files in various states. This is obviously a major impact, but as we learn Ceph it happens less frequently; it happened only 3 times in our first year of operation. It's awesome tech, and I look forward to future enhancements in general.

On Wed, Oct 26, 2022 at 3:41 AM William Edwards wrote:
> Monitor failovers don't cause a noticeable disruption IIRC.
>
> MDS failovers do. The MDS needs to replay. You can minimise the effect with mds_standby_replay.
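When individual clients hang past a long failover, it may be possible to inspect and, as a last resort, evict their stale MDS sessions instead of rebooting every client. A sketch using Ceph's session admin commands; the daemon name "mds.a" and the session id are placeholders:

```shell
# List current client sessions on a given MDS daemon, including
# client addresses and the number of caps each holds
ceph tell mds.a session ls

# Evict a single stuck client session by id. This is disruptive for
# that client: it is blocklisted and will need to remount.
ceph tell mds.a session evict id=4242
```

Whether eviction recovers a client cleanly depends on the client state; with thousands of open files, a remount (or reboot) of the affected client may still be unavoidable.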
[ceph-users] Re: cephfs ha mount expectations
> On 26 Oct 2022 at 10:11, mj wrote the following:
>
> Hi!
>
> We have read https://docs.ceph.com/en/latest/man/8/mount.ceph, and would like to see our expectations confirmed (or denied) here. :-)
>
> Suppose we build a three-node cluster, three monitors, three MDSs, etc., in order to export a cephfs to multiple client nodes.
>
> In fstab on the (RHEL8) clients (web application servers), we will mount the cephfs like:
>
>> ceph1,ceph2,ceph3:/ /mnt/ha-pool/ ceph name=admin,secretfile=/etc/ceph/admin.secret,noatime 0 2
>
> We expect that the RHEL clients will then be able to use (read/write) a shared /mnt/ha-pool directory simultaneously.
>
> Our question: how HA can we expect this setup to be? Looking for some practical experience here.
>
> Specifically: can we reboot any of the three involved ceph servers without the clients noticing anything? Or will there be certain timeouts involved, during which /mnt/ha-pool/ will appear unresponsive, and *after* a timeout the client switches monitor node, and /mnt/ha-pool/ will respond again?

Monitor failovers don't cause a noticeable disruption IIRC.

MDS failovers do. The MDS needs to replay. You can minimise the effect with mds_standby_replay.

> Of course we hope the answer is: in such a setup, cephfs clients should not notice a reboot at all. :-)
>
> All the best!
>
> MJ
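The fstab entry in the quoted question relies on the kernel client trying each listed monitor address in turn. It can help to test the mount interactively before committing it to fstab; a sketch using the hostnames and secret file path from the example above:

```shell
# Mount by hand first; the kernel cephfs client connects to the first
# reachable monitor in the comma-separated list
sudo mount -t ceph ceph1,ceph2,ceph3:/ /mnt/ha-pool \
    -o name=admin,secretfile=/etc/ceph/admin.secret,noatime

# Confirm the mount and its options
mount | grep ha-pool

# Unmount again before adding the permanent /etc/fstab entry
sudo umount /mnt/ha-pool
```

With this in place, a reboot of one monitor host is transparent to the client; as discussed in this thread, it is the MDS failover, not the monitor list, that determines how noticeable a server reboot is.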