[ceph-users] Re: cephfs ha mount expectations

2022-10-27 Thread Eugen Block

Hi,

Quoting mj:

> Thanks for the interesting discussion. Actually it's a bit
> disappointing to see that even CephFS with multiple MDS servers is
> not as HA as we would like.


It really depends on what you're trying to achieve, since there are
many different ways to set up and configure one or more Ceph
filesystems. And without testing your desired scenario you can't
really say that it's disappointing. ;-)


> I also read that failover time depends on the number of clients. We
> will only have three, and they will not do heavy IO. So that should
> perhaps help a bit.


In that case it's more likely that your clients won't notice a  
failover, but again, test it.


> Is there any difference between an 'uncontrolled' ceph server
> (accidental) reboot, and a controlled reboot, where we (for example)
> first fail over the MDS in a controlled, gentle way?


I haven't noticed a difference, but I'm still working with older
clusters (mostly Nautilus); maybe the failover is smoother in newer
versions, I can't tell yet.
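
For illustration, a controlled failover before a planned reboot could
look roughly like this (the filesystem name "cephfs" is only a
placeholder, adjust it to your setup):

  # check which MDS is active and which standbys are available
  ceph fs status cephfs

  # ask rank 0 of the filesystem to fail over to a standby
  ceph mds fail cephfs:0

  # once a standby has taken over, the node can be rebooted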


Regards,
Eugen


[ceph-users] Re: cephfs ha mount expectations

2022-10-26 Thread mj

Hi all,

Thanks for the interesting discussion. Actually it's a bit disappointing
to see that even CephFS with multiple MDS servers is not as HA as we
would like.


I also read that failover time depends on the number of clients. We will
only have three, and they will not do heavy IO. So that should perhaps
help a bit.


Is there any difference between an 'uncontrolled' ceph server
(accidental) reboot, and a controlled reboot, where we (for example)
first fail over the MDS in a controlled, gentle way?


MJ

On 26-10-2022 at 14:40, Eugen Block wrote:
> Just one comment on the standby-replay setting: it really depends on
> the use case; it can make things worse during failover. Just recently
> we had a customer where disabling standby-replay made failovers even
> faster and cleaner in a heavily used cluster. With standby-replay they
> had to manually clean things up in the mounted directory. So I would
> recommend testing both options.




[ceph-users] Re: cephfs ha mount expectations

2022-10-26 Thread Eugen Block

Just one comment on the standby-replay setting: it really depends on
the use case; it can make things worse during failover. Just recently
we had a customer where disabling standby-replay made failovers even
faster and cleaner in a heavily used cluster. With standby-replay they
had to manually clean things up in the mounted directory. So I would
recommend testing both options.
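
For reference, standby-replay is a per-filesystem flag; a rough sketch,
with "cephfs" as a placeholder filesystem name:

  # enable standby-replay: a standby MDS keeps following the active
  # MDS journal so it can take over faster
  ceph fs set cephfs allow_standby_replay true

  # disable it again if failovers turn out cleaner without it
  ceph fs set cephfs allow_standby_replay false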


Quoting William Edwards:

> Monitor failovers don't cause a noticeable disruption IIRC.
>
> MDS failovers do. The MDS needs to replay. You can minimise the
> effect with mds_standby_replay.






[ceph-users] Re: cephfs ha mount expectations

2022-10-26 Thread Robert Gallop
I use this very setup, just with a few more servers. I have no outage
windows for my Ceph deployments, as they support several production
environments.

MDS is your focus; there are many knobs, but MDS is the key to the
client experience. In my environment, MDS failover takes 30-180
seconds, depending on how much replay and rejoin needs to take place.
During this failover, I/O on the client is paused, but not broken. If
you were to do an ls at the time of failover, it may not return for a
couple of minutes in the worst case. If a file transfer is ongoing, it
will stop writing for this failover time, but both the ls and the
transfer will complete after the failover.
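
To see where a failover currently stands, something like the following
is usually enough (output details vary by release):

  # ranks, their state (active/replay/rejoin) and available standbys
  ceph fs status

  # compact summary of MDS states
  ceph mds stat

  # overall health, including degraded-filesystem warnings
  ceph health detail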

If I have MDS issues and the failover for whatever reason takes > 5
minutes, my clients are lost. I must reboot all clients tied to that
MDS to recover, due to thousands of open files in various states. This
obviously has major impact, but as we learn Ceph it happens less
frequently; it has only happened 3 times in the first year of
operation.

It’s awesome tech, and I look forward to future enhancements in general.



[ceph-users] Re: cephfs ha mount expectations

2022-10-26 Thread William Edwards

> On 26 Oct 2022 at 10:11, mj wrote the following:
> 
> Hi!
> 
> We have read https://docs.ceph.com/en/latest/man/8/mount.ceph, and would like 
> to see our expectations confirmed (or denied) here. :-)
> 
> Suppose we build a three-node cluster, three monitors, three MDSs, etc, in 
> order to export a cephfs to multiple client nodes.
> 
> On the (RHEL8) clients (web application servers) fstab, we will mount the 
> cephfs like:
> 
>> ceph1,ceph2,ceph3:/ /mnt/ha-pool/ ceph 
>> name=admin,secretfile=/etc/ceph/admin.secret,noatime 0 2
> 
> We expect that the RHEL clients will then be able to use (read/write) a 
> shared /mnt/ha-pool directory simultaneously.
> 
> Our question: how HA can we expect this setup to be? Looking for some 
> practical experience here.
> 
> Specifically: Can we reboot any of the three involved ceph servers without the 
> clients noticing anything? Or will there be certain timeouts involved, during 
> which /mnt/ha-pool/ will appear unresponsive, and *after* a timeout the client 
> switches monitor node, and /mnt/ha-pool/ will respond again?

Monitor failovers don’t cause a noticeable disruption IIRC.

MDS failovers do. The MDS needs to replay. You can minimise the effect with 
mds_standby_replay.

> 
> Of course we hope the answer is: in such a setup, cephfs clients should not 
> notice a reboot at all. :-)
> 
> All the best!
> 
> MJ
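
A small side note on the quoted fstab line: adding _netdev is a common
safeguard so the mount is only attempted once the network is up. A
sketch, reusing the names from the example above:

  ceph1,ceph2,ceph3:/ /mnt/ha-pool/ ceph name=admin,secretfile=/etc/ceph/admin.secret,noatime,_netdev 0 2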