Hi Dan,

I know that "fs fail ..." is not ideal, but we will not have time for a clean 
"fs down true" followed by waiting for the journal flush to complete (on our 
cluster this takes at least 20 minutes, which is way too long). My question is 
more along the lines of 'Is an "fs fail" destructive?', that is, will the FS 
come up again after

- fs fail
...
- fs set <fs_name> joinable true
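Roughly, the full sequence I have in mind would look like this (a sketch only, 
not a tested procedure; FS_NAME is a placeholder for the real file system 
name, and the daemon/power steps are indicated as comments):

```shell
#!/bin/sh
# Sketch of the fast (unclean yet recoverable) sequence. FS_NAME is a
# placeholder; the script falls back to printing the commands when no
# ceph CLI is available, so it can be read as documentation.
set -eu
FS_NAME="${FS_NAME:-cephfs}"
if command -v ceph >/dev/null 2>&1; then CEPH=ceph; else CEPH="echo ceph"; fi

# Going down:
$CEPH osd set noout            # avoid rebalancing while the sub-cluster is off
$CEPH fs fail "$FS_NAME"       # fail all MDS ranks immediately, no journal flush wait
# ... stop MDS/OSD/MGR/MON daemons, power servers off ...

# Coming back up (after power-on, once mon/mgr/osd are healthy):
$CEPH fs set "$FS_NAME" joinable true   # let standby MDSs take ranks again
$CEPH osd unset noout
```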

The alternative is just a power-off without regard for anything. Of course we 
will try to get as many FS clients unmounted as possible before that, but 
there is no time to wait for anything that takes too long. I need a fast 
(unclean yet recoverable) procedure. Data in flight may get lost, but the FS 
itself must come up healthy again.

Any hints on how to do this? Also for the MON store log size problem?
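For reference, on the MON store side the knobs I am considering are these (a 
sketch with assumptions: option and command names as I understand them for 
Octopus, and "mon.ceph-01" is a hypothetical daemon id; not verified on our 
cluster):

```shell
#!/bin/sh
# Possible MON store growth mitigations; a sketch, not a verified recipe.
# Falls back to printing the commands when no ceph CLI is available.
set -eu
if command -v ceph >/dev/null 2>&1; then CEPH=ceph; else CEPH="echo ceph"; fi

# Compact the mon store whenever a mon daemon starts:
$CEPH config set mon mon_compact_on_start true
# Trigger an online compaction on a specific mon (id is a placeholder):
$CEPH tell mon.ceph-01 compact
```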

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Dan van der Ster <[email protected]>
Sent: 19 October 2022 13:27:11
To: Frank Schilder
Cc: [email protected]
Subject: Re: [ceph-users] Temporary shutdown of subcluster and cephfs

Hi Frank,

fs fail isn't ideal -- there's an 'fs down' command for this.

Here's a procedure we used, last used in the nautilus days:

1. If possible, umount fs from all the clients, so that all dirty
pages are flushed.
2. Prepare the ceph cluster: ceph osd set noout; ceph osd set noin
3. Wait until there is zero IO on the cluster, unmount any leftover clients.
4. ceph fs set cephfs down true
5. Stop all the ceph-osd's.
6. Power off the cluster.
(At this point we had only the ceph-mon's ceph-mgr's running -- you
can shut those down too).
7. Power on the cluster, wait for mon/mgr/osds/mds to power-up.
8. ceph fs set cephfs down false
9. Reconnect and test clients.
10. ceph osd unset noout/noin
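Gathered into one place, the cluster-side commands from the steps above would 
look like this (a sketch; "cephfs" is the example fs name from step 4, and 
the daemon stop/start and power steps are only indicated as comments):

```shell
#!/bin/sh
# Cluster-side commands from the procedure above, collected as a sketch.
# Falls back to printing the commands when no ceph CLI is present.
set -eu
FS_NAME="${FS_NAME:-cephfs}"
if command -v ceph >/dev/null 2>&1; then CEPH=ceph; else CEPH="echo ceph"; fi

# Steps 1-3: unmount clients, wait for zero IO, then:
$CEPH osd set noout        # don't mark stopped OSDs out
$CEPH osd set noin         # don't auto-in OSDs on restart
# Step 4: take the fs down cleanly (flushes journals):
$CEPH fs set "$FS_NAME" down true
# Steps 5-7: stop ceph-osd's, power off; later power on, wait for mon/mgr/osd/mds
# Step 8:
$CEPH fs set "$FS_NAME" down false
# Steps 9-10: reconnect and test clients, then:
$CEPH osd unset noout
$CEPH osd unset noin
```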

-- Dan

On Wed, Oct 19, 2022 at 12:43 PM Frank Schilder <[email protected]> wrote:
>
> Hi all,
>
> we need to prepare for temporary shut-downs of a part of our ceph cluster. I 
> have 2 questions:
>
> 1) What is the recommended procedure to temporarily shut down a ceph fs 
> quickly?
> 2) How to avoid MON store log spam overflow (on octopus 15.2.17)?
>
> To 1: Currently, I'm thinking about:
>
> - fs fail <fs-name>
> - shut down all MDS daemons
> - shut down all OSDs in that sub-cluster
> - shut down MGRs and MONs in that sub-cluster
> - power servers down
> - mark out OSDs manually (the number will exceed the MON limit for auto-out)
>
> - power up
> - wait a bit
> - do I need to mark the OSDs in again, or will they rejoin automatically 
> after a manual out and restart (maybe just temporarily increase the MON 
> limit at the end of the procedure above)?
> - fs set <fs_name> joinable true
>
> Is this a safe procedure? The documentation calls this a procedure for 
> "Taking the cluster down rapidly for deletion or disaster recovery", and 
> neither of the two is our intent. We need a fast *reversible* procedure, 
> because an "fs set down true" simply takes too long.
>
> There will be ceph fs clients remaining up. Desired behaviour is that 
> client-IO stalls until fs comes back up and then just continues as if nothing 
> had happened.
>
> To 2: We will have a sub-cluster down for an extended period of time. There 
> have been cases where such a situation killed MONs due to an excessive 
> amount of non-essential log messages accumulating in the MON store. Is this 
> still a problem in 15.2.17, and what can I do to mitigate it?
>
> Thanks for any hints/corrections/confirmations!
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> _______________________________________________
> ceph-users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]