Hi Hector,
Thank you very much for the detailed explanation and the link to the
documentation.
Given our current situation (7 active MDSs and 1 standby MDS):
RANK  STATE   MDS         ACTIVITY       DNS    INOS   DIRS   CAPS
 0    active  icadmin012  Reqs:   82 /s  2345k  2288k  97.2k  307k
 1    active  icadmin008  Reqs:  194 /s  3789k  3789k  17.1k  641k
 2    active  icadmin007  Reqs:   94 /s  5823k  5369k   150k  257k
 3    active  icadmin014  Reqs:  103 /s   813k   796k  47.4k  163k
 4    active  icadmin013  Reqs:   81 /s  3815k  3798k  12.9k  186k
 5    active  icadmin011  Reqs:   84 /s   493k   489k   9145  176k
 6    active  icadmin015  Reqs:  374 /s  1741k  1669k  28.1k  246k
      POOL         TYPE     USED   AVAIL
cephfs_metadata  metadata  8547G   25.2T
  cephfs_data      data     223T   25.2T
STANDBY MDS
 icadmin006
I would probably be better off (see the command sketch below):
1. having only 3 active MDSs (ranks 0 to 2)
2. configuring 3 standby-replay daemons to mirror ranks 0 to 2
3. having 2 'regular' standby MDSs
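If I read the docs correctly, the change would look something like this
(a minimal sketch; 'cephfs' is my placeholder for our actual filesystem
name):

  # shrink the MDS cluster from 7 active ranks to 3
  ceph fs set cephfs max_mds 3
  # allow standby daemons to be picked up as standby-replay followers
  ceph fs set cephfs allow_standby_replay true
  # verify: ranks 0-2 active, each followed by a standby-replay daemon
  ceph fs status cephfs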
Of course, this raises the question of storage and performance.
Since I would be moving from 7 active MDSs to 3:
1. each remaining active MDS will have to hold more than twice as much
metadata in its cache
2. the per-MDS load will be more than twice as high (rough numbers below)
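(Back-of-the-envelope from the status above, assuming the load spreads
evenly across ranks: the 7 ranks currently serve about
82 + 194 + 94 + 103 + 81 + 84 + 374 ≈ 1012 req/s in total, i.e.
~145 req/s per rank; spread over 3 ranks that becomes ~337 req/s per
rank, roughly a 2.3x increase.)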
Am I correct?
Emmanuel
On Wed, May 24, 2023 at 2:31 PM Hector Martin <[email protected]> wrote:
> On 24/05/2023 21.15, Emmanuel Jaep wrote:
> > Hi,
> >
> > we are currently running a ceph fs cluster at the following version:
> > MDS version: ceph version 16.2.10
> > (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)
> >
> > The cluster is composed of 7 active MDSs and 1 standby MDS:
> > RANK  STATE   MDS         ACTIVITY       DNS    INOS   DIRS   CAPS
> >  0    active  icadmin012  Reqs:   73 /s  1938k  1880k  85.3k  92.8k
> >  1    active  icadmin008  Reqs:  206 /s  2375k  2375k   7081   171k
> >  2    active  icadmin007  Reqs:   91 /s  5709k  5256k   149k   299k
> >  3    active  icadmin014  Reqs:   93 /s   679k   664k  40.1k   216k
> >  4    active  icadmin013  Reqs:   86 /s  3585k  3569k  12.7k   197k
> >  5    active  icadmin011  Reqs:   72 /s   225k   221k   8611   164k
> >  6    active  icadmin015  Reqs:   87 /s  1682k  1610k  27.9k   274k
> >       POOL         TYPE     USED   AVAIL
> > cephfs_metadata  metadata  8552G   22.3T
> >   cephfs_data      data     226T   22.3T
> > STANDBY MDS
> >  icadmin006
> >
> > When I restart one of the active MDSs, the standby MDS becomes active and
> > its state becomes "replay". So far, so good!
> >
> > However, only one of the other "active" MDSs seems to remain truly
> > active; request activity on all the others drops to almost zero:
> > RANK  STATE   MDS         ACTIVITY      DNS    INOS   DIRS   CAPS
> >  0    active  icadmin012  Reqs:  0 /s   1938k  1881k  85.3k   9720
> >  1    active  icadmin008  Reqs:  0 /s   2375k  2375k   7080   2505
> >  2    active  icadmin007  Reqs:  2 /s   5709k  5256k   149k  26.5k
> >  3    active  icadmin014  Reqs:  0 /s    679k   664k  40.1k   3259
> >  4    replay  icadmin006                 801k   801k   1279      0
> >  5    active  icadmin011  Reqs:  0 /s    225k   221k   8611   9241
> >  6    active  icadmin015  Reqs:  0 /s   1682k  1610k  27.9k  34.8k
> >       POOL         TYPE     USED   AVAIL
> > cephfs_metadata  metadata  8539G   22.8T
> >   cephfs_data      data     225T   22.8T
> > STANDBY MDS
> >  icadmin013
> >
> > In effect, the cluster becomes almost unavailable until the newly
> > promoted MDS finishes rejoining the cluster.
> >
> > Obviously, this defeats the purpose of having 7 MDSs.
> > Is this the expected behavior?
> > If not, what configuration items should I check to return to "normal"
> > operations?
> >
>
> Please ignore my previous email; I read too quickly. I see you do have a
> standby. However, that does not allow fast failover with multiple MDSs.
>
> For fast failover of any active MDS, you need one standby-replay daemon
> for *each* active MDS. Each standby-replay MDS follows a single active
> rank; you can't have one standby-replay daemon following all ranks.
> What you have right now is probably a regular standby daemon, which can
> take over for any failed MDS, but only after waiting out the replay time.
>
> See:
>
> https://docs.ceph.com/en/latest/cephfs/standby/#configuring-standby-replay
>
> My explanation for the zero ops from the previous email still holds:
> it's likely that most clients will hang if any MDS rank is
> down/unavailable.
>
> - Hector
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]