Hi Xiubo, great! I'm not sure if we observed this particular issue, but we did have the "does not advance its oldest_client_tid" message show up in a context that might be related.
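In case it helps others who find this thread, the rough sequence we use to track down and clear the warning looks like the commands below. The MDS name and client id are placeholders and I'm quoting from memory, so please double-check the syntax against your release:

  # health detail names the client(s) behind the
  # "failing to advance oldest client/flush tid" warning
  ceph health detail

  # inspect the sessions on the MDS that reports the warning; the dump
  # should include a completed_requests counter for the stuck client
  ceph tell mds.<mds-name> session ls

  # last resort: evict the client and confirm it landed on the blocklist
  # (the client node usually needs a remount afterwards)
  ceph tell mds.<mds-name> client evict id=<client-id>
  ceph osd blocklist ls    # "ceph osd blacklist ls" on older releases

If I understand the MDS behaviour correctly, the warning fires once a session accumulates more completed requests than mds_max_completed_requests (100000 by default), which would match the "100000 completed requests recorded in session" log line quoted further down in this thread.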
If this fix is not too large, it would be really great if it could be
included in the last Pacific point release.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Xiubo Li <xiu...@redhat.com>
Sent: Wednesday, November 8, 2023 1:38 AM
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: MDS stuck in rejoin

Hi Frank,

Recently I found a new possible case that could cause this, please see
https://github.com/ceph/ceph/pull/54259. This is just a ceph-side fix; after
this we need to fix it in the kclient too, which hasn't been done yet.

Thanks

- Xiubo

On 8/8/23 17:44, Frank Schilder wrote:
> Dear Xiubo,
>
> the nearfull pool is an RBD pool and has nothing to do with the file system.
> All pools for the file system have plenty of capacity.
>
> I think we have an idea what kind of workload caused the issue. We had a user
> run a computation that reads the same file over and over again. He started
> 100 such jobs in parallel and our storage servers were at 400% load. I saw
> 167K read IOP/s on an HDD pool that has an aggregated raw IOP/s budget of ca.
> 11K. Clearly, most of this was served from RAM.
>
> It is possible that this extreme load situation triggered a race that
> remained undetected/unreported. There is literally no related message in any
> logs near the time the warning started popping up. It shows up out of nowhere.
>
> We asked the user to change his workflow to use a local RAM disk for the
> input files. I don't think we can reproduce the problem anytime soon.
>
> About the bug fixes, I'm eagerly waiting for this and another one. Any idea
> when they might show up in distro kernels?
>
> Thanks and best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Xiubo Li <xiu...@redhat.com>
> Sent: Tuesday, August 8, 2023 2:57 AM
> To: Frank Schilder; ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: MDS stuck in rejoin
>
>
> On 8/7/23 21:54, Frank Schilder wrote:
>> Dear Xiubo,
>>
>> I managed to collect some information. It looks like there is nothing in the
>> dmesg log around the time the client failed to advance its TID. I collected
>> short snippets around the critical time below. I have full logs in case you
>> are interested. They are large files, I will need to do an upload for that.
>>
>> I also have a dump of "mds session ls" output for clients that showed the
>> same issue later. Unfortunately, no consistent log information for a single
>> incident.
>>
>> Here is the summary, please let me know if uploading the full package makes
>> sense:
>>
>> - Status:
>>
>> On July 29, 2023
>>
>> ceph status/df/pool stats/health detail at 01:05:14:
>>   cluster:
>>     health: HEALTH_WARN
>>             1 pools nearfull
>>
>> ceph status/df/pool stats/health detail at 01:05:28:
>>   cluster:
>>     health: HEALTH_WARN
>>             1 clients failing to advance oldest client/flush tid
>>             1 pools nearfull
> Okay, then this could be the root cause.
>
> If the pool is nearfull, it could block flushing the journal logs to the pool,
> and then the MDS couldn't send the safe replies to the requests and would
> block them like this.
>
> Could you fix the pool nearfull issue first and then check whether you see
> it again?
>
>
>> [...]
>>
>> On July 31, 2023
>>
>> ceph status/df/pool stats/health detail at 10:36:16:
>>   cluster:
>>     health: HEALTH_WARN
>>             1 clients failing to advance oldest client/flush tid
>>             1 pools nearfull
>>
>>   cluster:
>>     health: HEALTH_WARN
>>             1 pools nearfull
>>
>> - client evict command (date, time, command):
>>
>> 2023-07-31 10:36 ceph tell mds.ceph-11 client evict id=145678457
>>
>> We have a 1h time difference between the date stamp of the command and the
>> dmesg date stamps. However, there seems to be a weird 10min delay from
>> issuing the evict command until it shows up in dmesg on the client.
>>
>> - dmesg:
>>
>> [Fri Jul 28 12:59:14 2023] beegfs: enabling unsafe global rkey
>> [Fri Jul 28 12:59:14 2023] beegfs: enabling unsafe global rkey
>> [Fri Jul 28 12:59:14 2023] beegfs: enabling unsafe global rkey
>> [Fri Jul 28 12:59:14 2023] beegfs: enabling unsafe global rkey
>> [Fri Jul 28 12:59:14 2023] beegfs: enabling unsafe global rkey
>> [Fri Jul 28 12:59:14 2023] beegfs: enabling unsafe global rkey
>> [Fri Jul 28 16:07:47 2023] slurm.epilog.cl (24175): drop_caches: 3
>> [Sat Jul 29 18:21:30 2023] libceph: mds2 192.168.32.75:6801 socket closed (con state OPEN)
>> [Sat Jul 29 18:21:30 2023] libceph: mds2 192.168.32.75:6801 socket closed (con state OPEN)
>> [Sat Jul 29 18:21:30 2023] libceph: mds2 192.168.32.75:6801 socket closed (con state OPEN)
>> [Sat Jul 29 18:21:42 2023] ceph: mds2 reconnect start
>> [Sat Jul 29 18:21:42 2023] ceph: mds2 reconnect start
>> [Sat Jul 29 18:21:43 2023] ceph: mds2 reconnect start
>> [Sat Jul 29 18:21:43 2023] ceph: mds2 reconnect success
>> [Sat Jul 29 18:21:43 2023] ceph: mds2 reconnect success
>> [Sat Jul 29 18:21:43 2023] ceph: mds2 reconnect success
>> [Sat Jul 29 18:26:39 2023] ceph: mds2 reconnect start
>> [Sat Jul 29 18:26:39 2023] ceph: mds2 reconnect start
>> [Sat Jul 29 18:26:39 2023] ceph: mds2 reconnect start
>> [Sat Jul 29 18:26:40 2023] ceph: mds2 reconnect success
>> [Sat Jul 29 18:26:40 2023] ceph: mds2 reconnect success
>> [Sat Jul 29 18:26:40 2023] ceph: mds2 reconnect success
>> [Sat Jul 29 18:26:49 2023] ceph: update_snap_trace error -22
> This is a known bug and we have fixed it on both the kclient and ceph side:
>
> https://tracker.ceph.com/issues/61200
>
> https://tracker.ceph.com/issues/61217
>
> Thanks
>
> - Xiubo
>
>
>> [Sat Jul 29 18:26:49 2023] ceph: mds2 recovery completed
>> [Sat Jul 29 18:26:49 2023] ceph: mds2 recovery completed
>> [Sat Jul 29 18:26:49 2023] ceph: mds2 recovery completed
>> [Sun Jul 30 16:37:55 2023] slurm.epilog.cl (43668): drop_caches: 3
>> [Mon Jul 31 01:00:20 2023] slurm.epilog.cl (73347): drop_caches: 3
>> [Mon Jul 31 09:46:41 2023] libceph: mds0 192.168.32.81:6801 socket closed (con state OPEN)
>> [Mon Jul 31 09:46:41 2023] libceph: mds3 192.168.32.87:6801 socket closed (con state OPEN)
>> [Mon Jul 31 09:46:41 2023] libceph: mds7 192.168.32.88:6801 socket closed (con state OPEN)
>> [Mon Jul 31 09:46:41 2023] libceph: mds5 192.168.32.78:6801 socket closed (con state OPEN)
>> [Mon Jul 31 09:46:41 2023] libceph: mds4 192.168.32.73:6801 socket closed (con state OPEN)
>> [Mon Jul 31 09:46:41 2023] libceph: mds1 192.168.32.80:6801 socket closed (con state OPEN)
>> [Mon Jul 31 09:46:41 2023] libceph: mds2 192.168.32.75:6801 socket closed (con state OPEN)
>> [Mon Jul 31 09:46:41 2023] libceph: mds3 192.168.32.87:6801 connection reset
>> [Mon Jul 31 09:46:41 2023] libceph: reset on mds3
>> [Mon Jul 31 09:46:41 2023] ceph: mds3 closed our session
>> [Mon Jul 31 09:46:41 2023] ceph: mds3 reconnect start
>> [Mon Jul 31 09:46:41 2023] libceph: mds7 192.168.32.88:6801 connection reset
>> [Mon Jul 31 09:46:41 2023] libceph: reset on mds7
>> [Mon Jul 31 09:46:41 2023] ceph: mds7 closed our session
>> [Mon Jul 31 09:46:41 2023] ceph: mds7 reconnect start
>> [Mon Jul 31 09:46:41 2023] libceph: mds2 192.168.32.75:6801 connection reset
>> [Mon Jul 31 09:46:41 2023] libceph: reset on mds2
>> [Mon Jul 31 09:46:41 2023] ceph: mds2 closed our session
>> [Mon Jul 31 09:46:41 2023] ceph: mds2 reconnect start
>> [Mon Jul 31 09:46:41 2023] libceph: mds4 192.168.32.73:6801 connection reset
>> [Mon Jul 31 09:46:41 2023] libceph: reset on mds4
>> [Mon Jul 31 09:46:41 2023] ceph: mds4 closed our session
>> [Mon Jul 31 09:46:41 2023] ceph: mds4 reconnect start
>> [Mon Jul 31 09:46:41 2023] libceph: mds1 192.168.32.80:6801 connection reset
>> [Mon Jul 31 09:46:41 2023] libceph: reset on mds1
>> [Mon Jul 31 09:46:41 2023] ceph: mds1 closed our session
>> [Mon Jul 31 09:46:41 2023] ceph: mds1 reconnect start
>> [Mon Jul 31 09:46:41 2023] libceph: mds0 192.168.32.81:6801 connection reset
>> [Mon Jul 31 09:46:41 2023] libceph: reset on mds0
>> [Mon Jul 31 09:46:41 2023] ceph: mds0 closed our session
>> [Mon Jul 31 09:46:41 2023] ceph: mds0 reconnect start
>> [Mon Jul 31 09:46:41 2023] libceph: mds5 192.168.32.78:6801 connection reset
>> [Mon Jul 31 09:46:41 2023] libceph: reset on mds5
>> [Mon Jul 31 09:46:41 2023] ceph: mds5 closed our session
>> [Mon Jul 31 09:46:41 2023] ceph: mds5 reconnect start
>> [Mon Jul 31 09:46:41 2023] ceph: mds2 reconnect denied
>> [Mon Jul 31 09:46:41 2023] ceph: mds1 reconnect denied
>> [Mon Jul 31 09:46:41 2023] ceph: mds0 reconnect denied
>> [Mon Jul 31 09:46:41 2023] ceph: mds5 reconnect denied
>> [Mon Jul 31 09:46:41 2023] ceph: mds3 reconnect denied
>> [Mon Jul 31 09:46:41 2023] ceph: mds7 reconnect denied
>> [Mon Jul 31 09:46:41 2023] ceph: mds4 reconnect denied
>>
>> Best regards,
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> ________________________________________
>> From: Xiubo Li <xiu...@redhat.com>
>> Sent: Monday, July 31, 2023 12:14 PM
>> To: Frank Schilder; ceph-users@ceph.io
>> Subject: Re: [ceph-users] Re: MDS stuck in rejoin
>>
>>
>> On 7/31/23 16:50, Frank Schilder wrote:
>>> Hi Xiubo,
>>>
>>> it's a kernel client. I actually made a mistake when trying to evict the
>>> client and my command didn't do anything. I did another evict and this time
>>> the client IP showed up in the blacklist. Furthermore, the warning
>>> disappeared. I asked for the dmesg logs from the client node.
>> Yeah, after the client's sessions are closed the corresponding warning
>> should be cleared.
>>
>> Thanks
>>
>>> Best regards,
>>> =================
>>> Frank Schilder
>>> AIT Risø Campus
>>> Bygning 109, rum S14
>>>
>>> ________________________________________
>>> From: Xiubo Li <xiu...@redhat.com>
>>> Sent: Monday, July 31, 2023 4:12 AM
>>> To: Frank Schilder; ceph-users@ceph.io
>>> Subject: Re: [ceph-users] Re: MDS stuck in rejoin
>>>
>>> Hi Frank,
>>>
>>> On 7/30/23 16:52, Frank Schilder wrote:
>>>> Hi Xiubo,
>>>>
>>>> it happened again. This time, we might be able to pull logs from the
>>>> client node. Please take a look at my intermediate action below - thanks!
>>>>
>>>> I am in a bit of a calamity, I'm on holidays with a terrible network
>>>> connection and can't do much. My first priority is securing the cluster to
>>>> avoid damage caused by this issue.
>>>> I did an MDS evict by client ID on the MDS reporting the warning with the
>>>> client ID reported in the warning. For some reason the client got blocked
>>>> on 2 MDSes after this command; one of these is an ordinary stand-by
>>>> daemon. Not sure if this is expected.
>>>>
>>>> Main question: is this sufficient to prevent any damaging IO on the
>>>> cluster? I'm thinking here about the MDS eating through all its RAM until
>>>> it crashes hard in an irrecoverable state (that was described as a
>>>> consequence in an old post about this warning). If this is a safe state, I
>>>> can keep it in this state until I return from holidays.
>>> Yeah, I think so.
>>>
>>> BTW, are you using the kclients or user space clients? I checked both the
>>> kclient and libcephfs, and it seems buggy in libcephfs, which could cause
>>> this issue. But for the kclient it looks okay so far.
>>>
>>> Thanks
>>>
>>> - Xiubo
>>>
>>>> Best regards,
>>>> =================
>>>> Frank Schilder
>>>> AIT Risø Campus
>>>> Bygning 109, rum S14
>>>>
>>>> ________________________________________
>>>> From: Xiubo Li <xiu...@redhat.com>
>>>> Sent: Friday, July 28, 2023 11:37 AM
>>>> To: Frank Schilder; ceph-users@ceph.io
>>>> Subject: Re: [ceph-users] Re: MDS stuck in rejoin
>>>>
>>>>
>>>> On 7/26/23 22:13, Frank Schilder wrote:
>>>>> Hi Xiubo.
>>>>>
>>>>>> ... I am more interested in the kclient side logs. Just want to
>>>>>> know why that oldest request got stuck so long.
>>>>> I'm afraid I'm a bad admin in this case. I don't have logs from the host
>>>>> any more, I would have needed the output of dmesg and this is gone. In
>>>>> case it happens again I will try to pull the info out.
>>>>>
>>>>> The tracker https://tracker.ceph.com/issues/22885 sounds a lot more
>>>>> violent than our situation. We had no problems with the MDSes, the cache
>>>>> didn't grow and the relevant one was also not put into read-only mode. It
>>>>> was just this warning showing all the time, health was OK otherwise. I
>>>>> think the warning was there for at least 16h before I failed the MDS.
>>>>>
>>>>> The MDS log contains nothing, this is the only line mentioning this
>>>>> client:
>>>>>
>>>>> 2023-07-20T00:22:05.518+0200 7fe13df59700 0 log_channel(cluster) log
>>>>> [WRN] : client.145678382 does not advance its oldest_client_tid
>>>>> (16121616), 100000 completed requests recorded in session
>>>> Okay, if so it's hard to say and dig out what happened in the client and
>>>> why it didn't advance the tid.
>>>>
>>>> Thanks
>>>>
>>>> - Xiubo
>>>>
>>>>
>>>>> Best regards,
>>>>> =================
>>>>> Frank Schilder
>>>>> AIT Risø Campus
>>>>> Bygning 109, rum S14
>>>>>
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io