The changelog_mask has a default value. If you do changelog_mask='MARK MTIME CTIME', you are setting the mask to this exact value, whereas changelog_mask='+SATTR' keeps all the default flags and adds SATTR. Hence the difference in output.

AFAIK, both commands should work, so it feels like a bug. It looks like some flag missing in the first case is causing the errors, whereas in your second case almost all flags are enabled. You can try to bisect them to find a smaller flag set that still works and report that on jira.whamcloud.com.

Aurélien

________________________________
From: Philippe Dos Santos <[email protected]>
Sent: Thursday, 21 November 2024 11:18
To: Aurelien Degremont <[email protected]>
Cc: [email protected] <[email protected]>; Philippe Weill <[email protected]>
Subject: Re: [lustre-discuss] Report Strange Problem on 2.15.5 with changelog_mask

Hello Aurelien,

I'm working with Philippe WEILL and I'm Philippe too ;o)

We first met the problem a few months ago, and it happened again yesterday after the maintenance window. In production we now have all servers and clients running Lustre 2.15.5.

We reproduced the problem with 3 RockyLinux 8.10 VMs running Lustre 2.15.5 (1x mds-mgs, 2x oss and 1x client).

We wonder if it is related to a misuse of the changelog mask (='MARK MTIME CTIME' vs ='+MTIME +CTIME')?

## Making the problem happen:

[root@test-mds-mgs ~]# lctl set_param -P mdd.lustre-MDT0000.changelog_mask='MARK MTIME CTIME'
[root@test-mds-mgs ~]# reboot
[root@test-mds-mgs ~]# mount -t lustre /dev/sdb /mnt/mgt/
[root@test-mds-mgs ~]# mount -t lustre /dev/sdc /mnt/mdt/
[root@test-mds-mgs ~]# lctl get_param mdd.lustre-MDT0000.changelog_mask
mdd.lustre-MDT0000.changelog_mask=MARK MTIME CTIME

[root@test-rbh-cl-215 lustre]# LANG=C touch aeffacer
touch: setting times of 'aeffacer': Input/output error

[root@test-mds-mgs ~]# LANG=C dmesg -T
...
[Thu Nov 21 10:54:24 2024] Lustre: Lustre: Build Version: 2.15.5
[Thu Nov 21 10:54:24 2024] LNet: Added LNI 172.20.240.172@tcp [8/256/0/180]
[Thu Nov 21 10:54:24 2024] LNet: Accept secure, port 988
[Thu Nov 21 10:54:24 2024] LDISKFS-fs (sdb): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
[Thu Nov 21 10:54:35 2024] LDISKFS-fs (sdc): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
[Thu Nov 21 10:54:35 2024] LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 172.20.240.171@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
[Thu Nov 21 10:54:35 2024] Lustre: lustre-MDT0000: Imperative Recovery not enabled, recovery window 300-900
[Thu Nov 21 10:54:35 2024] Lustre: lustre-MDD0000: changelog on
[Thu Nov 21 10:55:26 2024] Lustre: lustre-MDT0000: Will be in recovery for at least 5:00, or until 1 client reconnects
[Thu Nov 21 10:55:26 2024] Lustre: lustre-MDT0000: Recovery over after 0:01, of 1 clients 1 recovered and 0 were evicted.
[Thu Nov 21 10:55:26 2024] LustreError: 1907:0:(llog_cat.c:543:llog_cat_current_log()) lustre-MDD0000: next log does not exist!
...

## "Solving" the problem:

[root@test-mds-mgs ~]# lctl set_param -P mdd.lustre-MDT0000.changelog_mask='+SATTR'
[root@test-mds-mgs ~]# reboot
[root@test-mds-mgs ~]# mount -t lustre /dev/sdb /mnt/mgt/
[root@test-mds-mgs ~]# mount -t lustre /dev/sdc /mnt/mdt/
[root@test-mds-mgs ~]# lctl get_param mdd.lustre-MDT0000.changelog_mask
mdd.lustre-MDT0000.changelog_mask=
MARK CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RENME RNMTO CLOSE LYOUT TRUNC SATTR XATTR HSM MTIME CTIME MIGRT FLRW RESYNC

[root@test-rbh-cl-215 lustre]# touch aeffacer
[root@test-rbh-cl-215 lustre]# ll aeffacer
-rw-r--r-- 1 root root 0 21 nov. 11:03 aeffacer

[root@test-mds-mgs ~]# LANG=C dmesg -T
...
[Thu Nov 21 11:02:52 2024] Lustre: Lustre: Build Version: 2.15.5
[Thu Nov 21 11:02:52 2024] LNet: Added LNI 172.20.240.172@tcp [8/256/0/180]
[Thu Nov 21 11:02:52 2024] LNet: Accept secure, port 988
[Thu Nov 21 11:02:53 2024] LDISKFS-fs (sdb): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
[Thu Nov 21 11:02:57 2024] LDISKFS-fs (sdc): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
[Thu Nov 21 11:02:57 2024] Lustre: lustre-MDT0000: Imperative Recovery not enabled, recovery window 300-900
[Thu Nov 21 11:02:57 2024] Lustre: lustre-MDD0000: changelog on
[Thu Nov 21 11:03:27 2024] Lustre: lustre-MDT0000: Will be in recovery for at least 5:00, or until 1 client reconnects
[Thu Nov 21 11:03:27 2024] Lustre: lustre-MDT0000: Recovery over after 0:01, of 1 clients 1 recovered and 0 were evicted.

Philippe

----- Original message -----
From: "Philippe Weill" <[email protected]>
To: "Aurelien Degremont" <[email protected]>, [email protected]
Sent: Wednesday, 20 November 2024 17:44:16
Subject: Re: [lustre-discuss] Report Strange Problem on 2.15.5 with changelog_mask

On 20/11/2024 16:24, Aurelien Degremont wrote:
> Hello Philippe,
>
> I do not see why changing the changelog mask would cause an I/O error,
> especially as this seems transient.
> Did you happen to have any errors on your client hosts or MDS hosts at the
> time of your testing?
(see dmesg)

hello

no, we did not see any, and we have reproduced the problem with 3 VMs running Rocky 8.10 with a fresh 2.15.5 (1 mds, 1 oss, 1 client)

>
>
> Aurélien
> ------------------------------------------------------------------------
> *From:* lustre-discuss <[email protected]> on behalf of
> Philippe Weill <[email protected]>
> *Sent:* Wednesday, 20 November 2024 07:11
> *To:* [email protected] <[email protected]>
> *Subject:* [lustre-discuss] Report Strange Problem on 2.15.5 with
> changelog_mask
>
> Hello
>
> after running the following command on our Lustre MDS
>
> lctl set_param -P mdd.*-MDT0000.changelog_mask='MARK MTIME CTIME'
>
> and unmounting and remounting the MDT on the MDS,
>
> we got errors from touch, chmod and chgrp on existing files
>
> root@host:~# echo foobar > /scratch/root/foobar
> root@host:~# cat /scratch/root/foobar
> foobar
> root@host:~# echo foobar2 >> /scratch/root/foobar
> root@host:~# cat /scratch/root/foobar
> foobar
> foobar2
> root@host:~# touch /scratch/root/foobar
> touch: setting times of '/scratch/root/foobar': Input/output error
> root@host:~# chgrp group /scratch/root/foobar
> chgrp: changing group of '/scratch/root/foobar': Input/output error
> root@host:~# chmod 666 /scratch/root/foobar
> chmod: changing permissions of '/scratch/root/foobar': Input/output error
>
>
> After running the following command
>
> lctl set_param -P mdd.*-MDT0000.changelog_mask='-MARK -MTIME -CTIME'
>
>
> and only activating the mask non-permanently for our Robinhood
>
> lctl set_param mdd.*-MDT0000.changelog_mask='MARK MTIME CTIME'
>
>
> [root@mds ~]# lctl get_param mdd.scratch-MDT0000.changelog_mask
> mdd.scratch-MDT0000.changelog_mask=MARK MTIME CTIME
>
>
> everything started to work again
>
> Bug or misuse on our side?
--
Weill Philippe - System and Network Administrator
CNRS/UPMC/IPSL LATMOS (UMR 8190)
Tour 45/46 3e Etage B302|4 Place Jussieu|75252 Paris Cedex 05 - FRANCE
Email:[email protected] | tel:+33 0144274759
_______________________________________________ lustre-discuss mailing list [email protected] http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
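
The bisecting suggested in the first reply of this thread could be started along the following lines. This is only a dry-run sketch under stated assumptions, not a tested procedure: it prints candidate commands and runs nothing. The names BASE, WORKING and candidate_masks are hypothetical; the two flag lists and the target name mdd.lustre-MDT0000 are copied from the test setup shown in the thread. It covers only the first round, adding one flag at a time to the failing mask.

```shell
#!/bin/sh
# Dry run: prints candidate "lctl set_param -P" commands, executes none of them.

# Mask that led to the I/O errors in the thread.
BASE="MARK MTIME CTIME"

# Mask reported by get_param in the working case (after '+SATTR').
WORKING="MARK CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RENME RNMTO CLOSE LYOUT TRUNC SATTR XATTR HSM MTIME CTIME MIGRT FLRW RESYNC"

# candidate_masks BASE_MASK FULL_MASK
# Print, one per line, BASE_MASK extended by each flag of FULL_MASK
# that BASE_MASK does not already contain.
candidate_masks() {
    base=$1
    for flag in $2; do
        case " $base " in
            *" $flag "*) ;;                          # already in the base mask
            *) printf '%s %s\n' "$base" "$flag" ;;   # base plus one extra flag
        esac
    done
}

# One command per candidate mask, to be tried by hand one at a time.
candidate_masks "$BASE" "$WORKING" | while IFS= read -r mask; do
    printf "lctl set_param -P mdd.lustre-MDT0000.changelog_mask='%s'\n" "$mask"
done
```

Since the thread only shows the failure after a permanent (-P) setting followed by a remount of the MDT, each printed command would presumably need the same remount and touch test before moving on to the next candidate.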
