The changelog_mask has a default value. If you set

changelog_mask='MARK MTIME CTIME'

you set the mask to exactly this value, whereas

changelog_mask='+SATTR'

keeps all the default flags and adds SATTR. Hence the difference in
output.
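
To make the difference concrete, here is a toy model of the two forms in Python. It is only a sketch under stated assumptions: FULL_MASK is copied from the `lctl get_param` output further down this thread, `apply_mask` is a hypothetical helper and not Lustre's own parser, and the starting mask used in the example is invented for illustration.

```python
# Toy model of the absolute and relative changelog_mask forms.
# Assumption: FULL_MASK is the flag list printed by `lctl get_param` later in
# this thread after applying '+SATTR'; it is not taken from the Lustre source.
FULL_MASK = (
    "MARK CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RENME RNMTO CLOSE "
    "LYOUT TRUNC SATTR XATTR HSM MTIME CTIME MIGRT FLRW RESYNC"
).split()


def apply_mask(current, spec):
    """Return the flag set obtained by applying spec to current.

    Tokens prefixed with '+' or '-' edit the current set; a spec made only of
    bare names replaces the set entirely. Mixing both styles is rejected
    (a rule chosen for this sketch only).
    """
    tokens = spec.split()
    relative = [t[0] in "+-" for t in tokens]
    if any(relative) and not all(relative):
        raise ValueError("mixing absolute and relative flags is not modelled")
    if not any(relative):
        # Absolute form: the mask becomes exactly the listed flags.
        return set(tokens)
    result = set(current)
    for token in tokens:
        if token[0] == "+":
            result.add(token[1:])
        else:
            result.discard(token[1:])
    return result


# Hypothetical starting mask: everything except SATTR.
start = set(FULL_MASK) - {"SATTR"}
absolute = apply_mask(start, "MARK MTIME CTIME")  # only these three flags remain
relative = apply_mask(start, "+SATTR")            # starting flags plus SATTR
```

In this model the absolute form ends up with three flags and the relative form with all of them, which matches the two `get_param` outputs reported below.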

AFAIK, both commands should work, so this looks like a bug. It seems that a
flag missing in the first case triggers the problem, whereas in your second
case almost all flags are enabled. You can try to bisect them to find a
smaller flag set that still works and report it at jira.whamcloud.com.
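
One way to organise that search is a greedy reduction: start from the mask that works and drop one flag at a time, keeping each removal only if the test still passes. The sketch below is an illustration, not a tested procedure. `minimize_flags` and the `still_works` callback are hypothetical names; the callback is assumed to apply the mask on a test MDT, remount, run `touch` from a client and return True on success.

```python
def minimize_flags(working, required, still_works):
    """Greedily shrink a flag set that is known to work.

    working:     flags known to work (e.g. the near-complete mask)
    required:    flags that must stay enabled (e.g. MARK, MTIME, CTIME)
    still_works: hypothetical callback taking a list of flags and returning
                 True if the filesystem behaves correctly with that mask
    """
    keep = list(dict.fromkeys(working))  # preserve order, drop duplicates
    for flag in list(keep):
        if flag in required:
            continue
        candidate = [f for f in keep if f != flag]
        # Keep the removal only if the reduced mask still passes the test.
        if still_works(candidate):
            keep = candidate
    return keep
```

Each call to `still_works` costs one remount in practice, so this runs at most one test per optional flag. The flags left over beyond the required ones are the ones worth mentioning in the ticket.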

Aurélien

________________________________
From: Philippe Dos Santos <[email protected]>
Sent: Thursday, November 21, 2024 11:18
To: Aurelien Degremont <[email protected]>
Cc: [email protected] <[email protected]>; Philippe Weill <[email protected]>
Subject: Re: [lustre-discuss] Report Strange Problem on 2.15.5 with changelog_mask


Hello Aurelien,

I'm working with Philippe WEILL and I'm Philippe too ;o)

We first met the problem a few months ago.
And it happened again yesterday after the maintenance window.
On production we now have all servers and clients running Lustre 2.15.5.

We reproduced the problem with 3 RockyLinux 8.10 VMs running Lustre 2.15.5 (1x 
mds-mgs, 2x oss and 1x client).
We wonder whether it is related to a misuse of the changelog mask
(='MARK MTIME CTIME' vs ='+MTIME +CTIME')?

## Reproducing the problem:

[root@test-mds-mgs ~]# lctl set_param -P mdd.lustre-MDT0000.changelog_mask='MARK MTIME CTIME'
[root@test-mds-mgs ~]# reboot
[root@test-mds-mgs ~]# mount -t lustre /dev/sdb /mnt/mgt/
[root@test-mds-mgs ~]# mount -t lustre /dev/sdc /mnt/mdt/
[root@test-mds-mgs ~]# lctl get_param mdd.lustre-MDT0000.changelog_mask
mdd.lustre-MDT0000.changelog_mask=MARK MTIME CTIME

[root@test-rbh-cl-215 lustre]# LANG=C touch aeffacer
touch: setting times of 'aeffacer': Input/output error

[root@test-mds-mgs ~]# LANG=C dmesg -T
...
[Thu Nov 21 10:54:24 2024] Lustre: Lustre: Build Version: 2.15.5
[Thu Nov 21 10:54:24 2024] LNet: Added LNI 172.20.240.172@tcp [8/256/0/180]
[Thu Nov 21 10:54:24 2024] LNet: Accept secure, port 988
[Thu Nov 21 10:54:24 2024] LDISKFS-fs (sdb): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
[Thu Nov 21 10:54:35 2024] LDISKFS-fs (sdc): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
[Thu Nov 21 10:54:35 2024] LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 172.20.240.171@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
[Thu Nov 21 10:54:35 2024] Lustre: lustre-MDT0000: Imperative Recovery not enabled, recovery window 300-900
[Thu Nov 21 10:54:35 2024] Lustre: lustre-MDD0000: changelog on
[Thu Nov 21 10:55:26 2024] Lustre: lustre-MDT0000: Will be in recovery for at least 5:00, or until 1 client reconnects
[Thu Nov 21 10:55:26 2024] Lustre: lustre-MDT0000: Recovery over after 0:01, of 1 clients 1 recovered and 0 were evicted.
[Thu Nov 21 10:55:26 2024] LustreError: 1907:0:(llog_cat.c:543:llog_cat_current_log()) lustre-MDD0000: next log does not exist!
...

## "Solving" the problem:

[root@test-mds-mgs ~]# lctl set_param -P mdd.lustre-MDT0000.changelog_mask='+SATTR'
[root@test-mds-mgs ~]# reboot
[root@test-mds-mgs ~]# mount -t lustre /dev/sdb /mnt/mgt/
[root@test-mds-mgs ~]# mount -t lustre /dev/sdc /mnt/mdt/
[root@test-mds-mgs ~]# lctl get_param mdd.lustre-MDT0000.changelog_mask
mdd.lustre-MDT0000.changelog_mask=
MARK CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RENME RNMTO CLOSE LYOUT TRUNC 
SATTR XATTR HSM MTIME CTIME MIGRT FLRW RESYNC

[root@test-rbh-cl-215 lustre]# touch aeffacer
[root@test-rbh-cl-215 lustre]# ll aeffacer
-rw-r--r-- 1 root root 0 21 nov.  11:03 aeffacer

[root@test-mds-mgs ~]# LANG=C dmesg -T
...
[Thu Nov 21 11:02:52 2024] Lustre: Lustre: Build Version: 2.15.5
[Thu Nov 21 11:02:52 2024] LNet: Added LNI 172.20.240.172@tcp [8/256/0/180]
[Thu Nov 21 11:02:52 2024] LNet: Accept secure, port 988
[Thu Nov 21 11:02:53 2024] LDISKFS-fs (sdb): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
[Thu Nov 21 11:02:57 2024] LDISKFS-fs (sdc): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
[Thu Nov 21 11:02:57 2024] Lustre: lustre-MDT0000: Imperative Recovery not enabled, recovery window 300-900
[Thu Nov 21 11:02:57 2024] Lustre: lustre-MDD0000: changelog on
[Thu Nov 21 11:03:27 2024] Lustre: lustre-MDT0000: Will be in recovery for at least 5:00, or until 1 client reconnects
[Thu Nov 21 11:03:27 2024] Lustre: lustre-MDT0000: Recovery over after 0:01, of 1 clients 1 recovered and 0 were evicted.

Philippe


----- Original message -----
From: "Philippe Weill" <[email protected]>
To: "Aurelien Degremont" <[email protected]>, [email protected]
Sent: Wednesday, November 20, 2024 17:44:16
Subject: Re: [lustre-discuss] Report Strange Problem on 2.15.5 with changelog_mask

On 20/11/2024 16:24, Aurelien Degremont wrote:
> Hello Philippe,
>
> I do not see why changing the changelog mask would cause I/O error, 
> especially as this seems transient.
> Did you happen to have any errors on your client hosts or MDS hosts as the 
> time of your testing ? (see dmesg)


Hello,

No, we did not see any, and we have reproduced the problem with 3 Rocky 8.10
VMs running a fresh 2.15.5 (1 MDS, 1 OSS, 1 client).


>
>
> Aurélien
> ------------------------------------------------------------------------
> From: lustre-discuss <[email protected]> on behalf of Philippe Weill <[email protected]>
> Sent: Wednesday, November 20, 2024 07:11
> To: [email protected] <[email protected]>
> Subject: [lustre-discuss] Report Strange Problem on 2.15.5 with changelog_mask
>
>
> Hello
>
> After running the following command on our Lustre MDS
>
> lctl set_param -P mdd.*-MDT0000.changelog_mask='MARK MTIME CTIME'
>
> and unmounting and remounting the MDT on the MDS,
>
> we got errors from touch, chmod and chgrp on existing files:
>
> root@host:~# echo foobar > /scratch/root/foobar
> root@host:~# cat /scratch/root/foobar
> foobar
> root@host:~# echo foobar2 >>  /scratch/root/foobar
> root@host:~# cat /scratch/root/foobar
> foobar
> foobar2
> root@host:~# touch /scratch/root/foobar
> touch: setting times of '/scratch/root/foobar': Input/output error
> root@host:~# chgrp group /scratch/root/foobar
> chgrp: changing group of '/scratch/root/foobar': Input/output error
> root@host:~# chmod 666 /scratch/root/foobar
> chmod: changing permissions of '/scratch/root/foobar': Input/output error
>
>
> After running the following command
>
> lctl set_param -P mdd.*-MDT0000.changelog_mask='-MARK -MTIME -CTIME'
>
>
> and enabling the flags only non-persistently for our Robinhood
>
> lctl set_param  mdd.*-MDT0000.changelog_mask='MARK MTIME CTIME'
>
>
> [root@mds ~]#  lctl get_param  mdd.scratch-MDT0000.changelog_mask
> mdd.scratch-MDT0000.changelog_mask=MARK MTIME CTIME
>
>
> everything started to work again.
>
> Is this a bug, or misuse on our part?
> _______________________________________________
> lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

--
Weill Philippe -  Administrateur Systeme et Reseaux
CNRS/UPMC/IPSL   LATMOS (UMR 8190)
Tour 45/46 3e Etage B302|4 Place Jussieu|75252 Paris Cedex 05 -  FRANCE
Email:[email protected] | tel:+33 0144274759
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
