Nathan Rutman wrote:
> Wojciech Turek wrote:
>
>> On 7 Nov 2007, at 22:31, Nathan Rutman wrote:
>>
>>> Cliff White wrote:
>>>
>>>> Wojciech Turek wrote:
>>>>
>>>>> Hi Cliff,
>>>>>
>>>>> On 7 Nov 2007, at 17:58, Cliff White wrote:
>>>>>
>>>>>> Wojciech Turek wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> Our Lustre environment is:
>>>>>>> 2.6.9-55.0.9.EL_lustre.1.6.3smp
>>>>>>> I would like to change the recovery timeout from its default value
>>>>>>> of 250s to something longer.
>>>>>>> I tried the example from the manual:
>>>>>>> set_timeout <secs> Sets the timeout (obd_timeout) for a server
>>>>>>> to wait before failing recovery.
>>>>>>> We performed this experiment on our test Lustre installation, which
>>>>>>> has one OST.
>>>>>>> storage02 is our OSS:
>>>>>>> [EMAIL PROTECTED] ~]# lctl dl
>>>>>>> 0 UP mgc [EMAIL PROTECTED] 31259d9b-e655-cdc4-c760-45d3df426d86 5
>>>>>>> 1 UP ost OSS OSS_uuid 3
>>>>>>> 2 UP obdfilter home-md-OST0001 home-md-OST0001_UUID 7
>>>>>>> [EMAIL PROTECTED] ~]# lctl --device 2 set_timeout 600
>>>>>>> set_timeout has been deprecated. Use conf_param instead.
>>>>>>> e.g. conf_param lustre-MDT0000 obd_timeout=50
>>>>>>>
>>> Sorry about this bad help message. It's wrong.
>>>
>>>>>>> usage: conf_param obd_timeout=<secs>
>>>>>>> run <command> after connecting to device <devno>
>>>>>>> --device <devno> <command [args ...]>
>>>>>>> [EMAIL PROTECTED] ~]# lctl --device 1 conf_param obd_timeout=600
>>>>>>> No device found for name MGS: Invalid argument
>>>>>>> error: conf_param: No such device
>>>>>>> It looks like I need to run this command from the MGS node, so I
>>>>>>> then moved to the MGS server, called storage03:
>>>>>>> [EMAIL PROTECTED] ~]# lctl dl
>>>>>>> 0 UP mgs MGS MGS 9
>>>>>>> 1 UP mgc [EMAIL PROTECTED] f51a910b-a08e-4be6-5ada-b602a5ca9ab3 5
>>>>>>> 2 UP mdt MDS MDS_uuid 3
>>>>>>> 3 UP lov home-md-mdtlov home-md-mdtlov_UUID 4
>>>>>>> 4 UP mds home-md-MDT0000 home-md-MDT0000_UUID 5
>>>>>>> 5 UP osc home-md-OST0001-osc home-md-mdtlov_UUID 5
>>>>>>> [EMAIL PROTECTED] ~]# lctl device 5
>>>>>>> [EMAIL PROTECTED] ~]# lctl conf_param obd_timeout=600
>>>>>>> error: conf_param: Function not implemented
>>>>>>> [EMAIL PROTECTED] ~]# lctl --device 5 conf_param obd_timeout=600
>>>>>>> error: conf_param: Function not implemented
>>>>>>> [EMAIL PROTECTED] ~]# lctl help conf_param
>>>>>>> conf_param: set a permanent config param. This command must be
>>>>>>> run on the MGS node
>>>>>>> usage: conf_param <target.keyword=val> ...
>>>>>>> [EMAIL PROTECTED] ~]# lctl conf_param home-md-MDT0000.obd_timeout=600
>>>>>>> error: conf_param: Invalid argument
>>>>>>> [EMAIL PROTECTED] ~]#
>>>>>>> I searched all of /proc/*/lustre for a file that stores this
>>>>>>> timeout value, but found nothing.
>>>>>>> Could someone advise how to change the recovery timeout?
>>>>>>> Cheers,
>>>>>>> Wojciech Turek
>>>>>>>
>>>>>> It looks like your file system is named 'home' - you can confirm with
>>>>>> tunefs.lustre --print <MDS device> | grep "Lustre FS"
>>>>>>
>>>>>> The correct command (run on the MGS) would be
>>>>>> # lctl conf_param home.sys.timeout=<val>
>>>>>>
>>>>>> Example:
>>>>>> [EMAIL PROTECTED] ~]# tunefs.lustre --print /dev/sdb |grep "Lustre FS"
>>>>>> Lustre FS: lustre
>>>>>> [EMAIL PROTECTED] ~]# cat /proc/sys/lustre/timeout
>>>>>> 130
>>>>>> [EMAIL PROTECTED] ~]# lctl conf_param lustre.sys.timeout=150
>>>>>> [EMAIL PROTECTED] ~]# cat /proc/sys/lustre/timeout
>>>>>> 150
>>>>>>
>>>>> Thanks for your email. I am afraid your tips aren't very helpful in
>>>>> this case. As stated in the subject, I am asking about the recovery
>>>>> timeout. You can find it, for example, in
>>>>> /proc/fs/lustre/obdfilter/<OST>/recovery_status while one of your
>>>>> OSTs is in the recovery state. By default this timeout is 250s,
>>>>> whereas you are talking about the system obd timeout (according to
>>>>> CFS documentation chapter 4.1.2), which is not what I am asking about.
>>>>>
>>>>> Anyway, I tried your example just to see if it works, and again I am
>>>>> afraid it doesn't work for me; see below.
>>>>> I have a combined MGS and MDS configuration.
>>>>>
>>>>> [EMAIL PROTECTED] ~]# df
>>>>> Filesystem  1K-blocks       Used  Available Use% Mounted on
>>>>> /dev/sda1    10317828    3452824    6340888  36% /
>>>>> /dev/sda6     7605856      49788    7169708   1% /local
>>>>> /dev/sda3     4127108      41000    3876460   2% /tmp
>>>>> /dev/sda2     4127108     753668    3163792  20% /var
>>>>> /dev/dm-2  1845747840  447502120 1398245720  25% /mnt/sdb
>>>>> /dev/dm-1  6140723200 4632947344 1507775856  76% /mnt/sdc
>>>>> /dev/dm-3   286696376    1461588  268850900   1% /mnt/home-md/mdt
>>>>> [EMAIL PROTECTED] ~]# tunefs.lustre --print /dev/dm-3 |grep "Lustre FS"
>>>>> Lustre FS: home-md
>>>>> Lustre FS: home-md
>>>>> [EMAIL PROTECTED] ~]# cat /proc/sys/lustre/timeout
>>>>> 100
>>>>> [EMAIL PROTECTED] ~]# lctl conf_param home-md.sys.timeout=150
>>>>> error: conf_param: Invalid argument
>>>>> [EMAIL PROTECTED] ~]#
>>>>>
>>> You need to do this on the MGS node, with the MGS running.
>>>
>>> mgs> lctl conf_param testfs.sys.timeout=150
>>> anynode> cat /proc/sys/lustre/timeout
>>>
>> This isn't working for me. In my production configuration I have the MGS
>> combined with the MDT on the same server. My Lustre configuration
>> consists of two file systems.
>> [EMAIL PROTECTED] ~]# tunefs.lustre --print /dev/dm-0
>> checking for existing Lustre data: found CONFIGS/mountdata
>> Reading CONFIGS/mountdata
>>
>> Read previous values:
>> Target: ddn-home-MDT0000
>> Index: 0
>> Lustre FS: ddn-home
>> Mount type: ldiskfs
>> Flags: 0x5
>> (MDT MGS )
>> Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
>> Parameters: [EMAIL PROTECTED] [EMAIL PROTECTED]
>>
>> Permanent disk data:
>> Target: ddn-home-MDT0000
>> Index: 0
>> Lustre FS: ddn-home
>> Mount type: ldiskfs
>> Flags: 0x5
>> (MDT MGS )
>> Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
>> Parameters: [EMAIL PROTECTED] [EMAIL PROTECTED]
>>
>> exiting before disk write.
>> [EMAIL PROTECTED] ~]# tunefs.lustre --print /dev/dm-1
>> checking for existing Lustre data: found CONFIGS/mountdata
>> Reading CONFIGS/mountdata
>>
>> Read previous values:
>> Target: ddn-data-MDT0000
>> Index: 0
>> Lustre FS: ddn-data
>> Mount type: ldiskfs
>> Flags: 0x1
>> (MDT )
>> Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
>> Parameters: [EMAIL PROTECTED] [EMAIL PROTECTED]
>>
>> Permanent disk data:
>> Target: ddn-data-MDT0000
>> Index: 0
>> Lustre FS: ddn-data
>> Mount type: ldiskfs
>> Flags: 0x1
>> (MDT )
>> Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
>> Parameters: [EMAIL PROTECTED] [EMAIL PROTECTED]
>>
>> exiting before disk write.
>> [EMAIL PROTECTED] ~]#
>>
>> As you can see above, the MGS is on /dev/dm-0, combined with the MDT for
>> the ddn-home file system.
>> If I try the command line from your example, I get this:
>> [EMAIL PROTECTED] ~]# lctl conf_param ddn-home.sys.timeout=200
>> error: conf_param: Invalid argument
>>
>> Server mds01 is 100% the MGS node, so what is wrong here? I can think
>> of only two possible reasons for this problem. The first is that the file
>> system name contains a "-" character, although I didn't find anything in
>> the documentation saying that this character is not allowed. Or could it
>> be because the MGS is combined with the MDS?
>>
>> syslog contains the following messages:
>>
>> Nov 7 18:38:35 mds01 kernel: LustreError:
>> 3273:0:(mgs_llog.c:1957:mgs_setparam()) No filesystem targets for ddn.
>> cfg_device from lctl is 'ddn-home'
>> Nov 7 18:38:35 mds01 kernel: LustreError:
>> 3273:0:(mgs_handler.c:605:mgs_iocontrol()) setparam err -22
>> Nov 7 18:39:46 mds01 kernel: LustreError:
>> 3274:0:(mgs_llog.c:1957:mgs_setparam()) No filesystem targets for ddn.
>> cfg_device from lctl is 'ddn-data'
>> Nov 7 18:39:46 mds01 kernel: LustreError:
>> 3274:0:(mgs_handler.c:605:mgs_iocontrol()) setparam err -22
>> Nov 7 18:39:54 mds01 kernel: LustreError:
>> 3275:0:(mgs_llog.c:1957:mgs_setparam()) No filesystem targets for ddn.
>> cfg_device from lctl is 'ddn-data'
>> Nov 7 18:39:54 mds01 kernel: LustreError:
>> 3275:0:(mgs_handler.c:605:mgs_iocontrol()) setparam err -22
>> Nov 7 18:40:01 mds01 kernel: LustreError:
>> 3282:0:(mgs_llog.c:1957:mgs_setparam()) No filesystem targets for ddn.
>> cfg_device from lctl is 'ddn-data'
>> Nov 7 18:40:01 mds01 kernel: LustreError:
>> 3282:0:(mgs_handler.c:605:mgs_iocontrol()) setparam err -22
>> Nov 7 18:41:06 mds01 kernel: LustreError:
>> 3305:0:(mgs_llog.c:1957:mgs_setparam()) No filesystem targets for ddn.
>> cfg_device from lctl is 'ddn-data'
>> Nov 7 18:41:06 mds01 kernel: LustreError:
>> 3305:0:(mgs_handler.c:605:mgs_iocontrol()) setparam err -22
>> Nov 7 18:41:15 mds01 kernel: LustreError:
>> 3306:0:(mgs_llog.c:1957:mgs_setparam()) No filesystem targets for ddn.
>> cfg_device from lctl is 'ddn-home'
>> Nov 7 18:41:15 mds01 kernel: LustreError:
>> 3306:0:(mgs_handler.c:605:mgs_iocontrol()) setparam err -22
>>
>> From the above, it looks like only the first part of the file system
>> name, "ddn", is recognized, and "-home" or "-data" is omitted.
>>
>> Please advise.
>>
>> Wojciech Turek
>>
>
> You seem to have found a bug. I just tried this myself, and it doesn't
> work with a "-" in the name. Maybe use a '.' instead until we fix it.
>
Argh, sorry, that doesn't work with conf_param either. But an underscore
'_' does. I'm filing a bug report...
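Until the bug is fixed, checking the file system name before running lctl conf_param can catch affected names early. Below is a minimal POSIX shell sketch. The helpers mgs_parsed_fsname and check_fsname are hypothetical and are not part of lctl or Lustre. The truncation at the first '-' is inferred only from the syslog messages in this thread ('ddn-home' reported as "ddn"). The rule that '-' and '.' fail while '_' works comes from the test reported in this thread, not from the Lustre source.

```shell
#!/bin/sh
# Hypothetical helpers (not part of lctl/Lustre) for the fsname bug
# discussed in this thread.

# Mimic the truncation seen in the MGS syslog: for "ddn-home.sys.timeout=200"
# the MGS reported "No filesystem targets for ddn". This behaviour is
# inferred from the log messages, not taken from the Lustre source.
mgs_parsed_fsname() (
    device=${1%%.*}                 # "ddn-home.sys.timeout=200" -> "ddn-home"
    printf '%s\n' "${device%%-*}"   # "ddn-home" -> "ddn"
)

# Return 0 if the fsname avoids the characters reported to break
# conf_param ('-' and '.'), 1 otherwise. An underscore is reported to work.
check_fsname() {
    case $1 in
        '')
            echo "check_fsname: empty fsname" >&2
            return 1 ;;
        *-* | *.*)
            echo "check_fsname: '$1' contains '-' or '.'; conf_param is likely to fail" >&2
            return 1 ;;
        *)
            return 0 ;;
    esac
}
```

For example, check_fsname ddn_home succeeds, while check_fsname ddn-home prints a warning and fails. mgs_parsed_fsname ddn-home.sys.timeout=200 prints ddn, which matches the log. The check only helps when choosing a name. It does not rename an existing file system.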
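This part concerns the original recovery-timeout question. The figures in this thread are consistent with the recovery window being derived from obd_timeout rather than being a separate setting: the 250s default is exactly 5/2 of the 100s shown in /proc/sys/lustre/timeout on the combined MGS/MDS. The sketch below simply assumes that 5/2 factor. This is an inference from those two numbers and is not confirmed anywhere in this thread, so check recovery_status on a test OST after changing sys.timeout before relying on it. The function names are hypothetical.

```shell
#!/bin/sh
# ASSUMPTION: recovery window = obd_timeout * 5 / 2 (integer seconds).
# Inferred from the 250s default vs. the 100s timeout seen in this thread;
# not confirmed by the thread itself.

# Recovery window (seconds) implied by a given obd_timeout.
recovery_window() {
    echo $(( $1 * 5 / 2 ))
}

# Smallest obd_timeout whose implied window is at least the wanted window,
# i.e. ceil(window * 2 / 5) computed with integer arithmetic.
timeout_for_window() {
    echo $(( ($1 * 2 + 4) / 5 ))
}
```

Under this assumption, recovery_window 100 prints 250. A 600s recovery window would need timeout_for_window 600, which is 240. That value would then be set on the MGS with lctl conf_param <fsname>.sys.timeout=240, which is still subject to the fsname problem described above.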
_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
