Wojciech Turek wrote:
>
> On 7 Nov 2007, at 22:31, Nathan Rutman wrote:
>
>> Cliff White wrote:
>>> Wojciech Turek wrote:
>>>
>>>> Hi Cliff,
>>>>
>>>> On 7 Nov 2007, at 17:58, Cliff White wrote:
>>>>
>>>>> Wojciech Turek wrote:
>>>>>
>>>>>> Hi,
>>>>>> Our lustre environment is:
>>>>>> 2.6.9-55.0.9.EL_lustre.1.6.3smp
>>>>>> I would like to change recovery timeout from default value 250s
>>>>>> to something longer
>>>>>> I tried example from manual:
>>>>>> set_timeout <secs> Sets the timeout (obd_timeout) for a server
>>>>>> to wait before failing recovery.
>>>>>> We performed that experiment on our test lustre installation with
>>>>>> one OST.
>>>>>> storage02 is our OSS
>>>>>> [EMAIL PROTECTED] ~]# lctl dl
>>>>>> 0 UP mgc [EMAIL PROTECTED] 31259d9b-e655-cdc4-c760-45d3df426d86 5
>>>>>> 1 UP ost OSS OSS_uuid 3
>>>>>> 2 UP obdfilter home-md-OST0001 home-md-OST0001_UUID 7
>>>>>> [EMAIL PROTECTED] ~]# lctl --device 2 set_timeout 600
>>>>>> set_timeout has been deprecated. Use conf_param instead.
>>>>>> e.g. conf_param lustre-MDT0000 obd_timeout=50
>>>>>>
>> sorry about this bad help message. It's wrong.
>>>>>> usage: conf_param obd_timeout=<secs>
>>>>>> run <command> after connecting to device <devno>
>>>>>> --device <devno> <command [args ...]>
>>>>>> [EMAIL PROTECTED] ~]# lctl --device 1 conf_param obd_timeout=600
>>>>>> No device found for name MGS: Invalid argument
>>>>>> error: conf_param: No such device
>>>>>> It looks like I need to run this command from MGS node so I
>>>>>> moved then to MGS server called storage03
>>>>>> [EMAIL PROTECTED] ~]# lctl dl
>>>>>> 0 UP mgs MGS MGS 9
>>>>>> 1 UP mgc [EMAIL PROTECTED] f51a910b-a08e-4be6-5ada-b602a5ca9ab3 5
>>>>>> 2 UP mdt MDS MDS_uuid 3
>>>>>> 3 UP lov home-md-mdtlov home-md-mdtlov_UUID 4
>>>>>> 4 UP mds home-md-MDT0000 home-md-MDT0000_UUID 5
>>>>>> 5 UP osc home-md-OST0001-osc home-md-mdtlov_UUID 5
>>>>>> [EMAIL PROTECTED] ~]# lctl device 5
>>>>>> [EMAIL PROTECTED] ~]# lctl conf_param obd_timeout=600
>>>>>> error: conf_param: Function not implemented
>>>>>> [EMAIL PROTECTED] ~]# lctl --device 5 conf_param obd_timeout=600
>>>>>> error: conf_param: Function not implemented
>>>>>> [EMAIL PROTECTED] ~]# lctl help conf_param
>>>>>> conf_param: set a permanent config param. This command must be
>>>>>> run on the MGS node
>>>>>> usage: conf_param <target.keyword=val> ...
>>>>>> [EMAIL PROTECTED] ~]# lctl conf_param home-md-MDT0000.obd_timeout=600
>>>>>> error: conf_param: Invalid argument
>>>>>> [EMAIL PROTECTED] ~]#
>>>>>> I searched whole /proc/*/lustre for file that can store this
>>>>>> timeout value but nothing were found.
>>>>>> Could someone advise how to change value for recovery timeout?
>>>>>> Cheers,
>>>>>> Wojciech Turek
>>>>>>
>>>>> It looks like your file system is named 'home' - you can confirm with
>>>>> tunefs.lustre --print <MDS device> | grep "Lustre FS"
>>>>>
>>>>> The correct command (Run on the MGS) would be
>>>>> # lctl conf_param home.sys.timeout=<val>
>>>>>
>>>>> Example:
>>>>> [EMAIL PROTECTED] ~]# tunefs.lustre --print /dev/sdb |grep "Lustre FS"
>>>>> Lustre FS: lustre
>>>>> [EMAIL PROTECTED] ~]# cat /proc/sys/lustre/timeout
>>>>> 130
>>>>> [EMAIL PROTECTED] ~]# lctl conf_param lustre.sys.timeout=150
>>>>> [EMAIL PROTECTED] ~]# cat /proc/sys/lustre/timeout
>>>>> 150
>>>>>
>>>> Thanks for your email. I am afraid your tips aren't very helpful in
>>>> this case. As stated in the subject I am asking about recovery timeout.
>>>> You can find it for example in
>>>> /proc/fs/lustre/obdfilter/<OST>/recovery_status whilst one of your
>>>> OST's is in recovery state. By default this timeout is 250s.
>>>> Whereas you are talking about system obd timeout (according to CFS
>>>> documentation chapter 4.1.2 ) which is not a subject of my concern.
>>>>
>>>> Any way I tried your example just to see if it works and again I am
>>>> afraid it doesn't work for me, see below:
>>>> I have combined mgs and mds configuration.
>>>>
>>>> [EMAIL PROTECTED] ~]# df
>>>> Filesystem      1K-blocks       Used  Available Use% Mounted on
>>>> /dev/sda1        10317828    3452824    6340888  36% /
>>>> /dev/sda6         7605856      49788    7169708   1% /local
>>>> /dev/sda3         4127108      41000    3876460   2% /tmp
>>>> /dev/sda2         4127108     753668    3163792  20% /var
>>>> /dev/dm-2      1845747840  447502120 1398245720  25% /mnt/sdb
>>>> /dev/dm-1      6140723200 4632947344 1507775856  76% /mnt/sdc
>>>> /dev/dm-3       286696376    1461588  268850900   1% /mnt/home-md/mdt
>>>> [EMAIL PROTECTED] ~]# tunefs.lustre --print /dev/dm-3 |grep "Lustre FS"
>>>> Lustre FS: home-md
>>>> Lustre FS: home-md
>>>> [EMAIL PROTECTED] ~]# cat /proc/sys/lustre/timeout
>>>> 100
>>>> [EMAIL PROTECTED] ~]# lctl conf_param home-md.sys.timeout=150
>>>> error: conf_param: Invalid argument
>>>> [EMAIL PROTECTED] ~]#
>>>>
>> You need to do this on the MGS node, with the MGS running.
>>
>> mgs> lctl conf_param testfs.sys.timeout=150
>> anynode> cat /proc/sys/lustre/timeout
> This isn't working for me. In my production configuration I have MGS
> combined with MDT on the same server. My lustre configuration consists
> of two file systems.
> [EMAIL PROTECTED] ~]# tunefs.lustre --print /dev/dm-0
> checking for existing Lustre data: found CONFIGS/mountdata
> Reading CONFIGS/mountdata
>
> Read previous values:
> Target: ddn-home-MDT0000
> Index: 0
> Lustre FS: ddn-home
> Mount type: ldiskfs
> Flags: 0x5
> (MDT MGS )
> Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
> Parameters: [EMAIL PROTECTED] [EMAIL PROTECTED]
>
> Permanent disk data:
> Target: ddn-home-MDT0000
> Index: 0
> Lustre FS: ddn-home
> Mount type: ldiskfs
> Flags: 0x5
> (MDT MGS )
> Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
> Parameters: [EMAIL PROTECTED] [EMAIL PROTECTED]
>
> exiting before disk write.
> [EMAIL PROTECTED] ~]# tunefs.lustre --print /dev/dm-1
> checking for existing Lustre data: found CONFIGS/mountdata
> Reading CONFIGS/mountdata
>
> Read previous values:
> Target: ddn-data-MDT0000
> Index: 0
> Lustre FS: ddn-data
> Mount type: ldiskfs
> Flags: 0x1
> (MDT )
> Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
> Parameters: [EMAIL PROTECTED] [EMAIL PROTECTED]
>
> Permanent disk data:
> Target: ddn-data-MDT0000
> Index: 0
> Lustre FS: ddn-data
> Mount type: ldiskfs
> Flags: 0x1
> (MDT )
> Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
> Parameters: [EMAIL PROTECTED] [EMAIL PROTECTED]
>
> exiting before disk write.
> [EMAIL PROTECTED] ~]#
>
> As you can see above MGS is on /dev/dm-0 combined with MDT for
> ddn-home file system.
> If I try command line from your example I get this:
> [EMAIL PROTECTED] ~]# lctl conf_param ddn-home.sys.timeout=200
> error: conf_param: Invalid argument
>
> Server mds01 is 100% MGS node. What is wrong here then? The only two
> reasons for that problem I can think of is that file system name
> contain "-" character. However I didn't find anything in documentation
> that would say that this character is not allowed to be used. Another
> reason is that MGS is combined with MDS?
>
> syslog contains following messages:
>
> Nov 7 18:38:35 mds01 kernel: LustreError:
> 3273:0:(mgs_llog.c:1957:mgs_setparam()) No filesystem targets for ddn.
> cfg_device from lctl is 'ddn-home'
> Nov 7 18:38:35 mds01 kernel: LustreError:
> 3273:0:(mgs_handler.c:605:mgs_iocontrol()) setparam err -22
> Nov 7 18:39:46 mds01 kernel: LustreError:
> 3274:0:(mgs_llog.c:1957:mgs_setparam()) No filesystem targets for ddn.
> cfg_device from lctl is 'ddn-data'
> Nov 7 18:39:46 mds01 kernel: LustreError:
> 3274:0:(mgs_handler.c:605:mgs_iocontrol()) setparam err -22
> Nov 7 18:39:54 mds01 kernel: LustreError:
> 3275:0:(mgs_llog.c:1957:mgs_setparam()) No filesystem targets for ddn.
> cfg_device from lctl is 'ddn-data'
> Nov 7 18:39:54 mds01 kernel: LustreError:
> 3275:0:(mgs_handler.c:605:mgs_iocontrol()) setparam err -22
> Nov 7 18:40:01 mds01 kernel: LustreError:
> 3282:0:(mgs_llog.c:1957:mgs_setparam()) No filesystem targets for ddn.
> cfg_device from lctl is 'ddn-data'
> Nov 7 18:40:01 mds01 kernel: LustreError:
> 3282:0:(mgs_handler.c:605:mgs_iocontrol()) setparam err -22
> Nov 7 18:41:06 mds01 kernel: LustreError:
> 3305:0:(mgs_llog.c:1957:mgs_setparam()) No filesystem targets for ddn.
> cfg_device from lctl is 'ddn-data'
> Nov 7 18:41:06 mds01 kernel: LustreError:
> 3305:0:(mgs_handler.c:605:mgs_iocontrol()) setparam err -22
> Nov 7 18:41:15 mds01 kernel: LustreError:
> 3306:0:(mgs_llog.c:1957:mgs_setparam()) No filesystem targets for ddn.
> cfg_device from lctl is 'ddn-home'
> Nov 7 18:41:15 mds01 kernel: LustreError:
> 3306:0:(mgs_handler.c:605:mgs_iocontrol()) setparam err -22
>
> From above it looks like only first part of file system name is
> recognized "ddn" and -home or -data is omitted.
>
> Please advise.
>
> Wojciech Turek
You seem to have found a bug. I just tried this myself and it doesn't work with a "-" in the name. Maybe use a '.' instead until we fix it.

_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
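
For readers following the thread, the symptom in the syslog ("No filesystem targets for ddn" while cfg_device is 'ddn-home') can be reproduced with a small Python sketch. This is only an illustration of the failure pattern, not the code in mgs_llog.c: the function names split_conf_param and naive_fsname are hypothetical, and the rule "cut the device name at the first '-'" is an assumption inferred from the log output and from target names such as lustre-MDT0000.

```python
def split_conf_param(arg):
    """Split '<device>.<key>=<value>' into its three parts (hypothetical helper)."""
    target, sep, value = arg.partition("=")
    if not sep:
        raise ValueError("expected <target.keyword=val>")
    # Everything before the first '.' is treated as the device name.
    device, _, key = target.partition(".")
    return device, key, value


def naive_fsname(device):
    """Assumed behaviour: treat the device as '<fsname>-<target>' and cut at the first '-'."""
    return device.split("-", 1)[0]


# File system names taken from the tunefs.lustre output in the thread.
known_filesystems = {"ddn-home", "ddn-data"}

device, key, value = split_conf_param("ddn-home.sys.timeout=200")
fsname = naive_fsname(device)

# The device name survives intact, but the derived fsname loses '-home',
# so the lookup against the known file systems fails.
print(device, fsname, fsname in known_filesystems)  # ddn-home ddn False
```

Under that assumption, a name like lustre-MDT0000 correctly yields "lustre", while any file system name that itself contains "-" is truncated, which matches the "Invalid argument" (-22) result reported above.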
