Cliff White wrote:
> Wojciech Turek wrote:
>
>> Hi Cliff,
>>
>> On 7 Nov 2007, at 17:58, Cliff White wrote:
>>
>>> Wojciech Turek wrote:
>>>
>>>> Hi,
>>>> Our Lustre environment is:
>>>> 2.6.9-55.0.9.EL_lustre.1.6.3smp
>>>> I would like to change the recovery timeout from the default value of
>>>> 250s to something longer.
>>>> I tried the example from the manual:
>>>> set_timeout <secs> Sets the timeout (obd_timeout) for a server
>>>> to wait before failing recovery.
>>>> We performed that experiment on our test Lustre installation with one
>>>> OST. storage02 is our OSS:
>>>> [EMAIL PROTECTED] ~]# lctl dl
>>>> 0 UP mgc [EMAIL PROTECTED] 31259d9b-e655-cdc4-c760-45d3df426d86 5
>>>> 1 UP ost OSS OSS_uuid 3
>>>> 2 UP obdfilter home-md-OST0001 home-md-OST0001_UUID 7
>>>> [EMAIL PROTECTED] ~]# lctl --device 2 set_timeout 600
>>>> set_timeout has been deprecated. Use conf_param instead.
>>>> e.g. conf_param lustre-MDT0000 obd_timeout=50
>>>> sorry about this bad help message. It's wrong.
>>>> usage: conf_param obd_timeout=<secs>
>>>> run <command> after connecting to device <devno>
>>>> --device <devno> <command [args ...]>
>>>> [EMAIL PROTECTED] ~]# lctl --device 1 conf_param obd_timeout=600
>>>> No device found for name MGS: Invalid argument
>>>> error: conf_param: No such device
>>>> It looks like I need to run this command from the MGS node, so I moved
>>>> to the MGS server, called storage03:
>>>> [EMAIL PROTECTED] ~]# lctl dl
>>>> 0 UP mgs MGS MGS 9
>>>> 1 UP mgc [EMAIL PROTECTED] f51a910b-a08e-4be6-5ada-b602a5ca9ab3 5
>>>> 2 UP mdt MDS MDS_uuid 3
>>>> 3 UP lov home-md-mdtlov home-md-mdtlov_UUID 4
>>>> 4 UP mds home-md-MDT0000 home-md-MDT0000_UUID 5
>>>> 5 UP osc home-md-OST0001-osc home-md-mdtlov_UUID 5
>>>> [EMAIL PROTECTED] ~]# lctl device 5
>>>> [EMAIL PROTECTED] ~]# lctl conf_param obd_timeout=600
>>>> error: conf_param: Function not implemented
>>>> [EMAIL PROTECTED] ~]# lctl --device 5 conf_param obd_timeout=600
>>>> error: conf_param: Function not implemented
>>>> [EMAIL PROTECTED] ~]# lctl help conf_param
>>>> conf_param: set a permanent config param. This command must be run on
>>>> the MGS node
>>>> usage: conf_param <target.keyword=val> ...
>>>> [EMAIL PROTECTED] ~]# lctl conf_param home-md-MDT0000.obd_timeout=600
>>>> error: conf_param: Invalid argument
>>>> [EMAIL PROTECTED] ~]#
>>>> I searched the whole of /proc/*/lustre for a file that stores this
>>>> timeout value, but nothing was found.
>>>> Could someone advise how to change the value of the recovery timeout?
>>>> Cheers,
>>>> Wojciech Turek
>>>>
>>> It looks like your file system is named 'home'; you can confirm with
>>> tunefs.lustre --print <MDS device> | grep "Lustre FS"
>>>
>>> The correct command (run on the MGS) would be
>>> # lctl conf_param home.sys.timeout=<val>
>>>
>>> Example:
>>> [EMAIL PROTECTED] ~]# tunefs.lustre --print /dev/sdb |grep "Lustre FS"
>>> Lustre FS: lustre
>>> [EMAIL PROTECTED] ~]# cat /proc/sys/lustre/timeout
>>> 130
>>> [EMAIL PROTECTED] ~]# lctl conf_param lustre.sys.timeout=150
>>> [EMAIL PROTECTED] ~]# cat /proc/sys/lustre/timeout
>>> 150
>>>
>> Thanks for your email. I am afraid your tips aren't very helpful in this
>> case. As stated in the subject, I am asking about the recovery timeout.
>> You can find it, for example, in
>> /proc/fs/lustre/obdfilter/<OST>/recovery_status whilst one of your OSTs
>> is in recovery state. By default this timeout is 250s.
>> You, however, are talking about the system obd timeout (according to CFS
>> documentation chapter 4.1.2), which is not the subject of my concern.
>>
>> Anyway, I tried your example just to see if it works, and again I am
>> afraid it doesn't work for me; see below.
>> I have a combined MGS and MDS configuration.
>>
>> [EMAIL PROTECTED] ~]# df
>> Filesystem  1K-blocks       Used  Available Use% Mounted on
>> /dev/sda1    10317828    3452824    6340888  36% /
>> /dev/sda6     7605856      49788    7169708   1% /local
>> /dev/sda3     4127108      41000    3876460   2% /tmp
>> /dev/sda2     4127108     753668    3163792  20% /var
>> /dev/dm-2  1845747840  447502120 1398245720  25% /mnt/sdb
>> /dev/dm-1  6140723200 4632947344 1507775856  76% /mnt/sdc
>> /dev/dm-3   286696376    1461588  268850900   1% /mnt/home-md/mdt
>> [EMAIL PROTECTED] ~]# tunefs.lustre --print /dev/dm-3 |grep "Lustre FS"
>> Lustre FS: home-md
>> Lustre FS: home-md
>> [EMAIL PROTECTED] ~]# cat /proc/sys/lustre/timeout
>> 100
>> [EMAIL PROTECTED] ~]# lctl conf_param home-md.sys.timeout=150
>> error: conf_param: Invalid argument
>> [EMAIL PROTECTED] ~]#
>>

You need to do this on the MGS node, with the MGS running.
mgs> lctl conf_param testfs.sys.timeout=150
anynode> cat /proc/sys/lustre/timeout

> Hmm, not sure why that isn't working for you; I tested the example I
> gave. Sorry about the misread. The obd recovery timeout is defined in
> relation to obd_timeout and, afaik, is not changeable at runtime:
>
> lustre/include/lustre_lib.h
> #define OBD_RECOVERY_TIMEOUT (obd_timeout * 5 / 2)
> ...which gives the default 250 seconds for the default obd_timeout
> (100 seconds).
>
> cliffw
>

That's correct. These are tied together before Lustre 1.6.4.

>> Cheers,
>>
>> Wojciech Turek
>>
>>> cliffw
>>>
>>>> ------------------------------------------------------------------------
>>>> _______________________________________________
>>>> Lustre-discuss mailing list
>>>> [email protected]
>>>> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
>>>>
>> Mr Wojciech Turek
>> Assistant System Manager
>> University of Cambridge
>> High Performance Computing service
>> email: [EMAIL PROTECTED]
>> tel. +441223763517
