Hi Cliff,
On 7 Nov 2007, at 17:58, Cliff White wrote:
Wojciech Turek wrote:
Hi,
Our Lustre environment is:
2.6.9-55.0.9.EL_lustre.1.6.3smp
I would like to change the recovery timeout from its default value of 250s to something longer.
I tried the example from the manual:
set_timeout <secs> Sets the timeout (obd_timeout) for a server
to wait before failing recovery.
We performed this experiment on our test Lustre installation, which has
one OST.
storage02 is our OSS:
[EMAIL PROTECTED] ~]# lctl dl
0 UP mgc [EMAIL PROTECTED] 31259d9b-e655-cdc4-c760-45d3df426d86 5
1 UP ost OSS OSS_uuid 3
2 UP obdfilter home-md-OST0001 home-md-OST0001_UUID 7
[EMAIL PROTECTED] ~]# lctl --device 2 set_timeout 600
set_timeout has been deprecated. Use conf_param instead.
e.g. conf_param lustre-MDT0000 obd_timeout=50
usage: conf_param obd_timeout=<secs>
run <command> after connecting to device <devno>
--device <devno> <command [args ...]>
[EMAIL PROTECTED] ~]# lctl --device 1 conf_param obd_timeout=600
No device found for name MGS: Invalid argument
error: conf_param: No such device
It looks like I need to run this command on the MGS node, so I then moved
to the MGS server, storage03:
[EMAIL PROTECTED] ~]# lctl dl
0 UP mgs MGS MGS 9
1 UP mgc [EMAIL PROTECTED] f51a910b-a08e-4be6-5ada-b602a5ca9ab3 5
2 UP mdt MDS MDS_uuid 3
3 UP lov home-md-mdtlov home-md-mdtlov_UUID 4
4 UP mds home-md-MDT0000 home-md-MDT0000_UUID 5
5 UP osc home-md-OST0001-osc home-md-mdtlov_UUID 5
[EMAIL PROTECTED] ~]# lctl device 5
[EMAIL PROTECTED] ~]# lctl conf_param obd_timeout=600
error: conf_param: Function not implemented
[EMAIL PROTECTED] ~]# lctl --device 5 conf_param obd_timeout=600
error: conf_param: Function not implemented
[EMAIL PROTECTED] ~]# lctl help conf_param
conf_param: set a permanent config param. This command must be run
on the MGS node
usage: conf_param <target.keyword=val> ...
[EMAIL PROTECTED] ~]# lctl conf_param home-md-MDT0000.obd_timeout=600
error: conf_param: Invalid argument
[EMAIL PROTECTED] ~]#
I searched the whole of /proc/*/lustre for a file that stores this
timeout value, but found nothing.
Could someone advise how to change the recovery timeout?
Cheers,
Wojciech Turek
It looks like your file system is named 'home'; you can confirm with
tunefs.lustre --print <MDS device> | grep "Lustre FS"
The correct command (run on the MGS) would be:
# lctl conf_param home.sys.timeout=<val>
Example:
[EMAIL PROTECTED] ~]# tunefs.lustre --print /dev/sdb |grep "Lustre FS"
Lustre FS: lustre
[EMAIL PROTECTED] ~]# cat /proc/sys/lustre/timeout
130
[EMAIL PROTECTED] ~]# lctl conf_param lustre.sys.timeout=150
[EMAIL PROTECTED] ~]# cat /proc/sys/lustre/timeout
150
Thanks for your email. I am afraid your tips aren't very helpful in
this case. As stated in the subject, I am asking about the recovery timeout.
You can see it, for example, in /proc/fs/lustre/obdfilter/<OST>/recovery_status
whilst one of your OSTs is in recovery. By default this timeout is 250s.
You, on the other hand, are talking about the system obd timeout (CFS
documentation, chapter 4.1.2), which is not what I am concerned with.
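For illustration, here is a minimal shell sketch (not from the thread) that prints the global timeout from /proc/sys/lustre/timeout and the recovery_status file of every local OST under /proc/fs/lustre/obdfilter/. It assumes the /proc layout described above and only reads these files; it does not change any timeout. The function name show_lustre_timeouts and the ROOT variable are hypothetical, and ROOT exists only so the script can be tried against a mock directory tree.

```shell
#!/bin/sh
# Hypothetical sketch: show the global obd timeout and each OST's
# recovery_status. ROOT is a hypothetical prefix (empty = the real /proc)
# so the script can be exercised against a mock directory tree.
ROOT="${ROOT:-}"

show_lustre_timeouts() {
    root="$1"

    # Global timeout, as read with "cat /proc/sys/lustre/timeout" above.
    if [ -r "$root/proc/sys/lustre/timeout" ]; then
        echo "obd_timeout: $(cat "$root/proc/sys/lustre/timeout")"
    else
        echo "obd_timeout: not available"
    fi

    # One recovery_status file per obdfilter (OST) device on this node.
    found=0
    for f in "$root"/proc/fs/lustre/obdfilter/*/recovery_status; do
        # If the glob matched nothing, $f is the literal pattern; skip it.
        [ -r "$f" ] || continue
        found=1
        ost=$(basename "$(dirname "$f")")
        echo "== $ost =="
        cat "$f"
    done
    [ "$found" -eq 1 ] || echo "no obdfilter recovery_status files found"
}

show_lustre_timeouts "$ROOT"
```

Run on an OSS (for example storage02 above) while an OST is recovering, it dumps whatever the recovery_status files contain; on a node without obdfilter devices it just reports that none were found.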
Anyway, I tried your example just to see whether it works, and again I am
afraid it doesn't work for me; see below:
I have a combined MGS and MDS configuration.
[EMAIL PROTECTED] ~]# df
Filesystem    1K-blocks       Used  Available Use% Mounted on
/dev/sda1      10317828    3452824    6340888  36% /
/dev/sda6       7605856      49788    7169708   1% /local
/dev/sda3       4127108      41000    3876460   2% /tmp
/dev/sda2       4127108     753668    3163792  20% /var
/dev/dm-2    1845747840  447502120 1398245720  25% /mnt/sdb
/dev/dm-1    6140723200 4632947344 1507775856  76% /mnt/sdc
/dev/dm-3     286696376    1461588  268850900   1% /mnt/home-md/mdt
[EMAIL PROTECTED] ~]# tunefs.lustre --print /dev/dm-3 |grep "Lustre FS"
Lustre FS: home-md
Lustre FS: home-md
[EMAIL PROTECTED] ~]# cat /proc/sys/lustre/timeout
100
[EMAIL PROTECTED] ~]# lctl conf_param home-md.sys.timeout=150
error: conf_param: Invalid argument
[EMAIL PROTECTED] ~]#
Cheers,
Wojciech Turek
cliffw
------------------------------------------------------------------------
_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
Mr Wojciech Turek
Assistant System Manager
University of Cambridge
High Performance Computing service
email: [EMAIL PROTECTED]
tel. +441223763517