On 7 Nov 2007, at 22:31, Nathan Rutman wrote:
Cliff White wrote:
Wojciech Turek wrote:
Hi Cliff,
On 7 Nov 2007, at 17:58, Cliff White wrote:
Wojciech Turek wrote:
Hi,
Our Lustre environment is:
2.6.9-55.0.9.EL_lustre.1.6.3smp
I would like to change the recovery timeout from its default value
of 250s to something longer.
I tried the example from the manual:
set_timeout <secs>  Sets the timeout (obd_timeout) for a server
to wait before failing recovery.
We performed that experiment on our test Lustre installation
with one OST.
storage02 is our OSS
[EMAIL PROTECTED] ~]# lctl dl
0 UP mgc [EMAIL PROTECTED] 31259d9b-e655-cdc4-
c760-45d3df426d86 5
1 UP ost OSS OSS_uuid 3
2 UP obdfilter home-md-OST0001 home-md-OST0001_UUID 7
[EMAIL PROTECTED] ~]# lctl --device 2 set_timeout 600
set_timeout has been deprecated. Use conf_param instead.
e.g. conf_param lustre-MDT0000 obd_timeout=50
sorry about this bad help message. It's wrong.
usage: conf_param obd_timeout=<secs>
run <command> after connecting to device <devno>
--device <devno> <command [args ...]>
[EMAIL PROTECTED] ~]# lctl --device 1 conf_param obd_timeout=600
No device found for name MGS: Invalid argument
error: conf_param: No such device
It looks like I need to run this command from the MGS node, so I
then moved to the MGS server, called storage03.
[EMAIL PROTECTED] ~]# lctl dl
0 UP mgs MGS MGS 9
1 UP mgc [EMAIL PROTECTED] f51a910b-a08e-4be6-5ada-
b602a5ca9ab3 5
2 UP mdt MDS MDS_uuid 3
3 UP lov home-md-mdtlov home-md-mdtlov_UUID 4
4 UP mds home-md-MDT0000 home-md-MDT0000_UUID 5
5 UP osc home-md-OST0001-osc home-md-mdtlov_UUID 5
[EMAIL PROTECTED] ~]# lctl device 5
[EMAIL PROTECTED] ~]# lctl conf_param obd_timeout=600
error: conf_param: Function not implemented
[EMAIL PROTECTED] ~]# lctl --device 5 conf_param obd_timeout=600
error: conf_param: Function not implemented
[EMAIL PROTECTED] ~]# lctl help conf_param
conf_param: set a permanent config param. This command must
be run on the MGS node
usage: conf_param <target.keyword=val> ...
[EMAIL PROTECTED] ~]# lctl conf_param home-md-
MDT0000.obd_timeout=600
error: conf_param: Invalid argument
[EMAIL PROTECTED] ~]#
I searched the whole of /proc/*/lustre for a file that stores this
timeout value, but nothing was found.
Could someone advise how to change the recovery timeout value?
Cheers,
Wojciech Turek
It looks like your file system is named 'home' - you can
confirm with
tunefs.lustre --print <MDS device> | grep "Lustre FS"
The correct command (Run on the MGS) would be
# lctl conf_param home.sys.timeout=<val>
Example:
[EMAIL PROTECTED] ~]# tunefs.lustre --print /dev/sdb |grep "Lustre FS"
Lustre FS: lustre
[EMAIL PROTECTED] ~]# cat /proc/sys/lustre/timeout
130
[EMAIL PROTECTED] ~]# lctl conf_param lustre.sys.timeout=150
[EMAIL PROTECTED] ~]# cat /proc/sys/lustre/timeout
150
Thanks for your email. I am afraid your tips aren't very
helpful in this case. As stated in the subject, I am asking
about the recovery timeout.
You can see it, for example, in /proc/fs/lustre/obdfilter/<OST>/
recovery_status while one of your OSTs is in the recovery state.
By default this timeout is 250s.
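For illustration, this is roughly what I mean (the OST name is the
one from our test setup, and the exact fields printed vary between
Lustre versions, so take the output below as an illustrative sketch
rather than real output):
oss> cat /proc/fs/lustre/obdfilter/home-md-OST0001/recovery_status
status: RECOVERING
time_remaining: 215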
You, on the other hand, are talking about the system obd timeout
(as described in CFS documentation chapter 4.1.2), which is not
what I am asking about.
Anyway, I tried your example just to see if it works, and again
I am afraid it doesn't work for me; see below:
I have a combined MGS and MDS configuration.
[EMAIL PROTECTED] ~]# df
Filesystem            1K-blocks       Used  Available Use% Mounted on
/dev/sda1              10317828    3452824    6340888  36% /
/dev/sda6               7605856      49788    7169708   1% /local
/dev/sda3               4127108      41000    3876460   2% /tmp
/dev/sda2               4127108     753668    3163792  20% /var
/dev/dm-2            1845747840  447502120 1398245720  25% /mnt/sdb
/dev/dm-1            6140723200 4632947344 1507775856  76% /mnt/sdc
/dev/dm-3             286696376    1461588  268850900   1% /mnt/home-md/mdt
[EMAIL PROTECTED] ~]# tunefs.lustre --print /dev/dm-3 |grep "Lustre FS"
Lustre FS: home-md
Lustre FS: home-md
[EMAIL PROTECTED] ~]# cat /proc/sys/lustre/timeout
100
[EMAIL PROTECTED] ~]# lctl conf_param home-md.sys.timeout=150
error: conf_param: Invalid argument
[EMAIL PROTECTED] ~]#
You need to do this on the MGS node, with the MGS running.
mgs> lctl conf_param testfs.sys.timeout=150
anynode> cat /proc/sys/lustre/timeout
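If conf_param keeps failing, it may also be worth double-checking
that the MGS device is actually up on the node you run it from, for
example (the grep is just illustrative; the expected line is the
"0 UP mgs MGS MGS" entry shown in your lctl dl output above):
mgs> lctl dl | grep -i mgs
  0 UP mgs MGS MGS 9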
This isn't working for me. In my production configuration I have the
MGS combined with the MDT on the same server. My Lustre configuration
consists of two file systems.
[EMAIL PROTECTED] ~]# tunefs.lustre --print /dev/dm-0
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata
Read previous values:
Target: ddn-home-MDT0000
Index: 0
Lustre FS: ddn-home
Mount type: ldiskfs
Flags: 0x5
(MDT MGS )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: [EMAIL PROTECTED]
[EMAIL PROTECTED]
Permanent disk data:
Target: ddn-home-MDT0000
Index: 0
Lustre FS: ddn-home
Mount type: ldiskfs
Flags: 0x5
(MDT MGS )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: [EMAIL PROTECTED]
[EMAIL PROTECTED]
exiting before disk write.
[EMAIL PROTECTED] ~]# tunefs.lustre --print /dev/dm-1
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata
Read previous values:
Target: ddn-data-MDT0000
Index: 0
Lustre FS: ddn-data
Mount type: ldiskfs
Flags: 0x1
(MDT )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: [EMAIL PROTECTED]
[EMAIL PROTECTED]
Permanent disk data:
Target: ddn-data-MDT0000
Index: 0
Lustre FS: ddn-data
Mount type: ldiskfs
Flags: 0x1
(MDT )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: [EMAIL PROTECTED]
[EMAIL PROTECTED]
exiting before disk write.
[EMAIL PROTECTED] ~]#
As you can see above, the MGS is on /dev/dm-0, combined with the MDT
for the ddn-home file system.
If I try the command line from your example, I get this:
[EMAIL PROTECTED] ~]# lctl conf_param ddn-home.sys.timeout=200
error: conf_param: Invalid argument
Server mds01 is definitely the MGS node, so what is wrong here? The
only two reasons for this problem I can think of are that the file
system name contains a "-" character (although I didn't find anything
in the documentation saying that this character is not allowed), or
that the MGS is combined with the MDS.
syslog contains following messages:
Nov 7 18:38:35 mds01 kernel: LustreError: 3273:0:(mgs_llog.c:1957:mgs_setparam()) No filesystem targets for ddn. cfg_device from lctl is 'ddn-home'
Nov 7 18:38:35 mds01 kernel: LustreError: 3273:0:(mgs_handler.c:605:mgs_iocontrol()) setparam err -22
Nov 7 18:39:46 mds01 kernel: LustreError: 3274:0:(mgs_llog.c:1957:mgs_setparam()) No filesystem targets for ddn. cfg_device from lctl is 'ddn-data'
Nov 7 18:39:46 mds01 kernel: LustreError: 3274:0:(mgs_handler.c:605:mgs_iocontrol()) setparam err -22
Nov 7 18:39:54 mds01 kernel: LustreError: 3275:0:(mgs_llog.c:1957:mgs_setparam()) No filesystem targets for ddn. cfg_device from lctl is 'ddn-data'
Nov 7 18:39:54 mds01 kernel: LustreError: 3275:0:(mgs_handler.c:605:mgs_iocontrol()) setparam err -22
Nov 7 18:40:01 mds01 kernel: LustreError: 3282:0:(mgs_llog.c:1957:mgs_setparam()) No filesystem targets for ddn. cfg_device from lctl is 'ddn-data'
Nov 7 18:40:01 mds01 kernel: LustreError: 3282:0:(mgs_handler.c:605:mgs_iocontrol()) setparam err -22
Nov 7 18:41:06 mds01 kernel: LustreError: 3305:0:(mgs_llog.c:1957:mgs_setparam()) No filesystem targets for ddn. cfg_device from lctl is 'ddn-data'
Nov 7 18:41:06 mds01 kernel: LustreError: 3305:0:(mgs_handler.c:605:mgs_iocontrol()) setparam err -22
Nov 7 18:41:15 mds01 kernel: LustreError: 3306:0:(mgs_llog.c:1957:mgs_setparam()) No filesystem targets for ddn. cfg_device from lctl is 'ddn-home'
Nov 7 18:41:15 mds01 kernel: LustreError: 3306:0:(mgs_handler.c:605:mgs_iocontrol()) setparam err -22
From the above it looks like only the first part of the file system
name, "ddn", is recognized, and the "-home" or "-data" suffix is
dropped.
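This is only a guess, but the log messages read as if the MGS takes
everything before the first "-" as the file system name, i.e.
something equivalent to:
# echo "ddn-home.sys.timeout=200" | cut -d- -f1
ddn
which would match the "No filesystem targets for ddn" errors above.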
Please advise.
Wojciech Turek