Hi,

This is a lesson for me not to change old habits. I have always used "_", but for the latest file system I made an exception because "-" looked neater, and here we go. Can I change the file system name without reformatting everything? The file system with the bad name is in production, and it is essential that I fix it without a long service downtime.

Thanks

Wojciech Turek

On 8 Nov 2007, at 19:04, Nathan Rutman wrote:

Nathan Rutman wrote:
Wojciech Turek wrote:

On 7 Nov 2007, at 22:31, Nathan Rutman wrote:


Cliff White wrote:

Wojciech Turek wrote:



Hi Cliff,

On 7 Nov 2007, at 17:58, Cliff White wrote:



Wojciech Turek wrote:



Hi,
Our Lustre environment is:
2.6.9-55.0.9.EL_lustre.1.6.3smp
I would like to change the recovery timeout from its default value of 250s to something longer.
I tried the example from the manual:
set_timeout <secs> Sets the timeout (obd_timeout) for a server
to wait before failing recovery.
We performed this experiment on our test Lustre installation, which has one OST.
storage02 is our OSS:
[EMAIL PROTECTED] ~]# lctl dl
  0 UP mgc [EMAIL PROTECTED] 31259d9b-e655-cdc4-c760-45d3df426d86 5
  1 UP ost OSS OSS_uuid 3
  2 UP obdfilter home-md-OST0001 home-md-OST0001_UUID 7
[EMAIL PROTECTED] ~]# lctl --device 2 set_timeout 600
set_timeout has been deprecated. Use conf_param instead.
e.g. conf_param lustre-MDT0000 obd_timeout=50



sorry about this bad help message.  It's wrong.

usage: conf_param obd_timeout=<secs>
run <command> after connecting to device <devno>
--device <devno> <command [args ...]>
[EMAIL PROTECTED] ~]# lctl --device 1 conf_param obd_timeout=600
No device found for name MGS: Invalid argument
error: conf_param: No such device
It looks like I need to run this command on the MGS node, so I moved to the MGS server, called storage03:
[EMAIL PROTECTED] ~]# lctl dl
  0 UP mgs MGS MGS 9
  1 UP mgc [EMAIL PROTECTED] f51a910b-a08e-4be6-5ada-b602a5ca9ab3 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov home-md-mdtlov home-md-mdtlov_UUID 4
  4 UP mds home-md-MDT0000 home-md-MDT0000_UUID 5
  5 UP osc home-md-OST0001-osc home-md-mdtlov_UUID 5
[EMAIL PROTECTED] ~]# lctl device 5
[EMAIL PROTECTED] ~]# lctl conf_param obd_timeout=600
error: conf_param: Function not implemented
[EMAIL PROTECTED] ~]# lctl --device 5 conf_param obd_timeout=600
error: conf_param: Function not implemented
[EMAIL PROTECTED] ~]# lctl help conf_param
conf_param: set a permanent config param. This command must be run on the MGS node
usage: conf_param <target.keyword=val> ...
[EMAIL PROTECTED] ~]# lctl conf_param home-md-MDT0000.obd_timeout=600
error: conf_param: Invalid argument
[EMAIL PROTECTED] ~]#
I searched the whole of /proc/*/lustre for a file that might store this timeout value, but found nothing.
Could someone advise how to change the value of the recovery timeout?
Cheers,
Wojciech Turek



It looks like your file system is named 'home' - you can confirm with
tunefs.lustre --print <MDS device> | grep "Lustre FS"

The correct command (Run on the MGS) would be
# lctl conf_param home.sys.timeout=<val>

Example:
[EMAIL PROTECTED] ~]# tunefs.lustre --print /dev/sdb |grep "Lustre FS"
Lustre FS:  lustre
[EMAIL PROTECTED] ~]# cat /proc/sys/lustre/timeout
130
[EMAIL PROTECTED] ~]# lctl conf_param lustre.sys.timeout=150
[EMAIL PROTECTED] ~]# cat /proc/sys/lustre/timeout
150
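The "Lustre FS" check above can be wrapped in a small helper. This is only a sketch: the function name lustre_fsname is hypothetical, and it assumes the output format shown in this thread, where tunefs.lustre --print can list the "Lustre FS:" line more than once.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): reads `tunefs.lustre --print`
# output on stdin and prints each distinct file system name once.
lustre_fsname() {
    sed -n 's/^Lustre FS:[[:space:]]*//p' | sort -u
}

# Sample input following the output format shown in this thread.
printf '%s\n' \
    'Target:     ddn-home-MDT0000' \
    'Lustre FS:  ddn-home' \
    'Lustre FS:  ddn-home' | lustre_fsname
```

On a real server the same function would sit at the end of the pipeline, as in tunefs.lustre --print <MDS device> | lustre_fsname.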



Thanks for your email. I am afraid your tips aren't very helpful in this case. As stated in the subject, I am asking about the recovery timeout. You can find it, for example, in /proc/fs/lustre/obdfilter/<OST>/recovery_status while one of your OSTs is in the recovery state. By default this timeout is 250s. You are talking about the system obd timeout (CFS documentation, chapter 4.1.2), which is not my concern.

Anyway, I tried your example just to see if it works, and I am afraid it doesn't work for me either; see below.
I have a combined MGS and MDS configuration.

[EMAIL PROTECTED] ~]# df
Filesystem           1K-blocks       Used  Available Use% Mounted on
/dev/sda1             10317828    3452824    6340888  36% /
/dev/sda6              7605856      49788    7169708   1% /local
/dev/sda3              4127108      41000    3876460   2% /tmp
/dev/sda2              4127108     753668    3163792  20% /var
/dev/dm-2           1845747840  447502120 1398245720  25% /mnt/sdb
/dev/dm-1           6140723200 4632947344 1507775856  76% /mnt/sdc
/dev/dm-3            286696376    1461588  268850900   1% /mnt/home-md/mdt
[EMAIL PROTECTED] ~]# tunefs.lustre --print /dev/dm-3 |grep "Lustre FS"
Lustre FS:  home-md
Lustre FS:  home-md
[EMAIL PROTECTED] ~]# cat /proc/sys/lustre/timeout
100
[EMAIL PROTECTED] ~]# lctl conf_param home-md.sys.timeout=150
error: conf_param: Invalid argument
[EMAIL PROTECTED] ~]#



You need to do this on the MGS node, with the MGS running.

mgs> lctl conf_param testfs.sys.timeout=150
anynode> cat /proc/sys/lustre/timeout

This isn't working for me. In my production configuration the MGS is combined with an MDT on the same server. My Lustre configuration consists of two file systems.
[EMAIL PROTECTED] ~]# tunefs.lustre --print /dev/dm-0
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     ddn-home-MDT0000
Index:      0
Lustre FS:  ddn-home
Mount type: ldiskfs
Flags:      0x5
              (MDT MGS )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: [EMAIL PROTECTED] [EMAIL PROTECTED]


   Permanent disk data:
Target:     ddn-home-MDT0000
Index:      0
Lustre FS:  ddn-home
Mount type: ldiskfs
Flags:      0x5
              (MDT MGS )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: [EMAIL PROTECTED] [EMAIL PROTECTED]

exiting before disk write.
[EMAIL PROTECTED] ~]# tunefs.lustre --print /dev/dm-1
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     ddn-data-MDT0000
Index:      0
Lustre FS:  ddn-data
Mount type: ldiskfs
Flags:      0x1
              (MDT )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: [EMAIL PROTECTED] [EMAIL PROTECTED]


   Permanent disk data:
Target:     ddn-data-MDT0000
Index:      0
Lustre FS:  ddn-data
Mount type: ldiskfs
Flags:      0x1
              (MDT )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: [EMAIL PROTECTED] [EMAIL PROTECTED]

exiting before disk write.
[EMAIL PROTECTED] ~]#
As you can see above, the MGS is on /dev/dm-0, combined with the MDT for the ddn-home file system.
If I try the command line from your example, I get this:
[EMAIL PROTECTED] ~]# lctl conf_param ddn-home.sys.timeout=200
error: conf_param: Invalid argument

Server mds01 is definitely the MGS node, so what is wrong here? I can think of only two possible causes. One is that the file system name contains a "-" character, although I didn't find anything in the documentation saying that this character is not allowed. The other is that the MGS is combined with the MDS.

The syslog contains the following messages:

Nov 7 18:38:35 mds01 kernel: LustreError: 3273:0:(mgs_llog.c:1957:mgs_setparam()) No filesystem targets for ddn. cfg_device from lctl is 'ddn-home'
Nov 7 18:38:35 mds01 kernel: LustreError: 3273:0:(mgs_handler.c:605:mgs_iocontrol()) setparam err -22
Nov 7 18:39:46 mds01 kernel: LustreError: 3274:0:(mgs_llog.c:1957:mgs_setparam()) No filesystem targets for ddn. cfg_device from lctl is 'ddn-data'
Nov 7 18:39:46 mds01 kernel: LustreError: 3274:0:(mgs_handler.c:605:mgs_iocontrol()) setparam err -22
Nov 7 18:39:54 mds01 kernel: LustreError: 3275:0:(mgs_llog.c:1957:mgs_setparam()) No filesystem targets for ddn. cfg_device from lctl is 'ddn-data'
Nov 7 18:39:54 mds01 kernel: LustreError: 3275:0:(mgs_handler.c:605:mgs_iocontrol()) setparam err -22
Nov 7 18:40:01 mds01 kernel: LustreError: 3282:0:(mgs_llog.c:1957:mgs_setparam()) No filesystem targets for ddn. cfg_device from lctl is 'ddn-data'
Nov 7 18:40:01 mds01 kernel: LustreError: 3282:0:(mgs_handler.c:605:mgs_iocontrol()) setparam err -22
Nov 7 18:41:06 mds01 kernel: LustreError: 3305:0:(mgs_llog.c:1957:mgs_setparam()) No filesystem targets for ddn. cfg_device from lctl is 'ddn-data'
Nov 7 18:41:06 mds01 kernel: LustreError: 3305:0:(mgs_handler.c:605:mgs_iocontrol()) setparam err -22
Nov 7 18:41:15 mds01 kernel: LustreError: 3306:0:(mgs_llog.c:1957:mgs_setparam()) No filesystem targets for ddn. cfg_device from lctl is 'ddn-home'
Nov 7 18:41:15 mds01 kernel: LustreError: 3306:0:(mgs_handler.c:605:mgs_iocontrol()) setparam err -22

From the above, it looks like only the first part of the file system name, "ddn", is recognized, and the "-home" or "-data" part is dropped.
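To illustrate the suspected behaviour: the log messages would be consistent with the name being cut at the first "-". The sketch below is an assumption made for illustration only and is not the actual mgs_llog.c logic; both function names are hypothetical.

```shell
#!/bin/sh
# Illustrative only: NOT the real Lustre code. Cut a name at the FIRST '-',
# which would match the "No filesystem targets for ddn" message.
fsname_first_dash() {
    printf '%s\n' "${1%%-*}"
}

# Cut at the LAST '-' instead, which keeps a hyphenated file system name
# intact when a target suffix such as -MDT0000 follows it.
fsname_last_dash() {
    printf '%s\n' "${1%-*}"
}

fsname_first_dash "ddn-home"          # prints: ddn
fsname_first_dash "ddn-home-MDT0000"  # prints: ddn
fsname_last_dash  "ddn-home-MDT0000"  # prints: ddn-home
fsname_first_dash "lustre-MDT0000"    # prints: lustre
```

A first-dash split gives the expected result for a name like lustre-MDT0000, which may be why the problem only shows up when the file system name itself contains a "-".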

Please advise.

Wojciech Turek


You seem to have found a bug. I just tried this myself and it doesn't work with a "-" in the name. Maybe use a '.' instead until we fix it.

Argh, sorry, that doesn't work with conf_param either. But an underscore '_' does. I'm filing a bug report...
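Until this is fixed, a proposed file system name could be checked before formatting. The sketch below is a hypothetical helper, not a Lustre tool, and it assumes only what is reported in this thread: "-" and "." break conf_param, while "_" works.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not a Lustre tool): accept only names made
# of letters, digits and underscores, per the behaviour reported above.
fsname_is_safe() {
    case "$1" in
        ''|*[!A-Za-z0-9_]*) return 1 ;;   # empty, or contains another character
        *) return 0 ;;
    esac
}

for name in ddn_home ddn-home ddn.home; do
    if fsname_is_safe "$name"; then
        echo "$name: ok"
    else
        echo "$name: avoid"
    fi
done
```

The loop reports ddn_home as ok and flags ddn-home and ddn.home as names to avoid.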


Mr Wojciech Turek
Assistant System Manager
University of Cambridge
High Performance Computing service
email: [EMAIL PROTECTED]
tel. +441223763517



_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss