Wojciech Turek wrote:
> Hi,
>
> It is a lesson for me to do not change old habits. I always used "_"
> and for latest filesystem I did exception for the impression that it
> looks neater with "-" and here we go.
> Can I change file system name without reformatting everything? File
> system with bad name is in production and it is essential for me to
> fix it without long service downtime.
Yes, but you will have to shut everything down. Run tunefs.lustre
--writeconf on all the servers, then restart, starting the MGS first.
While you're at it, you can set the timeout (this can be overridden
later with conf_param):

tunefs.lustre --writeconf --param="sys.timeout=50" /dev/sda

>
> Thanks
>
> Wojciech Turek
>
> On 8 Nov 2007, at 19:04, Nathan Rutman wrote:
>
>> Nathan Rutman wrote:
>>> Wojciech Turek wrote:
>>>
>>>> On 7 Nov 2007, at 22:31, Nathan Rutman wrote:
>>>>
>>>>> Cliff White wrote:
>>>>>
>>>>>> Wojciech Turek wrote:
>>>>>>
>>>>>>> Hi Cliff,
>>>>>>>
>>>>>>> On 7 Nov 2007, at 17:58, Cliff White wrote:
>>>>>>>
>>>>>>>> Wojciech Turek wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>> Our lustre environment is:
>>>>>>>>> 2.6.9-55.0.9.EL_lustre.1.6.3smp
>>>>>>>>> I would like to change recovery timeout from default value
>>>>>>>>> 250s to something longer
>>>>>>>>> I tried example from manual:
>>>>>>>>> set_timeout <secs> Sets the timeout (obd_timeout) for a server
>>>>>>>>> to wait before failing recovery.
>>>>>>>>> We performed that experiment on our test lustre installation
>>>>>>>>> with one OST.
>>>>>>>>> storage02 is our OSS
>>>>>>>>> [EMAIL PROTECTED] ~]# lctl dl
>>>>>>>>> 0 UP mgc [EMAIL PROTECTED]
>>>>>>>>> 31259d9b-e655-cdc4-c760-45d3df426d86 5
>>>>>>>>> 1 UP ost OSS OSS_uuid 3
>>>>>>>>> 2 UP obdfilter home-md-OST0001 home-md-OST0001_UUID 7
>>>>>>>>> [EMAIL PROTECTED] ~]# lctl --device 2 set_timeout 600
>>>>>>>>> set_timeout has been deprecated. Use conf_param instead.
>>>>>>>>> e.g. conf_param lustre-MDT0000 obd_timeout=50
>>>>>>>>>
>>>>> sorry about this bad help message. It's wrong.
>>>>>
>>>>>>>>> usage: conf_param obd_timeout=<secs>
>>>>>>>>> run <command> after connecting to device <devno>
>>>>>>>>> --device <devno> <command [args ...]>
>>>>>>>>> [EMAIL PROTECTED] ~]# lctl --device 1 conf_param obd_timeout=600
>>>>>>>>> No device found for name MGS: Invalid argument
>>>>>>>>> error: conf_param: No such device
>>>>>>>>> It looks like I need to run this command from MGS node so I
>>>>>>>>> moved then to MGS server called storage03
>>>>>>>>> [EMAIL PROTECTED] ~]# lctl dl
>>>>>>>>> 0 UP mgs MGS MGS 9
>>>>>>>>> 1 UP mgc [EMAIL PROTECTED]
>>>>>>>>> f51a910b-a08e-4be6-5ada-b602a5ca9ab3 5
>>>>>>>>> 2 UP mdt MDS MDS_uuid 3
>>>>>>>>> 3 UP lov home-md-mdtlov home-md-mdtlov_UUID 4
>>>>>>>>> 4 UP mds home-md-MDT0000 home-md-MDT0000_UUID 5
>>>>>>>>> 5 UP osc home-md-OST0001-osc home-md-mdtlov_UUID 5
>>>>>>>>> [EMAIL PROTECTED] ~]# lctl device 5
>>>>>>>>> [EMAIL PROTECTED] ~]# lctl conf_param obd_timeout=600
>>>>>>>>> error: conf_param: Function not implemented
>>>>>>>>> [EMAIL PROTECTED] ~]# lctl --device 5 conf_param obd_timeout=600
>>>>>>>>> error: conf_param: Function not implemented
>>>>>>>>> [EMAIL PROTECTED] ~]# lctl help conf_param
>>>>>>>>> conf_param: set a permanent config param. This command must be
>>>>>>>>> run on the MGS node
>>>>>>>>> usage: conf_param <target.keyword=val> ...
>>>>>>>>> [EMAIL PROTECTED] ~]# lctl conf_param
>>>>>>>>> home-md-MDT0000.obd_timeout=600
>>>>>>>>> error: conf_param: Invalid argument
>>>>>>>>> [EMAIL PROTECTED] ~]#
>>>>>>>>> I searched whole /proc/*/lustre for file that can store this
>>>>>>>>> timeout value but nothing were found.
>>>>>>>>> Could someone advise how to change value for recovery timeout?
>>>>>>>>> Cheers,
>>>>>>>>> Wojciech Turek
>>>>>>>>>
>>>>>>>> It looks like your file system is named 'home' - you can
>>>>>>>> confirm with
>>>>>>>> tunefs.lustre --print <MDS device> | grep "Lustre FS"
>>>>>>>>
>>>>>>>> The correct command (Run on the MGS) would be
>>>>>>>> # lctl conf_param home.sys.timeout=<val>
>>>>>>>>
>>>>>>>> Example:
>>>>>>>> [EMAIL PROTECTED] ~]# tunefs.lustre --print /dev/sdb |grep "Lustre FS"
>>>>>>>> Lustre FS: lustre
>>>>>>>> [EMAIL PROTECTED] ~]# cat /proc/sys/lustre/timeout
>>>>>>>> 130
>>>>>>>> [EMAIL PROTECTED] ~]# lctl conf_param lustre.sys.timeout=150
>>>>>>>> [EMAIL PROTECTED] ~]# cat /proc/sys/lustre/timeout
>>>>>>>> 150
>>>>>>>>
>>>>>>> Thanks for your email. I am afraid your tips aren't very helpful
>>>>>>> in this case. As stated in the subject I am asking about
>>>>>>> recovery timeout.
>>>>>>> You can find it for example in
>>>>>>> /proc/fs/lustre/obdfilter/<OST>/recovery_status whilst one of
>>>>>>> your OST's is in recovery state. By default this timeout is 250s.
>>>>>>> Whereas you are talking about system obd timeout (according to
>>>>>>> CFS documentation chapter 4.1.2 ) which is not a subject of my
>>>>>>> concern.
>>>>>>>
>>>>>>> Any way I tried your example just to see if it works and again I
>>>>>>> am afraid it doesn't work for me, see below:
>>>>>>> I have combined mgs and mds configuration.
>>>>>>>
>>>>>>> [EMAIL PROTECTED] ~]# df
>>>>>>> Filesystem 1K-blocks Used Available Use% Mounted on
>>>>>>> /dev/sda1 10317828 3452824 6340888 36% /
>>>>>>> /dev/sda6 7605856 49788 7169708 1% /local
>>>>>>> /dev/sda3 4127108 41000 3876460 2% /tmp
>>>>>>> /dev/sda2 4127108 753668 3163792 20% /var
>>>>>>> /dev/dm-2 1845747840 447502120 1398245720 25% /mnt/sdb
>>>>>>> /dev/dm-1 6140723200 4632947344 1507775856 76% /mnt/sdc
>>>>>>> /dev/dm-3 286696376 1461588 268850900 1%
>>>>>>> /mnt/home-md/mdt
>>>>>>> [EMAIL PROTECTED] ~]# tunefs.lustre --print /dev/dm-3 |grep
>>>>>>> "Lustre FS"
>>>>>>> Lustre FS: home-md
>>>>>>> Lustre FS: home-md
>>>>>>> [EMAIL PROTECTED] ~]# cat /proc/sys/lustre/timeout
>>>>>>> 100
>>>>>>> [EMAIL PROTECTED] ~]# lctl conf_param home-md.sys.timeout=150
>>>>>>> error: conf_param: Invalid argument
>>>>>>> [EMAIL PROTECTED] ~]#
>>>>>>>
>>>>> You need to do this on the MGS node, with the MGS running.
>>>>>
>>>>> mgs> lctl conf_param testfs.sys.timeout=150
>>>>> anynode> cat /proc/sys/lustre/timeout
>>>>>
>>>> This isn't working for me. In my production configuration I have
>>>> MGS combined with MDT on the same server. My lustre configuration
>>>> consists of two file systems.
>>>> [EMAIL PROTECTED] ~]# tunefs.lustre --print /dev/dm-0
>>>> checking for existing Lustre data: found CONFIGS/mountdata
>>>> Reading CONFIGS/mountdata
>>>>
>>>> Read previous values:
>>>> Target: ddn-home-MDT0000
>>>> Index: 0
>>>> Lustre FS: ddn-home
>>>> Mount type: ldiskfs
>>>> Flags: 0x5
>>>> (MDT MGS )
>>>> Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
>>>> Parameters: [EMAIL PROTECTED] [EMAIL PROTECTED]
>>>>
>>>>
>>>> Permanent disk data:
>>>> Target: ddn-home-MDT0000
>>>> Index: 0
>>>> Lustre FS: ddn-home
>>>> Mount type: ldiskfs
>>>> Flags: 0x5
>>>> (MDT MGS )
>>>> Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
>>>> Parameters: [EMAIL PROTECTED] [EMAIL PROTECTED]
>>>>
>>>> exiting before disk write.
>>>> [EMAIL PROTECTED] ~]# tunefs.lustre --print /dev/dm-1
>>>> checking for existing Lustre data: found CONFIGS/mountdata
>>>> Reading CONFIGS/mountdata
>>>>
>>>> Read previous values:
>>>> Target: ddn-data-MDT0000
>>>> Index: 0
>>>> Lustre FS: ddn-data
>>>> Mount type: ldiskfs
>>>> Flags: 0x1
>>>> (MDT )
>>>> Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
>>>> Parameters: [EMAIL PROTECTED] [EMAIL PROTECTED]
>>>>
>>>>
>>>> Permanent disk data:
>>>> Target: ddn-data-MDT0000
>>>> Index: 0
>>>> Lustre FS: ddn-data
>>>> Mount type: ldiskfs
>>>> Flags: 0x1
>>>> (MDT )
>>>> Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
>>>> Parameters: [EMAIL PROTECTED] [EMAIL PROTECTED]
>>>>
>>>> exiting before disk write.
>>>> [EMAIL PROTECTED] ~]#
>>>> As you can see above MGS is on /dev/dm-0 combined with MDT for
>>>> ddn-home file system.
>>>> If I try command line from your example I get this:
>>>> [EMAIL PROTECTED] ~]# lctl conf_param ddn-home.sys.timeout=200
>>>> error: conf_param: Invalid argument
>>>>
>>>> Server mds01 is 100% MGS node. What is wrong here then? The only
>>>> two reasons for that problem I can think of is that file system
>>>> name contain "-" character.
>>>> However I didn't find anything in
>>>> documentation that would say that this character is not allowed to
>>>> be used. Another reason is that MGS is combined with MDS?
>>>>
>>>> syslog contains following messages:
>>>>
>>>> Nov 7 18:38:35 mds01 kernel: LustreError:
>>>> 3273:0:(mgs_llog.c:1957:mgs_setparam()) No filesystem targets for
>>>> ddn. cfg_device from lctl is 'ddn-home'
>>>> Nov 7 18:38:35 mds01 kernel: LustreError:
>>>> 3273:0:(mgs_handler.c:605:mgs_iocontrol()) setparam err -22
>>>> Nov 7 18:39:46 mds01 kernel: LustreError:
>>>> 3274:0:(mgs_llog.c:1957:mgs_setparam()) No filesystem targets for
>>>> ddn. cfg_device from lctl is 'ddn-data'
>>>> Nov 7 18:39:46 mds01 kernel: LustreError:
>>>> 3274:0:(mgs_handler.c:605:mgs_iocontrol()) setparam err -22
>>>> Nov 7 18:39:54 mds01 kernel: LustreError:
>>>> 3275:0:(mgs_llog.c:1957:mgs_setparam()) No filesystem targets for
>>>> ddn. cfg_device from lctl is 'ddn-data'
>>>> Nov 7 18:39:54 mds01 kernel: LustreError:
>>>> 3275:0:(mgs_handler.c:605:mgs_iocontrol()) setparam err -22
>>>> Nov 7 18:40:01 mds01 kernel: LustreError:
>>>> 3282:0:(mgs_llog.c:1957:mgs_setparam()) No filesystem targets for
>>>> ddn. cfg_device from lctl is 'ddn-data'
>>>> Nov 7 18:40:01 mds01 kernel: LustreError:
>>>> 3282:0:(mgs_handler.c:605:mgs_iocontrol()) setparam err -22
>>>> Nov 7 18:41:06 mds01 kernel: LustreError:
>>>> 3305:0:(mgs_llog.c:1957:mgs_setparam()) No filesystem targets for
>>>> ddn. cfg_device from lctl is 'ddn-data'
>>>> Nov 7 18:41:06 mds01 kernel: LustreError:
>>>> 3305:0:(mgs_handler.c:605:mgs_iocontrol()) setparam err -22
>>>> Nov 7 18:41:15 mds01 kernel: LustreError:
>>>> 3306:0:(mgs_llog.c:1957:mgs_setparam()) No filesystem targets for
>>>> ddn. cfg_device from lctl is 'ddn-home'
>>>> Nov 7 18:41:15 mds01 kernel: LustreError:
>>>> 3306:0:(mgs_handler.c:605:mgs_iocontrol()) setparam err -22
>>>>
>>>> From above it looks like only first part of file system name is
>>>> recognized "ddn" and -home or -data is omitted.
>>>>
>>>> Please advise.
>>>>
>>>> Wojciech Turek
>>>>
>>>
>>> You seem to have found a bug. I just tried this myself and it
>>> doesn't work with a "-" in the name. Maybe use a '.' instead until
>>> we fix it.
>>>
>> Argh, sorry, that doesn't work with conf_param either. But an
>> underscore '_' does. I'm filing a bug report...
>>
>
> Mr Wojciech Turek
> Assistant System Manager
> University of Cambridge
> High Performance Computing service
> email: [EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>
> tel. +441223763517
>
>
_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
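
Until the bug is fixed, a small pre-flight check on a proposed file
system name may help avoid the problem discussed in this thread. The
sketch below is only an illustration. The function names
fsname_seen_by_mgs and fsname_safe_for_conf_param are hypothetical and
not part of Lustre. The idea that the name is cut at the first "-" is an
assumption inferred solely from the syslog messages quoted in this
thread ("No filesystem targets for ddn"), not from the MGS source.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of Lustre.

# Print the part of the name before the first "-". ASSUMPTION: this is
# what the MGS ends up searching for, judging by the syslog output
# quoted in this thread.
fsname_seen_by_mgs() {
    printf '%s\n' "${1%%-*}"
}

# Succeed only if the name is non-empty and contains neither "-" nor ".",
# the two characters reported in this thread as failing with conf_param.
fsname_safe_for_conf_param() {
    case "$1" in
        ''|*[-.]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Check a few candidate names before formatting anything.
for name in ddn-home ddn.home ddn_home; do
    if fsname_safe_for_conf_param "$name"; then
        echo "$name: ok"
    else
        echo "$name: avoid"
    fi
done
```

Only "-" and "." are rejected here because those are the only characters
reported as failing in this thread; other characters were not tested and
may or may not be safe.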
