I didn't find the hack anywhere. I looked at what those files contained and decided to "hack and slash". Apparently, those files are generated from data within the filesystem itself. A second run of writeconf displayed the target value as "lustre1-OST0000", which is what I didn't want. :-(
Roger S.

Wojciech Turek wrote:
> Hi Roger,
>
> Where did you find this CONFIG hack?
> Did you make a copy of the CONFIG dir before following these steps?
>
> On 15 July 2010 20:02, Roger Sersted wrote:
> > I am using the ext4 RPMs. I ran the following commands on the MDS and
> > OSS nodes (Lustre was not running at the time):
> >
> >   tune2fs -O extents,uninit_bg,dir_index /dev/XXX
> >   fsck -pf /dev/XXX
> >
> > I then started Lustre ("mount -t lustre /dev/XXX /lustre") on the OSSes
> > and then the MDS. The problem still persisted. I then shut down Lustre
> > by unmounting the Lustre filesystems on the MDS/OSS nodes.
> >
> > My last and most desperate step was to "hack" the CONFIG files. On
> > puppy7, I did the following:
> >
> >   1. mount -t ldiskfs /dev/sdc /mnt
> >   2. cd /mnt/CONFIGS
> >   3. mv lustre1-OST0000 lustre1-OST0001
> >   4. vim -nb lustre1-OST0001 mountdata
> >   5. I changed OST0000 to OST0001.
> >   6. I verified my changes by comparing an "od -c" of before and after.
> >   7. umount /mnt
> >   8. tunefs.lustre -writeconf /dev/sdc
> >
> > The output of step 8 is:
> >
> >   tunefs.lustre -writeconf /dev/sdc
> >
> >   checking for existing Lustre data: found CONFIGS/mountdata
> >   Reading CONFIGS/mountdata
> >
> >   Read previous values:
> >   Target: lustre1-OST0001
> >   Index: 0
> >   Lustre FS: lustre1
> >   Mount type: ldiskfs
> >   Flags: 0x102
> >     (OST writeconf )
> >   Persistent mount opts: errors=remount-ro,extents,mballoc
> >   Parameters: mgsnode=172.17....@o2ib
> >
> >   Permanent disk data:
> >   Target: lustre1-OST0000
> >   Index: 0
> >   Lustre FS: lustre1
> >   Mount type: ldiskfs
> >   Flags: 0x102
> >     (OST writeconf )
> >   Persistent mount opts: errors=remount-ro,extents,mballoc
> >   Parameters: mgsnode=172.17....@o2ib
> >
> >   Writing CONFIGS/mountdata
> >
> > Now part of the system seems to have the correct Target value.
> >
> > Thanks for your time on this.
> >
> > Roger S.
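The step-8 output above shows the inconsistency directly: "Read previous values" lists Target lustre1-OST0001 with Index 0. A Lustre target name ends in the target index written as four hex digits (OST0001 is index 1, OST000a is index 10), so the name suffix and the Index field should agree. The following is a minimal POSIX sh sketch for spotting such a mismatch, not a Lustre tool: the function name check_target_index is hypothetical, and it assumes the "Target:"/"Index:" line layout shown in this thread, with each Target line preceding its Index line.

```shell
#!/bin/sh
# check_target_index (hypothetical helper): reads `tunefs.lustre --dryrun --print`
# style output on stdin and checks that the hex suffix of each Target name
# (e.g. lustre1-OST0001 -> 1) equals the Index field that follows it.
# Assumption: each "Target:" line precedes its "Index:" line, as in this thread.
# Exits 1 if any pair disagrees or cannot be parsed, 0 otherwise.
check_target_index() (
  status=0
  target=""
  while read -r key value rest; do
    case $key in
      Target:)
        target=$value
        ;;
      Index:)
        # Last four characters of the target name, e.g. "0001".
        suffix=${target#"${target%????}"}
        case $suffix in
          [0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]) ;;
          *) echo "cannot parse an index from target '$target'"; status=1; continue ;;
        esac
        case $value in
          ''|*[!0-9]*) echo "non-numeric Index '$value' for $target"; status=1; continue ;;
        esac
        implied=$((0x$suffix))   # hex suffix converted to a decimal index
        if [ "$implied" -eq "$value" ]; then
          echo "ok: $target has index $value"
        else
          echo "MISMATCH: $target implies index $implied, but Index is $value"
          status=1
        fi
        ;;
    esac
  done
  exit "$status"
)
```

For example, on puppy7 one could run `tunefs.lustre --dryrun --print /dev/sdc | check_target_index`; the function only reads text, and `--dryrun` exits before any disk write.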
> >
> > Wojciech Turek wrote:
> > > Hi Roger,
> > >
> > > Lustre 1.8.3 for RHEL5 has two sets of RPMs: one set for the old-style
> > > ext3-based ldiskfs and one set for the ext4-based ldiskfs. When
> > > upgrading from 1.6.6 to 1.8.3, I think you should not try to use the
> > > ext4-based packages. Can you let us know which RPMs you have used?
> > >
> > > On 15 July 2010 16:14, Roger Sersted wrote:
> > > > Wojciech Turek wrote:
> > > > > Can you also please post the output of 'rpm -qa | grep lustre'
> > > > > run on puppy5-7?
> > > >
> > > > [r...@puppy5 log]# rpm -qa | grep -i lustre
> > > > kernel-2.6.18-164.11.1.el5_lustre.1.8.3
> > > > lustre-1.8.3-2.6.18_164.11.1.el5_lustre.1.8.3
> > > > lustre-ldiskfs-3.0.9-2.6.18_164.11.1.el5_lustre.1.8.3
> > > > mft-2.6.0-2.6.18_164.11.1.el5_lustre.1.8.3
> > > > lustre-modules-1.8.3-2.6.18_164.11.1.el5_lustre.1.8.3
> > > >
> > > > [r...@puppy6 log]# rpm -qa | grep -i lustre
> > > > kernel-2.6.18-164.11.1.el5_lustre.1.8.3
> > > > lustre-1.8.3-2.6.18_164.11.1.el5_lustre.1.8.3
> > > > lustre-ldiskfs-3.0.9-2.6.18_164.11.1.el5_lustre.1.8.3
> > > > mft-2.6.0-2.6.18_164.11.1.el5_lustre.1.8.3
> > > > lustre-modules-1.8.3-2.6.18_164.11.1.el5_lustre.1.8.3
> > > >
> > > > [r...@puppy7 CONFIGS]# rpm -qa | grep -i lustre
> > > > kernel-2.6.18-164.11.1.el5_lustre.1.8.3
> > > > lustre-1.8.3-2.6.18_164.11.1.el5_lustre.1.8.3
> > > > lustre-ldiskfs-3.0.9-2.6.18_164.11.1.el5_lustre.1.8.3
> > > > mft-2.6.0-2.6.18_164.11.1.el5_lustre.1.8.3
> > > > lustre-modules-1.8.3-2.6.18_164.11.1.el5_lustre.1.8.3
> > > >
> > > > Thanks,
> > > >
> > > > Roger S.
> > > >
> > > > > On 15 July 2010 15:55, Roger Sersted wrote:
> > > > > > OK. This looks bad. It appears that I should have upgraded ext3
> > > > > > to ext4. I found instructions for that:
> > > > > >
> > > > > >   tune2fs -O extents,uninit_bg,dir_index /dev/XXX
> > > > > >   fsck -pf /dev/XXX
> > > > > >
> > > > > > Is the above correct?
> > > > > > I'd like to move our systems to ext4. I didn't know those steps
> > > > > > were necessary.
> > > > > >
> > > > > > Other answers are listed below.
> > > > > >
> > > > > > Wojciech Turek wrote:
> > > > > > > Hi Roger,
> > > > > > >
> > > > > > > Sorry for the delay. From the ldiskfs messages it seems to me
> > > > > > > that you are using the ext4-based ldiskfs
> > > > > > > (Jun 26 17:54:30 puppy7 kernel: ldiskfs created from ext4-2.6-rhel5).
> > > > > > > If you are upgrading from 1.6.6, your ldiskfs is ext3-based, so
> > > > > > > I think that in lustre-1.8.3 you should use the ext3-based
> > > > > > > ldiskfs RPM.
> > > > > > >
> > > > > > > Can you also tell us a bit more about your setup? From what you
> > > > > > > wrote so far, I understand you have 2 OSS servers and each server
> > > > > > > has one OST device. In addition to that, you have a third server
> > > > > > > which acts as the MGS/MDS; is that right?
> > > > > > >
> > > > > > > The logs you provided seem to be only from one server, puppy7,
> > > > > > > so they do not give the whole picture of the situation. The
> > > > > > > timeout messages may indicate a problem with communication
> > > > > > > between the servers, but it is really difficult to say without
> > > > > > > seeing the whole picture or at least more elements of it.
> > > > > > >
> > > > > > > To check whether you have the correct RPMs installed, can you
> > > > > > > please run 'rpm -qa | grep lustre' on both OSS servers and the MDS?
> > > > > > >
> > > > > > > Also, please provide the output of 'lctl list_nids' run on both
> > > > > > > OSS servers, the MDS and a client.
> > > > > >
> > > > > > puppy5 (MDS/MGS)
> > > > > >   172.17....@o2ib
> > > > > >   172.16....@tcp
> > > > > >
> > > > > > puppy6 (OSS)
> > > > > >   172.17....@o2ib
> > > > > >   172.16....@tcp
> > > > > >
> > > > > > puppy7 (OSS)
> > > > > >   172.17....@o2ib
> > > > > >   172.16....@tcp
> > > > > >
> > > > > > > In addition to the above, please run the following command on
> > > > > > > all Lustre targets (OSTs and MDT) to display your current Lustre
> > > > > > > configuration:
> > > > > > >
> > > > > > >   tunefs.lustre --dryrun --print /dev/<ost_device>
> > > > > >
> > > > > > puppy5 (MDS/MGS)
> > > > > >   Read previous values:
> > > > > >   Target: lustre1-MDT0000
> > > > > >   Index: 0
> > > > > >   Lustre FS: lustre1
> > > > > >   Mount type: ldiskfs
> > > > > >   Flags: 0x405
> > > > > >     (MDT MGS )
> > > > > >   Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
> > > > > >   Parameters: lov.stripesize=125K lov.stripecount=2 mdt.group_upcall=/usr/sbin/l_getgroups mdt.group_upcall=NONE mdt.group_upcall=NONE
> > > > > >
> > > > > >   Permanent disk data:
> > > > > >   Target: lustre1-MDT0000
> > > > > >   Index: 0
> > > > > >   Lustre FS: lustre1
> > > > > >   Mount type: ldiskfs
> > > > > >   Flags: 0x405
> > > > > >     (MDT MGS )
> > > > > >   Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
> > > > > >   Parameters: lov.stripesize=125K lov.stripecount=2 mdt.group_upcall=/usr/sbin/l_getgroups mdt.group_upcall=NONE mdt.group_upcall=NONE
> > > > > >
> > > > > >   exiting before disk write.
> > > > > > ----------------------------------------------------
> > > > > > puppy6
> > > > > >   checking for existing Lustre data: found CONFIGS/mountdata
> > > > > >   Reading CONFIGS/mountdata
> > > > > >
> > > > > >   Read previous values:
> > > > > >   Target: lustre1-OST0000
> > > > > >   Index: 0
> > > > > >   Lustre FS: lustre1
> > > > > >   Mount type: ldiskfs
> > > > > >   Flags: 0x2
> > > > > >     (OST )
> > > > > >   Persistent mount opts: errors=remount-ro,extents,mballoc
> > > > > >   Parameters: mgsnode=172.17....@o2ib
> > > > > >
> > > > > >   Permanent disk data:
> > > > > >   Target: lustre1-OST0000
> > > > > >   Index: 0
> > > > > >   Lustre FS: lustre1
> > > > > >   Mount type: ldiskfs
> > > > > >   Flags: 0x2
> > > > > >     (OST )
> > > > > >   Persistent mount opts: errors=remount-ro,extents,mballoc
> > > > > >   Parameters: mgsnode=172.17....@o2ib
> > > > > > --------------------------------------------------
> > > > > > puppy7 (this is the broken OSS. The "Target" should be
> > > > > > "lustre1-OST0001".)
> > > > > >   checking for existing Lustre data: found CONFIGS/mountdata
> > > > > >   Reading CONFIGS/mountdata
> > > > > >
> > > > > >   Read previous values:
> > > > > >   Target: lustre1-OST0000
> > > > > >   Index: 0
> > > > > >   Lustre FS: lustre1
> > > > > >   Mount type: ldiskfs
> > > > > >   Flags: 0x2
> > > > > >     (OST )
> > > > > >   Persistent mount opts: errors=remount-ro,extents,mballoc
> > > > > >   Parameters: mgsnode=172.17....@o2ib
> > > > > >
> > > > > >   Permanent disk data:
> > > > > >   Target: lustre1-OST0000
> > > > > >   Index: 0
> > > > > >   Lustre FS: lustre1
> > > > > >   Mount type: ldiskfs
> > > > > >   Flags: 0x2
> > > > > >     (OST )
> > > > > >   Persistent mount opts: errors=remount-ro,extents,mballoc
> > > > > >   Parameters: mgsnode=172.17....@o2ib
> > > > > >
> > > > > >   exiting before disk write.
> > > > > >
> > > > > > > If possible, please attach the syslog from each machine from
> > > > > > > the time you mounted the Lustre targets (OST and MDT).
> > > > > > >
> > > > > > > Best regards,
> > > > > > >
> > > > > > > Wojciech
> > > > > > >
> > > > > > > On 14 July 2010 20:46, Roger Sersted wrote:
> > > > > > > > Any additional info?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Roger S.
> > > > > > >
> > > > > > > --
> > > > > > > Wojciech Turek
> > > > >
> > > > > --
> > > > > Wojciech Turek
> > > > >
> > > > > Assistant System Manager
> > > > > 517

_______________________________________________
Lustre-discuss mailing list
http://lists.lustre.org/mailman/listinfo/lustre-discuss
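The underlying problem visible in the dry-run output quoted in this thread is that puppy6 and puppy7 both report the permanent target lustre1-OST0000, while each OST in a Lustre filesystem needs its own index. A quick cross-server check for such a collision is sketched below in POSIX sh with awk. It is a sketch under assumptions: find_duplicate_targets is a hypothetical helper, each argument is assumed to be a file holding the saved output of `tunefs.lustre --dryrun --print` for one device, and the last "Target:" line in each file is taken to be the one under "Permanent disk data".

```shell
#!/bin/sh
# find_duplicate_targets FILE... (hypothetical helper)
# Each FILE is assumed to hold the saved `tunefs.lustre --dryrun --print`
# output for one device. Prints every Target name claimed by more than one
# file and exits 1 if any collision is found, 0 otherwise.
find_duplicate_targets() {
  awk '
    # Remember the last Target: seen in each file ("Permanent disk data").
    $1 == "Target:" { last[FILENAME] = $2 }
    END {
      # Group files by the target name they claim.
      for (f in last) {
        t = last[f]
        owners[t] = owners[t] " " f
        count[t]++
      }
      status = 0
      for (t in count)
        if (count[t] > 1) {
          print "DUPLICATE: " t " claimed by:" owners[t]
          status = 1
        }
      exit status
    }' "$@"
}
```

One way to use it: on each server run `tunefs.lustre --dryrun --print /dev/<ost_device> > <host>.txt`, copy the files to one machine and run `find_duplicate_targets puppy5.txt puppy6.txt puppy7.txt`. With the outputs quoted above it should report lustre1-OST0000 as claimed by both puppy6.txt and puppy7.txt; the order in which the file names are listed is not fixed.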
