Hi Roger,

Where did you find this CONFIG hack? Did you make a copy of the CONFIG dir before you followed these steps?
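In case it is useful, here is a minimal sketch of the kind of backup I mean. It assumes the target is not mounted by Lustre, that the device and mount point are the /dev/sdc and /mnt from your steps, and that the directory is called CONFIGS, as the tunefs.lustre output reports. The function name backup_configs and the backup location are only examples, not anything Lustre provides.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): archive the CONFIGS directory
# of an ldiskfs-mounted target before editing anything in it by hand.
# Usage: backup_configs <mounted-fs-root> <backup-dir>
backup_configs() {
    src_root=$1
    dest_dir=$2

    # Refuse to continue if the expected directory is missing.
    if [ ! -d "$src_root/CONFIGS" ]; then
        echo "no CONFIGS directory under $src_root" >&2
        return 1
    fi

    mkdir -p "$dest_dir" || return 1

    # tar records ownership, modes and timestamps in the archive, so the
    # directory can later be restored as it was.
    tar -cf "$dest_dir/CONFIGS-backup.tar" -C "$src_root" CONFIGS
}

# Example (assumed device and paths; run as root with Lustre stopped):
#   mount -t ldiskfs /dev/sdc /mnt
#   backup_configs /mnt /root/ost-backup
#   umount /mnt
```

Keeping the archive outside the target device means the original mountdata is still available even if a later step rewrites it.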

On 15 July 2010 20:02, Roger Sersted <[email protected]> wrote:

>
> I am using the ext4 RPMs. I ran the following commands on the MDS and OSS
> nodes (lustre was not running at the time):
>
> tune2fs -O extents,uninit_bg,dir_index /dev/XXX
> fsck -pf /dev/XXX
>
> I then started Lustre "mount -t lustre /dev/XXX /lustre" on the OSSes and
> then the MDS. The problem still persisted. I then shut down Lustre by
> unmounting the Lustre filesystems on the MDS/OSS nodes.
>
> My last and most desperate step was to "hack" the CONFIG files. On puppy7,
> I did the following:
>
> 1. mount -t ldiskfs /dev/sdc /mnt
> 2. cd /mnt/CONFIG
> 3. mv lustre1-OST0000 lustre1-OST0001
> 4. vim -nb lustre1-OST0001 mountdata
> 5. I changed OST0000 to OST0001.
> 6. I verified my changes by comparing an "od -c" of before and after.
> 7. umount /mnt
> 8. tunefs.lustre -writeconf /dev/sdc
>
> The output of step 8 is:
>
> tunefs.lustre -writeconf /dev/sdc
>
> checking for existing Lustre data: found CONFIGS/mountdata
> Reading CONFIGS/mountdata
>
> Read previous values:
> Target: lustre1-OST0001
> Index: 0
> Lustre FS: lustre1
> Mount type: ldiskfs
> Flags: 0x102
> (OST writeconf )
> Persistent mount opts: errors=remount-ro,extents,mballoc
> Parameters: mgsnode=172.17....@o2ib
>
> Permanent disk data:
> Target: lustre1-OST0000
> Index: 0
> Lustre FS: lustre1
> Mount type: ldiskfs
> Flags: 0x102
> (OST writeconf )
> Persistent mount opts: errors=remount-ro,extents,mballoc
> Parameters: mgsnode=172.17....@o2ib
>
> Writing CONFIGS/mountdata
>
> Now part of the system seems to have the correct Target value.
>
> Thanks for your time on this.
>
> Roger S.
>
> Wojciech Turek wrote:
>
>> Hi Roger,
>>
>> the Lustre 1.8.3 for RHEL5 has two sets of RPMs, one set for old style ext3
>> based ldiskfs and one set for the ext4 based ldiskfs. When upgrading from
>> 1.6.6 to 1.8.3 I think you should not try to use the ext4 based packages,
>> can you let us know which RPMs you have used?
>>
>> On 15 July 2010 16:14, Roger Sersted <[email protected]> wrote:
>>
>> Wojciech Turek wrote:
>>
>> can you also please post output of 'rpm -qa | grep lustre' run
>> on puppy5-7 ?
>>
>> [r...@puppy5 log]# rpm -qa |grep -i lustre
>> kernel-2.6.18-164.11.1.el5_lustre.1.8.3
>> lustre-1.8.3-2.6.18_164.11.1.el5_lustre.1.8.3
>> lustre-ldiskfs-3.0.9-2.6.18_164.11.1.el5_lustre.1.8.3
>> mft-2.6.0-2.6.18_164.11.1.el5_lustre.1.8.3
>> lustre-modules-1.8.3-2.6.18_164.11.1.el5_lustre.1.8.3
>>
>> [r...@puppy6 log]# rpm -qa | grep -i lustre
>> kernel-2.6.18-164.11.1.el5_lustre.1.8.3
>> lustre-1.8.3-2.6.18_164.11.1.el5_lustre.1.8.3
>> lustre-ldiskfs-3.0.9-2.6.18_164.11.1.el5_lustre.1.8.3
>> mft-2.6.0-2.6.18_164.11.1.el5_lustre.1.8.3
>> lustre-modules-1.8.3-2.6.18_164.11.1.el5_lustre.1.8.3
>>
>> [r...@puppy7 CONFIGS]# rpm -qa | grep -i lustre
>> kernel-2.6.18-164.11.1.el5_lustre.1.8.3
>> lustre-1.8.3-2.6.18_164.11.1.el5_lustre.1.8.3
>> lustre-ldiskfs-3.0.9-2.6.18_164.11.1.el5_lustre.1.8.3
>> mft-2.6.0-2.6.18_164.11.1.el5_lustre.1.8.3
>> lustre-modules-1.8.3-2.6.18_164.11.1.el5_lustre.1.8.3
>>
>> Thanks,
>>
>> Roger S.
>>
>> On 15 July 2010 15:55, Roger Sersted <[email protected]> wrote:
>>
>> OK. This looks bad. It appears that I should have upgraded ext3 to
>> ext4, I found instructions for that,
>>
>> tune2fs -O extents,uninit_bg,dir_index /dev/XXX
>> fsck -pf /dev/XXX
>>
>> Is the above correct? I'd like to move our systems to ext4. I
>> didn't know those steps were necessary.
>>
>> Other answers listed below.
>>
>> Wojciech Turek wrote:
>>
>> Hi Roger,
>>
>> Sorry for the delay. From the ldiskfs messages it seems to me that
>> you are using ext4 ldiskfs
>> (Jun 26 17:54:30 puppy7 kernel: ldiskfs created from
>> ext4-2.6-rhel5).
>> If you are upgrading from 1.6.6, your ldiskfs is ext3 based, so I think
>> that in lustre-1.8.3 you should use the ext3 based ldiskfs rpm.
>>
>> Can you also tell us a bit more about your setup? From what you
>> wrote so far I understand you have 2 OSS servers and each server
>> has one OST device. In addition to that you have a third server
>> which acts as a MGS/MDS, is that right?
>>
>> The logs you provided seem to be only from one server called
>> puppy7 so it does not give a whole picture of the situation. The
>> timeout messages may indicate a problem with communication
>> between the servers but it is really difficult to say without
>> seeing the whole picture or at least more elements of it.
>>
>> To check if you have correct rpms installed can you please run
>> 'rpm -qa | grep lustre' on both OSS servers and the MDS?
>>
>> Also please provide output from command 'lctl list_nids' run on
>> both OSS servers, MDS and a client.
>>
>> puppy5 (MDS/MGS)
>> 172.17....@o2ib
>> 172.16....@tcp
>>
>> puppy6 (OSS)
>> 172.17....@o2ib
>> 172.16....@tcp
>>
>> puppy7 (OSS)
>> 172.17....@o2ib
>> 172.16....@tcp
>>
>> In addition to above please run following command on all lustre
>> targets (OSTs and MDT) to display your current lustre
>> configuration
>>
>> tunefs.lustre --dryrun --print /dev/<ost_device>
>>
>> puppy5 (MDS/MGS)
>> Read previous values:
>> Target: lustre1-MDT0000
>> Index: 0
>> Lustre FS: lustre1
>> Mount type: ldiskfs
>> Flags: 0x405
>> (MDT MGS )
>> Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
>> Parameters: lov.stripesize=125K lov.stripecount=2 mdt.group_upcall=/usr/sbin/l_getgroups mdt.group_upcall=NONE mdt.group_upcall=NONE
>>
>> Permanent disk data:
>> Target: lustre1-MDT0000
>> Index: 0
>> Lustre FS: lustre1
>> Mount type: ldiskfs
>> Flags: 0x405
>> (MDT MGS )
>> Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
>> Parameters: lov.stripesize=125K lov.stripecount=2 mdt.group_upcall=/usr/sbin/l_getgroups mdt.group_upcall=NONE mdt.group_upcall=NONE
>>
>> exiting before disk write.
>> ----------------------------------------------------
>> puppy6
>> checking for existing Lustre data: found CONFIGS/mountdata
>> Reading CONFIGS/mountdata
>>
>> Read previous values:
>> Target: lustre1-OST0000
>> Index: 0
>> Lustre FS: lustre1
>> Mount type: ldiskfs
>> Flags: 0x2
>> (OST )
>> Persistent mount opts: errors=remount-ro,extents,mballoc
>> Parameters: mgsnode=172.17....@o2ib
>>
>> Permanent disk data:
>> Target: lustre1-OST0000
>> Index: 0
>> Lustre FS: lustre1
>> Mount type: ldiskfs
>> Flags: 0x2
>> (OST )
>> Persistent mount opts: errors=remount-ro,extents,mballoc
>> Parameters: mgsnode=172.17....@o2ib
>> --------------------------------------------------
>> puppy7 (this is the broken OSS. The "Target" should be
>> "lustre1-OST0001")
>> checking for existing Lustre data: found CONFIGS/mountdata
>> Reading CONFIGS/mountdata
>>
>> Read previous values:
>> Target: lustre1-OST0000
>> Index: 0
>> Lustre FS: lustre1
>> Mount type: ldiskfs
>> Flags: 0x2
>> (OST )
>> Persistent mount opts: errors=remount-ro,extents,mballoc
>> Parameters: mgsnode=172.17....@o2ib
>>
>> Permanent disk data:
>> Target: lustre1-OST0000
>> Index: 0
>> Lustre FS: lustre1
>> Mount type: ldiskfs
>> Flags: 0x2
>> (OST )
>> Persistent mount opts: errors=remount-ro,extents,mballoc
>> Parameters: mgsnode=172.17....@o2ib
>>
>> exiting before disk write.
>>
>> If possible please attach syslog from each machine from the time
>> you mounted lustre targets (OST and MDT).
>>
>> Best regards,
>>
>> Wojciech
>>
>> On 14 July 2010 20:46, Roger Sersted <[email protected]> wrote:
>>
>> Any additional info?
>>
>> Thanks,
>>
>> Roger S.
>>
>> --
>> Wojciech Turek
>>
>> --
>> Wojciech Turek
>>
>> Assistant System Manager
>> 517
>>
>> --
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
