Can you also please post the output of 'rpm -qa | grep lustre' run on puppy5-7?
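If it is easier to gather everything in one go, the request above can be sketched as a small loop run from one admin host. This is only a sketch: it assumes ssh access from that host to puppy5, puppy6 and puppy7, and both the function name list_lustre_rpms and the REMOTE_SHELL variable are made up here for illustration (REMOTE_SHELL defaults to ssh and only exists so the loop can be dry-run).

```shell
#!/bin/sh
# Sketch: print the Lustre-related RPMs installed on each host given as an argument.
# REMOTE_SHELL is a hypothetical override for the remote command runner (default: ssh).
list_lustre_rpms() {
    for host in "$@"; do
        echo "== $host =="
        # Same command as requested in this thread, run remotely on each host.
        # A non-zero status means grep matched nothing or the host could not be reached.
        ${REMOTE_SHELL:-ssh} "$host" 'rpm -qa | grep lustre' \
            || echo "(no lustre packages found, or $host unreachable)"
    done
}
```

Usage would be something like `list_lustre_rpms puppy5 puppy6 puppy7`, with the combined output pasted into a reply.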

On 15 July 2010 15:55, Roger Sersted <[email protected]> wrote:

>
> OK. This looks bad. It appears that I should have upgraded ext3 to ext4.
> I found instructions for that:
>
>     tune2fs -O extents,uninit_bg,dir_index /dev/XXX
>     fsck -pf /dev/XXX
>
> Is the above correct? I'd like to move our systems to ext4. I didn't know
> those steps were necessary.
>
> Other answers listed below.
>
>
> Wojciech Turek wrote:
>
>> Hi Roger,
>>
>> Sorry for the delay. From the ldiskfs messages it seems to me that you are
>> using ext4 ldiskfs
>> (Jun 26 17:54:30 puppy7 kernel: ldiskfs created from ext4-2.6-rhel5).
>> If you are upgrading from 1.6.6, your ldiskfs is ext3 based, so I think that
>> in lustre-1.8.3 you should use the ext3 based ldiskfs rpm.
>>
>> Can you also tell us a bit more about your setup? From what you wrote so
>> far I understand you have 2 OSS servers and each server has one OST device.
>> In addition to that you have a third server which acts as a MGS/MDS, is that
>> right?
>>
>> The logs you provided seem to be only from one server called puppy7, so they
>> do not give a whole picture of the situation. The timeout messages may
>> indicate a problem with communication between the servers, but it is really
>> difficult to say without seeing the whole picture or at least more elements
>> of it.
>>
>> To check if you have the correct rpms installed, can you please run 'rpm -qa |
>> grep lustre' on both OSS servers and the MDS?
>>
>> Also please provide output from the command 'lctl list_nids' run on both OSS
>> servers, the MDS and a client.
>>
>
> puppy5 (MDS/MGS)
>
> 172.17....@o2ib
> 172.16....@tcp
>
> puppy6 (OSS)
> 172.17....@o2ib
> 172.16....@tcp
>
> puppy7 (OSS)
> 172.17....@o2ib
> 172.16....@tcp
>
>
>> In addition to the above, please run the following command on all lustre
>> targets (OSTs and MDT) to display your current lustre configuration:
>>
>> tunefs.lustre --dryrun --print /dev/<ost_device>
>>
>
> puppy5 (MDS/MGS)
> Read previous values:
> Target: lustre1-MDT0000
> Index: 0
> Lustre FS: lustre1
> Mount type: ldiskfs
> Flags: 0x405
> (MDT MGS )
> Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
> Parameters: lov.stripesize=125K lov.stripecount=2
> mdt.group_upcall=/usr/sbin/l_getgroups mdt.group_upcall=NONE
> mdt.group_upcall=NONE
>
>
> Permanent disk data:
> Target: lustre1-MDT0000
> Index: 0
> Lustre FS: lustre1
> Mount type: ldiskfs
> Flags: 0x405
> (MDT MGS )
> Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
> Parameters: lov.stripesize=125K lov.stripecount=2
> mdt.group_upcall=/usr/sbin/l_getgroups mdt.group_upcall=NONE
> mdt.group_upcall=NONE
>
> exiting before disk write.
> ----------------------------------------------------
> puppy6
> checking for existing Lustre data: found CONFIGS/mountdata
> Reading CONFIGS/mountdata
>
> Read previous values:
> Target: lustre1-OST0000
> Index: 0
> Lustre FS: lustre1
> Mount type: ldiskfs
> Flags: 0x2
> (OST )
> Persistent mount opts: errors=remount-ro,extents,mballoc
> Parameters: mgsnode=172.17....@o2ib
>
>
> Permanent disk data:
> Target: lustre1-OST0000
> Index: 0
> Lustre FS: lustre1
> Mount type: ldiskfs
> Flags: 0x2
> (OST )
> Persistent mount opts: errors=remount-ro,extents,mballoc
> Parameters: mgsnode=172.17....@o2ib
> --------------------------------------------------
> puppy7 (this is the broken OSS. The "Target" should be "lustre1-OST0001")
> checking for existing Lustre data: found CONFIGS/mountdata
> Reading CONFIGS/mountdata
>
> Read previous values:
> Target: lustre1-OST0000
> Index: 0
> Lustre FS: lustre1
> Mount type: ldiskfs
> Flags: 0x2
> (OST )
> Persistent mount opts: errors=remount-ro,extents,mballoc
> Parameters: mgsnode=172.17....@o2ib
>
>
> Permanent disk data:
> Target: lustre1-OST0000
> Index: 0
> Lustre FS: lustre1
> Mount type: ldiskfs
> Flags: 0x2
> (OST )
> Persistent mount opts: errors=remount-ro,extents,mballoc
> Parameters: mgsnode=172.17....@o2ib
>
> exiting before disk write.
>
>
>> If possible, please attach the syslog from each machine from the time you
>> mounted the lustre targets (OST and MDT).
>>
>> Best regards,
>>
>> Wojciech
>>
>> On 14 July 2010 20:46, Roger Sersted <[email protected]> wrote:
>>
>>
>> Any additional info?
>>
>> Thanks,
>>
>> Roger S.
>>
>>
>> --
>> --
>> Wojciech Turek
>>

--
Wojciech Turek

Assistant System Manager
High Performance Computing Service
University of Cambridge
Email: [email protected]
Tel: (+)44 1223 763517
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
