I'm hoping someone can help me out here. We are running Lustre 1.8.4 under SL5.7 (the system has been upgraded gradually from SL5.3 or SL5.4, and it started out on Lustre 1.8.3). A newly installed OSS running SL5.7 does not seem to show this issue when making a new OST (without reusing an index, as we are in this case).

However, we were having underlying file system issues on one OST of this older server, so we drained that OST of all files using lfs_migrate, saved all the information such as LAST_ID, and recreated the virtual disk on the underlying Dell MD1000 shelf (PERC-6 controller, RAID-5 on 9 750GB disks, 128kB stripe). Then, following a full init of the vdisk, we tried to make the Lustre file system:

[root@umfs06 reformat]# mkfs.lustre --ost --mgsnode=10.10.1.140@tcp0 --fsname=umt3 --reformat --index=25 --mkfsoptions="-i 2000000" --reformat --mountfsoptions="errors=remount-ro,extents,mballoc,stripe=256" /dev/sdk

   Permanent disk data:
Target:     umt3-OST0019
Index:      25
Lustre FS:  umt3
Mount type: ldiskfs
Flags:      0x62
              (OST first_time update )
Persistent mount opts: errors=remount-ro,extents,mballoc,stripe=256
Parameters: mgsnode=10.10.1.140@tcp

device size = 5719040MB
2 6 18
formatting backing filesystem ldiskfs on /dev/sdk
        target name  umt3-OST0019
        4k blocks     1464074240
        options        -i 2000000 -J size=400 -I 256 -q -O dir_index,extents,uninit_groups -F
mkfs_cmd = mke2fs -j -b 4096 -L umt3-OST0019 -i 2000000 -J size=400 -I 256 -q -O dir_index,extents,uninit_groups -F /dev/sdk 1464074240
mkfs.lustre: Unable to mount /dev/sdk: Invalid argument

mkfs.lustre FATAL: failed to write local files
mkfs.lustre: exiting with 22 (Invalid argument)

==================
/var/log/messages contains

2011-11-09T15:46:48-05:00 umfs06.aglt2.org kernel: [23601.867384] LDISKFS-fs (sdk): ldiskfs_check_descriptors: Inode bitmap for group 984 not in group (block 28049409)!
2011-11-09T15:46:48-05:00 umfs06.aglt2.org kernel: [23601.867392] LDISKFS-fs (sdk): group descriptors corrupted!
===================

This has happened multiple times now. On different tries the group and block numbers reported have changed, but the error itself has not.

Following a system reboot this morning, I was able to get the mkfs to complete and restored the LAST_ID, etc., but the mount then failed and corrupted the underlying volume, so that e2fsck had to be run. Wash, rinse, repeat. So, as a last try, I did it all over from scratch, with the result above.

I'm at a loss to know what to do. Before the volume was wiped and recreated, I was able to mount it as "-t ldiskfs" without a problem, then remount it afterwards as "-t lustre". The rpm set is listed below. The other 11 volumes on this OSS are served just fine.
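
For reference, here is a quick sanity check of the numbers quoted above, as a minimal sketch rather than anything authoritative. It assumes the usual mke2fs layout for 4k blocks (32768 blocks per group, first data block 0), neither of which appears in the output above, and all function names are made up for illustration.

```python
BLOCK_SIZE = 4096                   # from "mke2fs -b 4096" in the output above
BLOCKS_PER_GROUP = 8 * BLOCK_SIZE   # ASSUMPTION: mke2fs default, one bitmap block per group


def full_stripe_blocks(disks, chunk_kb, parity_disks=1):
    """Full RAID stripe width expressed in filesystem blocks."""
    data_disks = disks - parity_disks
    return data_disks * chunk_kb * 1024 // BLOCK_SIZE


def mb_to_blocks(size_mb):
    """Convert a device size in MB (2**20 bytes) to filesystem blocks."""
    return size_mb * 1024 * 1024 // BLOCK_SIZE


def group_range(group, first_data_block=0):
    """First and last block number that belong to a block group."""
    first = first_data_block + group * BLOCKS_PER_GROUP
    return first, first + BLOCKS_PER_GROUP - 1


def owning_group(block, first_data_block=0):
    """Group a block actually lies in, and its offset within that group."""
    return divmod(block - first_data_block, BLOCKS_PER_GROUP)


if __name__ == "__main__":
    print(full_stripe_blocks(9, 128))   # 256, matches stripe=256
    print(mb_to_blocks(5719040))        # 1464074240, matches "4k blocks"
    print(group_range(984))             # (32243712, 32276479)
    print(owning_group(28049409))       # (856, 1)
```

Under those assumptions the stripe=256 mount option and the block count are consistent with the RAID geometry and device size, while the block reported in the kernel message (28049409) lies outside the range of group 984 and would instead sit at offset 1 of group 856.
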

Does anyone have any advice about what to try here?

Thanks,
bob

[root@umfs06 reformat]# rpm -qa|grep lustre
(none):lustre-modules-1.8.3-2.6.18_164.11.1.el5_lustre.1.8.3.x86_64
(none):kernel-headers-2.6.18-194.3.1.el5_lustre.1.8.4.x86_64
(none):kernel-devel-2.6.18-164.11.1.el5_lustre.1.8.3.x86_64
(none):lustre-1.8.4-2.6.18_194.3.1.el5_lustre.1.8.4.x86_64
0:kernel-module-openafs-2.6.18-194.3.1.el5_lustre.1.8.4-1.4.14-80.sl5.x86_64
(none):kernel-2.6.18-164.11.1.el5_lustre.1.8.3.x86_64
(none):lustre-tests-1.8.4-2.6.18_194.3.1.el5_lustre.1.8.4.x86_64
(none):kernel-devel-2.6.18-194.3.1.el5_lustre.1.8.4.x86_64
(none):lustre-ldiskfs-3.0.9-2.6.18_164.11.1.el5_lustre.1.8.3.x86_64
(none):lustre-modules-1.8.4-2.6.18_194.3.1.el5_lustre.1.8.4.x86_64
(none):kernel-2.6.18-194.3.1.el5_lustre.1.8.4.x86_64
(none):lustre-ldiskfs-3.1.3-2.6.18_194.3.1.el5_lustre.1.8.4.x86_64

[root@umfs06 reformat]# rpm -qa|grep e2fs
(none):e2fsprogs-devel-1.39-33.el5.x86_64
(none):e2fsprogs-1.41.10.sun2-0redhat.x86_64
(none):e2fsprogs-libs-1.39-33.el5.i386

[root@umfs06 reformat]# uname -r
2.6.18-194.3.1.el5_lustre.1.8.4

[root@umfs06 reformat]# cat /etc/redhat-release
Scientific Linux SL release 5.7 (Boron)

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss