I'm hoping someone can help me out here.  We are running Lustre 1.8.4 
under SL5.7.  The system has been upgraded gradually over time; it 
started out on SL5.3 or SL5.4 with Lustre 1.8.3.  A newly installed OSS 
running SL5.7 does not seem to show this issue when making a new OST 
(not reusing an index, as in this case).  However, we were having 
underlying file system issues on one OST of this older server, so we 
drained that OST of all files using lfs_migrate, saved all the 
information such as LAST_ID, recreated the virtual disk on the 
underlying Dell MD1000 shelf (PERC-6 controller, RAID-5 on 9 750GB 
disks, 128kB stripe), and then, following a full init of the vdisk, 
tried to make the Lustre file system:

[root@umfs06 reformat]# mkfs.lustre --ost --mgsnode= 
--fsname=umt3 --reformat --index=25 --mkfsoptions="-i 2000000" 
--mountfsoptions="errors=remount-ro,extents,mballoc,stripe=256" /dev/sdk

    Permanent disk data:
Target:     umt3-OST0019
Index:      25
Lustre FS:  umt3
Mount type: ldiskfs
Flags:      0x62
               (OST first_time update )
Persistent mount opts: errors=remount-ro,extents,mballoc,stripe=256
Parameters: mgsnode=

device size = 5719040MB
2 6 18
formatting backing filesystem ldiskfs on /dev/sdk
         target name  umt3-OST0019
         4k blocks     1464074240
         options       -i 2000000 -J size=400 -I 256 -q -O 
dir_index,extents,uninit_groups -F
mkfs_cmd = mke2fs -j -b 4096 -L umt3-OST0019 -i 2000000 -J size=400 -I 
256 -q -O dir_index,extents,uninit_groups -F /dev/sdk 1464074240
mkfs.lustre: Unable to mount /dev/sdk: Invalid argument

mkfs.lustre FATAL: failed to write local files
mkfs.lustre: exiting with 22 (Invalid argument)
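
As a sanity check on the geometry, the numbers in the output above can be 
recomputed by hand.  This is only a sketch: it assumes that "128kB 
stripe" means the per-disk chunk size on the PERC-6, and that RAID-5 on 
9 disks leaves 8 data disks per stripe.

```shell
#!/bin/sh
# Values copied from the mkfs.lustre output and the RAID description.
device_mb=5719040   # "device size = 5719040MB"
block_kb=4          # mke2fs -b 4096
data_disks=8        # RAID-5 on 9 disks: 8 data + 1 parity
chunk_kb=128        # assumption: "128kB stripe" is the per-disk chunk

# Number of 4 kB blocks on the device.
blocks=$(( device_mb * 1024 / block_kb ))

# Full RAID stripe width expressed in 4 kB blocks.
stripe_blocks=$(( data_disks * chunk_kb / block_kb ))

echo "4k blocks:     $blocks"
echo "stripe blocks: $stripe_blocks"
```

Both results agree with what was used: 1464074240 blocks as passed to 
mke2fs, and 256 blocks per stripe as in stripe=256.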


/var/log/messages contains:

2011-11-09T15:46:48-05:00 umfs06.aglt2.org kernel: [23601.867384] 
LDISKFS-fs (sdk): ldiskfs_check_descriptors: Inode bitmap for group 984 
not in group (block 28049409)!
2011-11-09T15:46:48-05:00 umfs06.aglt2.org kernel: [23601.867392] 
LDISKFS-fs (sdk): group descriptors corrupted!
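
For what it is worth, the block number in that message can be checked 
against the group it is supposed to belong to.  The sketch below assumes 
the mke2fs defaults for a 4 kB block size (32768 blocks per group, first 
data block 0); neither value appears in the output above, so treat both 
as assumptions.

```shell
#!/bin/sh
# Values from the kernel log line.
group=984
bitmap_block=28049409

# Assumptions (not shown in the output): mke2fs defaults for 4 kB blocks.
blocks_per_group=32768
first_data_block=0

# Block range that group 984 covers.
first=$(( first_data_block + group * blocks_per_group ))
last=$(( first + blocks_per_group - 1 ))

# Group that the reported inode bitmap block actually falls in.
actual_group=$(( (bitmap_block - first_data_block) / blocks_per_group ))

echo "group $group spans blocks $first..$last"
echo "block $bitmap_block lies in group $actual_group"
```

Under those assumptions group 984 spans blocks 32243712..32276479, while 
block 28049409 falls in group 856, which is 128 groups earlier.  That is 
consistent with the kernel's complaint that the bitmap is not in its 
group.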


This has happened multiple times now.  On different tries the group and 
block numbers reported have changed, but the error itself has not.  
Following a system reboot this morning, I was able to get mkfs.lustre 
to complete and restored the LAST_ID, etc., but the mount then failed 
and corrupted the underlying volume so that e2fsck had to be run.  Wash, 
rinse, repeat.  So, as a last try, I did it all over from scratch, with 
the result above.

I'm at a loss as to what to do.  Before the volume was wiped and 
recreated I was able to mount it as "-t ldiskfs" without a problem, then 
remount it afterwards as "-t lustre".  The rpm set is listed below.  The 
other 11 volumes on this OSS are served just fine.

Does anyone have any advice about what to try here?


[root@umfs06 reformat]# rpm -qa|grep lustre

[root@umfs06 reformat]# rpm -qa|grep e2fs

[root@umfs06 reformat]# uname -r

[root@umfs06 reformat]# cat /etc/redhat-release
Scientific Linux SL release 5.7 (Boron)

Lustre-discuss mailing list
