Andreas Dilger wrote:
> On Mar 03, 2009 17:15 -0600, Nirmal Seenu wrote:
>> mkfs.lustre --fsname=lqcdproj --ost --mgsnode=iblust...@tcp1
>> --mkfsoptions="-m 0" --index=0000 --reformat /dev/md2
>>
>> I received these error messages when I tried to mount it for the
>> first time:
>>
>> Mar 3 16:19:53 lustre1 kernel: Lustre: OST lqcdproj-OST0000 now serving
>> dev (lqcdproj-OST0000/a968f0cc-a66b-bbf7-458f-9b8759c60ef5) with
>> recovery enabled
>
> So, the new OST has started up after being reformatted.
>
>> Mar 3 16:19:56 lustre1 kernel: Lustre: MDS lqcdproj-MDT0000:
>> lqcdproj-OST0000_UUID now active, resetting orphans
>
> Here, the MDS (which doesn't know that the OST was reformatted)
> is trying to recreate the objects that are missing from the OST
> (this might be several million, because it doesn't know you
> reformatted the filesystem).
>
>> Mar 3 16:19:58 lustre1 kernel: LustreError:
>> 6359:0:(filter.c:3138:filter_precreate()) create failed rc = -28
>
> Here, the OST has run out of inodes, because it was trying to
> create some millions of objects.
>
> This is probably a situation that Lustre could handle more
> gracefully, by refusing to recreate the missing objects if the count
> is too high and accepting the MDS's word that those objects were
> previously used. It isn't ideal, but the number of times an OST is
> reformatted like this is very small.
>
> Can you please file a bug at bugzilla.lustre.org with the detailed
> procedure you followed?
>
> In the meantime I suggest you just format your new OST and add it
> without specifying an OST index, and permanently mark OST0000
> inactive (steps to do so were recently discussed on the list).
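
Spelled out, that suggestion would look roughly like the following (the
fsname, device, and MGS NID are copied from the thread above, with the
NID still elided as in the original; the conf_param line assumes it is
run on the MGS node):

   # On the OSS: format the new OST without --index, so the MGS
   # assigns the next free index instead of reusing OST0000
   mkfs.lustre --fsname=lqcdproj --ost --mgsnode=iblust...@tcp1 \
       --mkfsoptions="-m 0" --reformat /dev/md2

   # On the MGS: permanently mark the old OST inactive so new files
   # are no longer striped onto it
   lctl conf_param lqcdproj-OST0000.osc.active=0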
This was the first thing that I tried. To permanently deactivate
OST0000 I ran:

   lctl --device 5 conf_param lqcdproj-OST0000.osc.active=0

The problem I ran into at that point was that running "lfs check
servers" basically hosed the worker nodes, and I had to reboot all of
them. The worker nodes are still running the 1.6.6 patchless client on
RHEL4 with a 2.6.21 kernel.org kernel. Has this issue been fixed in
1.6.7? It will be at least a couple of weeks before I can upgrade all
the clients to 1.6.7.

In the meantime I am going to try reformatting my OST to have more
inodes and see if that fixes the problem.
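
The inode count of an ldiskfs OST is fixed at format time by the ext3
bytes-per-inode ratio, which can be lowered through --mkfsoptions; a
sketch of such a reformat, where the "-i 16384" value is only an
illustrative ratio and not one recommended anywhere in this thread:

   # -m 0 keeps the zero reserved-blocks setting from the original
   # format; -i sets mke2fs's bytes-per-inode ratio, and a smaller
   # value yields proportionally more inodes (16384 is hypothetical)
   mkfs.lustre --fsname=lqcdproj --ost --mgsnode=iblust...@tcp1 \
       --mkfsoptions="-m 0 -i 16384" --index=0000 --reformat /dev/md2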
Thanks
Nirmal

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss