On 2010-11-10, at 14:40, Bob Ball wrote:
> Yes, this brought us back up (sorry, it took us a while). Clients see the
> system, and I can read and write files. But...
>
> What have we lost by doing this? Can we now let it go and recover as usual?
> What is the next step here?
The abort_recovery option evicted all of the clients, so any of their
in-progress operations would have failed. They have all since reconnected,
and no further action is needed.

> On 11/10/2010 3:00 PM, Andreas Dilger wrote:
>> On 2010-11-10, at 11:01, Bob Ball wrote:
>>> Well, we ran 2 days, migrating files off the OST, then this morning the
>>> MDT crashed. We could not get all clients reconnected before seeing
>>> another kernel panic on the MDT. We ran e2fsck on the MDT filesystem and
>>> tried again. It crashed again, but this time the logged message is:
>>>
>>> 2010-11-10T12:40:26-05:00 lmd01.local kernel: [12307.325340] Lustre:
>>> 6243:0:(mds_lov.c:330:mds_lov_update_objids()) Unexpected gap in objids
>>> 2010-11-10T12:40:27-05:00 lmd01.local kernel: [12308.347087] Lustre:
>>> 6243:0:(mds_lov.c:330:mds_lov_update_objids()) Unexpected gap in objids
>>>
>>> I've seen this message elsewhere, but can't seem to find anything on it
>>> now, or what to do about it.
>>
>> This might be a recovery-only problem. Try mounting the MDS with the
>> mount option "-o abort_recovery".
>>
>>> On 11/8/2010 4:27 PM, Bob Ball wrote:
>>>> Yes, you are correct. That was the key here; we did not put that file
>>>> back in place. Back up and (so far) operating cleanly.
>>>>
>>>> Thanks,
>>>> bob
>>>>
>>>> On 11/8/2010 3:04 PM, Andreas Dilger wrote:
>>>>> On 2010-11-08, at 11:39, Bob Ball wrote:
>>>>>> Don't know if I sent this to the whole list. One of those days.
>>>>>>
>>>>>> We remade the RAID device and remade the Lustre filesystem on it, but
>>>>>> the disks won't mount. The error is below. How do I overcome this?
>>>>>>
>>>>>> mounting device /dev/sdc at /mnt/ost12, flags=0 options=device=/dev/sdc
>>>>>> mount.lustre: mount /dev/sdc at /mnt/ost12 failed: Address already in use
>>>>>> retries left: 0
>>>>>> mount.lustre: mount /dev/sdc at /mnt/ost12 failed: Address already in use
>>>>>> The target service's index is already in use.
>>>>>> (/dev/sdc)
>>>>>
>>>>> Looks like you didn't copy the old "CONFIGS/mountdata" file over the
>>>>> new one. You can also use "--writeconf" (described in the manual and
>>>>> several times on the list) to have the MGS re-generate the
>>>>> configuration, which should fix this as well.
>>>>>
>>>>>> On 11/8/2010 5:01 AM, Andreas Dilger wrote:
>>>>>>> On 2010-11-07, at 12:32, Bob Ball <[email protected]> wrote:
>>>>>>>> Tomorrow, we will redo all 8 OSTs on the first file server. I am
>>>>>>>> very nervous about this, as a lot is riding on us doing this
>>>>>>>> correctly. For example, on a client now, if I umount one of the
>>>>>>>> OSTs without first taking some (unknown to me) action on the MDT,
>>>>>>>> then the client will hang on the "df" command.
>>>>>>>>
>>>>>>>> So, while we are doing the reformat, is there any way to avoid this
>>>>>>>> "hang" situation?
>>>>>>>
>>>>>>> If you issue "lctl --device %{OSC UUID} deactivate" on the MDS and
>>>>>>> clients, then any operations on those OSTs will immediately fail
>>>>>>> with an IO error. If you are migrating objects off those OSTs, I
>>>>>>> would have imagined you already did this on the MDS, or new objects
>>>>>>> would have continued to be allocated there.
>>>>>>>
>>>>>>>> Is the --index=XX argument to mkfs.lustre hex, or decimal? It seems
>>>>>>>> from your comment below that this must be hex?
>>>>>>>
>>>>>>> Decimal, though it may also accept hex (I can't check right now).
>>>>>>>
>>>>>>>> Finally, does supplying the --index even matter if we restore the
>>>>>>>> files below that you mention? That seems to be what you are saying.
>>>>>>>
>>>>>>> Well, you still need to set the filesystem label. This could be done
>>>>>>> with tune2fs, but you may as well specify the right index from the
>>>>>>> beginning.
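[Editor's note] The deactivate step Andreas describes might look like the
following sketch. The OSC device name umt3-OST0018-osc is taken from the
"lctl dl" output quoted later in the thread; it would differ for each OST
being emptied, so treat this as illustrative rather than site-specific:

```shell
# On the MDS (and optionally on each client), find the OSC device that
# corresponds to the OST being emptied:
lctl dl | grep osc
#   e.g.  20 IN osc umt3-OST0018-osc umt3-mdtlov_UUID 5

# Deactivate it: the MDS stops allocating new objects on that OST, and
# operations against it fail immediately with an I/O error rather than
# hanging (e.g. in "df"):
lctl --device umt3-OST0018-osc deactivate

# After the OST has been reformatted and remounted, re-enable it:
lctl --device umt3-OST0018-osc activate
```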
>>>>>>>
>>>>>>>> On 11/6/2010 11:09 AM, Andreas Dilger wrote:
>>>>>>>>> On 2010-11-06, at 8:24, Bob Ball <[email protected]> wrote:
>>>>>>>>>> I am emptying a set of OSTs so that I can reformat the underlying
>>>>>>>>>> RAID-6 more efficiently. Two questions:
>>>>>>>>>>
>>>>>>>>>> 1. Is there a quick way to tell if the OST is really empty?
>>>>>>>>>> lfs_find takes many hours to run.
>>>>>>>>>
>>>>>>>>> If you mount the OST as type ldiskfs and look in the O/0/d*
>>>>>>>>> directories (capital-O, zero), there should be a few hundred
>>>>>>>>> zero-length objects owned by root.
>>>>>>>>>
>>>>>>>>>> 2. When I reformat, I want it to retain the same ID so as to not
>>>>>>>>>> make "holes" in the list. From the following information, am I
>>>>>>>>>> correct to assume that the ID is 24? If not, how do I determine
>>>>>>>>>> the correct ID to use when we re-create the file system?
>>>>>>>>>
>>>>>>>>> If you still have the existing OST, the easiest way to do this is
>>>>>>>>> to save the files last_rcvd, O/0/LAST_ID, and CONFIGS/*, and copy
>>>>>>>>> them into the reformatted OST.
>>>>>>>>>
>>>>>>>>>> /dev/sdj            3.5T  3.1T  222G   94%  /mnt/ost51
>>>>>>>>>> 10 UP obdfilter umt3-OST0018 umt3-OST0018_UUID 547
>>>>>>>>>> umt3-OST0018_UUID   3.4T  3.0T  221.1G 88%  /lustre/umt3[OST:24]
>>>>>>>>>> 20 IN osc umt3-OST0018-osc umt3-mdtlov_UUID 5
>>>>>>>>>
>>>>>>>>> The OST index is indeed 24 (18 hex). As for /dev/sdj, it is hard
>>>>>>>>> to know from the above info. If you run "e2label /dev/sdj", the
>>>>>>>>> filesystem label should match the OST name umt3-OST0018.
>>>>>>>>>
>>>>>>>>> Cheers, Andreas
>>>>>
>>>>> Cheers, Andreas
>>>>> --
>>>>> Andreas Dilger
>>>>> Lustre Technical Lead
>>>>> Oracle Corporation Canada Inc.
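[Editor's note] The emptiness check and the identity-preserving copies
described above might look roughly like this sketch; the device name
/dev/sdj matches the thread, but the mount point and backup directory are
illustrative:

```shell
# Mount the OST backing device read-only as plain ldiskfs:
mkdir -p /mnt/ost_ldiskfs
mount -t ldiskfs -o ro /dev/sdj /mnt/ost_ldiskfs

# On an empty OST, the O/0/d* directories (capital-O, zero) should hold
# only a few hundred zero-length, root-owned objects; anything this
# prints is a leftover non-empty data object:
find /mnt/ost_ldiskfs/O/0/d* -type f ! -size 0 | head

# Save the files that carry the OST's identity before reformatting, so
# they can be copied back into the new filesystem:
mkdir -p /root/ost0018-save
cp -a /mnt/ost_ldiskfs/last_rcvd /mnt/ost_ldiskfs/O/0/LAST_ID /root/ost0018-save/
cp -a /mnt/ost_ldiskfs/CONFIGS /root/ost0018-save/

umount /mnt/ost_ldiskfs

# The filesystem label should match the OST name; index 24 decimal is
# 18 hex, hence umt3-OST0018:
e2label /dev/sdj
```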
>>>>>
>>>> _______________________________________________
>>>> Lustre-discuss mailing list
>>>> [email protected]
>>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
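[Editor's note] The abort-recovery mount discussed at the top of the
thread is a standard mount.lustre option (spelled abort_recov in the man
page); a sketch with an illustrative MDT device and mount point:

```shell
# Mount the MDT while skipping client recovery. Every connected client
# is evicted and its in-flight operations fail, but the filesystem
# becomes available immediately instead of waiting out the recovery
# window; evicted clients then simply reconnect:
mount -t lustre -o abort_recov /dev/mdtdev /mnt/mdt
```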
