If this helps, the console shows the following at the kernel panic, leaving out most of the addresses and offsets for this retyping:
bob

:ptlrpc:ldlm_handle_enqueue
:mds:mds_handle
:lnet:lnet_match_blocked_msg
:ptlrpc:lustre_msg_get_conn_cnt
:ptlrpc:ptlrpc_server_handle_request
__activate_task
try_to_wake_up
lock_timer_base
__mod_timer
:ptlrpc:ptlrpc_main
default_wake_function
audit_syscall_exit
child_rip
:ptlrpc:ptlrpc_main
child_rip
Code: 41 8b 14 d3 89 54 24 54 31 d2 29 c5 89 6c 24 58 0f 84 bf 00
RIP [<ffffffff88c644ef>] :ldiskfs:do_split
RSP <ffff810422ae53b0>
CR2: ffff810acc143e38
<0> Kernel panic - not syncing: Fatal exception

On 11/10/2010 1:01 PM, Bob Ball wrote:
> Well, we ran 2 days, migrating files off the OST, then this morning the MDT
> crashed. Could not get all clients reconnected before seeing another
> kernel panic on the MDT. Did an e2fsck of the MDT db and tried again.
> Crashed again, but this time the logged message is:
>
> 2010-11-10T12:40:26-05:00 lmd01.local kernel: [12307.325340] Lustre:
> 6243:0:(mds_lov.c:330:mds_lov_update_objids()) Unexpected gap in objids
> 2010-11-10T12:40:27-05:00 lmd01.local kernel: [12308.347087] Lustre:
> 6243:0:(mds_lov.c:330:mds_lov_update_objids()) Unexpected gap in objids
>
> I've seen this message elsewhere, but can't seem to find anything on it
> now, or what to do about it.
>
> help?
>
> bob
>
> On 11/8/2010 4:27 PM, Bob Ball wrote:
>> Yes, you are correct. That was the key here; did not put that file back
>> in place. Back up and (so far) operating cleanly.
>>
>> Thanks,
>> bob
>>
>> On 11/8/2010 3:04 PM, Andreas Dilger wrote:
>>> On 2010-11-08, at 11:39, Bob Ball wrote:
>>>> Don't know if I sent to the whole list. One of those days.
>>>>
>>>> Remade the raid device, remade the lustre fs on it, but the disks won't
>>>> mount. Error is below. How do I overcome this?
>>>>
>>>> mounting device /dev/sdc at /mnt/ost12, flags=0 options=device=/dev/sdc
>>>> mount.lustre: mount /dev/sdc at /mnt/ost12 failed: Address already in use
>>>> retries left: 0
>>>> mount.lustre: mount /dev/sdc at /mnt/ost12 failed: Address already in use
>>>> The target service's index is already in use. (/dev/sdc)
>>> Looks like you didn't copy the old "CONFIGS/mountdata" file over the new
>>> one. You can also use "--writeconf" (described in the manual and several
>>> times on the list) to have the MGS re-generate the configuration, which
>>> should fix this as well.
>>>
>>>> On 11/8/2010 5:01 AM, Andreas Dilger wrote:
>>>>> On 2010-11-07, at 12:32, Bob Ball <[email protected]> wrote:
>>>>>> Tomorrow, we will redo all 8 OSTs on the first file server we are
>>>>>> redoing. I am very nervous about this, as a lot is riding on us doing
>>>>>> this correctly. For example, on a client now, if I umount one of the
>>>>>> OSTs without first taking some (unknown to me) action on the MDT, then
>>>>>> the client will hang on the "df" command.
>>>>>>
>>>>>> So, while we are doing the reformat, is there any way to avoid this
>>>>>> "hang" situation?
>>>>> If you issue "lctl --device %{OSC UUID} deactivate" on the MDS and
>>>>> clients, then any operations on those OSTs will immediately fail with an
>>>>> IO error. If you are migrating objects off those OSTs, I would have
>>>>> imagined you already did this on the MDS, or new objects would have
>>>>> continued to be allocated there.
>>>>>
>>>>>> Is the --index=XX argument to mkfs.lustre hex, or decimal? Seems from
>>>>>> your comment below that this must be hex?
>>>>> Decimal, though it may also accept hex (I can't check right now).
>>>>>
>>>>>> Finally, does supplying the --index even matter if we restore the files
>>>>>> below that you mention? That seems to be what you are saying.
>>>>> Well, you still need to set the filesystem label.
>>>>> This could be done with tune2fs, but you may as well specify the
>>>>> right index from the beginning.
>>>>>
>>>>>> On 11/6/2010 11:09 AM, Andreas Dilger wrote:
>>>>>>> On 2010-11-06, at 8:24, Bob Ball <[email protected]> wrote:
>>>>>>>> I am emptying a set of OST so that I can reformat the underlying RAID-6
>>>>>>>> more efficiently. Two questions:
>>>>>>>> 1. Is there a quick way to tell if the OST is really empty? lfs_find
>>>>>>>> takes many hours to run.
>>>>>>> If you mount the OST as type ldiskfs and look in the O/0/d* directories
>>>>>>> (capital-O, zero) there should be a few hundred zero-length objects
>>>>>>> owned by root.
>>>>>>>
>>>>>>>> 2. When I reformat, I want it to retain the same ID so as to not make
>>>>>>>> "holes" in the list. From the following information, am I correct to
>>>>>>>> assume that the id is 24? If not, how do I determine the correct ID to
>>>>>>>> use when we re-create the file system?
>>>>>>> If you still have the existing OST, the easiest way to do this is to
>>>>>>> save the files last_rcvd, O/0/LAST_ID, and CONFIGS/*, and copy them
>>>>>>> into the reformatted OST.
>>>>>>>
>>>>>>>> /dev/sdj            3.5T  3.1T  222G    94%  /mnt/ost51
>>>>>>>> 10 UP obdfilter umt3-OST0018 umt3-OST0018_UUID 547
>>>>>>>> umt3-OST0018_UUID   3.4T  3.0T  221.1G  88%
>>>>>>>>         /lustre/umt3[OST:24]
>>>>>>>> 20 IN osc umt3-OST0018-osc umt3-mdtlov_UUID 5
>>>>>>> The OST index is indeed 24 (18 hex). As for /dev/sdj, it is hard to
>>>>>>> know from the above info. If you run "e2label /dev/sdj" the filesystem
>>>>>>> label should match the OST name umt3-OST0018.
>>>>>>>
>>>>>>> Cheers, Andreas
>>>>>>>
>>> Cheers, Andreas
>>> --
>>> Andreas Dilger
>>> Lustre Technical Lead
>>> Oracle Corporation Canada Inc.
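Since the suffix in an OST name like umt3-OST0018 is hexadecimal while the index Lustre reports (`[OST:24]`) is decimal, a quick shell check with plain `printf` (nothing Lustre-specific) confirms the correspondence Andreas describes:

```shell
# The name suffix "0018" in umt3-OST0018 is hex; the index Lustre
# reports ([OST:24]) is decimal. printf converts between the two.
printf 'decimal index: %d\n' 0x18   # prints "decimal index: 24"
printf 'name suffix: %04x\n' 24     # prints "name suffix: 0018"
```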
>>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> [email protected]
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
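For reference, the identity-preserving reformat Andreas describes upthread (save last_rcvd, O/0/LAST_ID, and CONFIGS/* before the mkfs, then copy them back onto the fresh ldiskfs filesystem) can be sketched roughly as below. This is a simulation on throwaway mktemp directories, not a real OST: the paths, the staging layout, and the commented mkfs.lustre line are illustrative assumptions, and on a real system the OST would be mounted as type ldiskfs instead.

```shell
set -e
OST=$(mktemp -d)      # stand-in for the ldiskfs mountpoint of the old OST
SAVE=$(mktemp -d)     # staging area that survives the reformat

# Simulate the identity files the thread says to keep:
mkdir -p "$OST/O/0" "$OST/CONFIGS"
touch "$OST/last_rcvd" "$OST/O/0/LAST_ID" "$OST/CONFIGS/mountdata"

# 1. Save them before reformatting.
mkdir -p "$SAVE/O/0"
cp -a "$OST/last_rcvd" "$SAVE/"
cp -a "$OST/O/0/LAST_ID" "$SAVE/O/0/"
cp -a "$OST/CONFIGS" "$SAVE/"

# 2. Reformat -- on a real system something like:
#      mkfs.lustre --ost --fsname=umt3 --index=24 ... /dev/sdj
#    Here we just wipe and recreate the mock mountpoint.
rm -rf "$OST"; mkdir -p "$OST/O/0"

# 3. Copy the saved files back onto the freshly made filesystem.
cp -a "$SAVE/last_rcvd" "$OST/"
cp -a "$SAVE/O/0/LAST_ID" "$OST/O/0/"
cp -a "$SAVE/CONFIGS" "$OST/"
```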
