Well, our filesystem is back. I hexedit'ed the CONFIGS/p1-client and replaced prod_mds_001_UUID with p1-MDT0000_UUID and now our file system mounts.
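The hex edit described above can also be scripted. Below is a minimal Python sketch under stated assumptions. `find_all`, `pad_replace` and `patch_file` are hypothetical names made up for illustration, and Python 3.9+ is assumed for the type hints. The sketch lists every occurrence of the old UUID in a copy of the config log. Only when asked, it overwrites each occurrence with the new UUID followed by NUL bytes, so the file size and all later offsets stay unchanged.

The two hexdumps further down the thread suggest that each string is preceded by a length that counts its terminating NUL: 0x12 = 18 for `prod_mds_001_UUID` and 0x10 = 16 for `p1-MDT0000_UUID`. The sketch deliberately leaves those bytes alone. The thread does not say whether they were changed in the edit that worked, so treat this as an untested starting point and back up the MDT first.

```python
"""Hypothetical sketch: NUL-padded, in-place replacement of a UUID string in a
copy of a Lustre CONFIGS llog file (e.g. CONFIGS/p1-client).

Assumptions (not confirmed by the thread): you work on a backed-up copy taken
from an unmounted MDT; keeping the file size and every later offset unchanged
is sufficient; the length fields that appear to precede each string are NOT
adjusted by this code.
"""
import shutil


def find_all(data: bytes, needle: bytes) -> list[int]:
    """Return the offsets of every non-overlapping occurrence of `needle`."""
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + len(needle))
    return offsets


def pad_replace(data: bytes, old: bytes, new: bytes) -> tuple[bytes, list[int]]:
    """Overwrite each `old` with `new`, right-padded with NUL bytes to len(old).

    The returned bytes are always exactly as long as `data`.
    """
    if not old:
        raise ValueError("old UUID must not be empty")
    if len(new) > len(old):
        raise ValueError("new UUID is longer than the old one; cannot patch in place")
    padded = new + b"\x00" * (len(old) - len(new))
    offsets = find_all(data, old)
    buf = bytearray(data)
    for off in offsets:
        buf[off:off + len(old)] = padded
    return bytes(buf), offsets


def patch_file(path: str, old: str, new: str, write: bool = False) -> list[int]:
    """Print where `old` occurs; if write=True, patch the file and keep a .bak."""
    with open(path, "rb") as f:
        data = f.read()
    patched, offsets = pad_replace(data, old.encode(), new.encode())
    for off in offsets:
        print(f"{old} at offset 0x{off:08x}")
    if write and offsets:
        shutil.copy2(path, path + ".bak")  # keep the untouched original alongside
        with open(path, "wb") as f:
            f.write(patched)
    return offsets
```

A dry run such as `patch_file("p1-client", "prod_mds_001_UUID", "p1-MDT0000_UUID")` only prints offsets. Passing `write=True` patches the file and leaves the original as `p1-client.bak`. Run it against a copy outside the MDT's CONFIGS directory so the `.bak` file does not end up next to the real logs. Afterwards, compare the result with the `hexdump -C ... | grep -C 2 mds` output shown in the thread.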
Ran a heap of checks and it all looks good. Thanks everyone for your help.

--
Dr Stuart Midgley

On 18/03/2012, at 3:36 PM, Stu Midgley wrote:

> I'm well down this path... I replaced the mountdata with that from my
> small temporary mdt (same name) and that didn't help.
>
> Now, I will do a few tests on the p1-client. Perhaps after a writeconf
> that is basically clean... and I can replace it... but currently
> it contains lots of info about each of the OSTs.
>
> All the OSTs are happy mounting to the mdt and all think that they
> are part of our p1 file system.
>
> Thanks.
>
>
> On Sun, Mar 18, 2012 at 3:04 PM, Kit Westneat wrote:
>> Oh right, that makes sense. I guess if I were you I would try one of two
>> things. First, back up the MDT, and then try:
>> 1) format a small loopback device with the parameters you want the MDT to
>> have, then replace the CONFIGS directory on your MDT with the CONFIGS
>> directory on the loopback device
>> - OR -
>> 2) use a hex editor to modify the UUID
>>
>> Then use tunefs.lustre --print to make sure it all looks good before
>> mounting it.
>>
>> Though one thing I wonder about is, are the OSTs on the same page with the
>> fsname? Like are they expecting to be part of the p1 filesystem?
>>
>> HTH,
>> Kit
>>
>> --
>> Kit Westneat
>> System Administrator, eSys
>> 212-992-7647
>>
>>
>> On Sun, Mar 18, 2012 at 2:40 AM, Dr Stuart Midgley wrote:
>>>
>>> No, we have tried that.
>>>
>>> This file system started life about 6 years ago as lustre 1.4 and has
>>> continually been upgraded… hence the whacky UUID. Trying to rename the FS
>>> doesn't work. It doesn't change the UUID that the mgs tells clients to
>>> mount.
>>>
>>>
>>> On 18/03/2012, at 2:24 PM, Kit Westneat wrote:
>>>
>>>> You should be able to reset the UUID by doing another writeconf with the
>>>> --fsname flag.
>>>> After the writeconf, you'll have to writeconf all the OSTs
>>>> too.
>>>>
>>>> It worked on my very simple test at least:
>>>> [root@mds1 tmp]# tunefs.lustre --writeconf --fsname=test1 /dev/loop0
>>>> checking for existing Lustre data: found CONFIGS/mountdata
>>>> Reading CONFIGS/mountdata
>>>>
>>>> Read previous values:
>>>> Target: t1-MDT0000
>>>> Index: 0
>>>> Lustre FS: t1
>>>> Mount type: ldiskfs
>>>> Flags: 0x5
>>>> (MDT MGS )
>>>> Persistent mount opts: iopen_nopriv,user_xattr,errors=remount-ro
>>>> Parameters: mdt.group_upcall=/usr/sbin/l_getgroups
>>>>
>>>>
>>>> Permanent disk data:
>>>> Target: test1-MDT0000
>>>> Index: 0
>>>> Lustre FS: test1
>>>> Mount type: ldiskfs
>>>> Flags: 0x105
>>>> (MDT MGS writeconf )
>>>> Persistent mount opts: iopen_nopriv,user_xattr,errors=remount-ro
>>>> Parameters: mdt.group_upcall=/usr/sbin/l_getgroups
>>>>
>>>> Writing CONFIGS/mountdata
>>>>
>>>>
>>>> HTH,
>>>> Kit
>>>>
>>>>
>>>> On Sun, Mar 18, 2012 at 1:20 AM, Stu Midgley wrote:
>>>> ok, from what I can tell, the root of the problem is
>>>>
>>>>
>>>> [root@mds001 CONFIGS]# hexdump -C p1-MDT0000 | grep -C 2 mds
>>>> 00002450  0b 00 00 00 04 00 00 00  12 00 00 00 00 00 00 00  |................|
>>>> 00002460  70 31 2d 4d 44 54 30 30  30 30 00 00 00 00 00 00  |p1-MDT0000......|
>>>> 00002470  6d 64 73 00 00 00 00 00  70 72 6f 64 5f 6d 64 73  |mds.....prod_mds|
>>>> 00002480  5f 30 30 31 5f 55 55 49  44 00 00 00 00 00 00 00  |_001_UUID.......|
>>>> 00002490  78 00 00 00 07 00 00 00  88 00 00 00 08 00 00 00  |x...............|
>>>> --
>>>> 000024c0  00 00 00 00 04 00 00 00  0b 00 00 00 12 00 00 00  |................|
>>>> 000024d0  02 00 00 00 0b 00 00 00  70 31 2d 4d 44 54 30 30  |........p1-MDT00|
>>>> 000024e0  30 30 00 00 00 00 00 00  70 72 6f 64 5f 6d 64 73  |00......prod_mds|
>>>> 000024f0  5f 30 30 31 5f 55 55 49  44 00 00 00 00 00 00 00  |_001_UUID.......|
>>>> 00002500  30 00 00 00 00 00 00 00  70 31 2d 4d 44 54 30 30  |0.......p1-MDT00|
>>>>
>>>> [root@mds001 CONFIGS]#
>>>> [root@mds001 CONFIGS]# hexdump -C /mnt/md2/CONFIGS/p1-MDT0000 | grep -C 2 mds
>>>> 00002450  0b 00 00 00 04 00 00 00  10 00 00 00 00 00 00 00  |................|
>>>> 00002460  70 31 2d 4d 44 54 30 30  30 30 00 00 00 00 00 00  |p1-MDT0000......|
>>>> 00002470  6d 64 73 00 00 00 00 00  70 31 2d 4d 44 54 30 30  |mds.....p1-MDT00|
>>>> 00002480  30 30 5f 55 55 49 44 00  70 00 00 00 07 00 00 00  |00_UUID.p.......|
>>>> 00002490  80 00 00 00 08 00 00 00  00 00 62 10 ff ff ff ff  |..........b.....|
>>>>
>>>>
>>>> now if only I can get the UUID to be removed or reset...
>>>>
>>>>
>>>> On Sun, Mar 18, 2012 at 1:05 PM, Dr Stuart Midgley wrote:
>>>>> hmmm… that didn't work
>>>>>
>>>>> # tunefs.lustre --force --fsname=p1 /dev/md2
>>>>> checking for existing Lustre data: found CONFIGS/mountdata
>>>>> Reading CONFIGS/mountdata
>>>>>
>>>>> Read previous values:
>>>>> Target: p1-MDT0000
>>>>> Index: 0
>>>>> UUID: prod_mds_001_UUID
>>>>> Lustre FS: p1
>>>>> Mount type: ldiskfs
>>>>> Flags: 0x405
>>>>> (MDT MGS )
>>>>> Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
>>>>> Parameters:
>>>>>
>>>>> tunefs.lustre: unrecognized option `--force'
>>>>> tunefs.lustre: exiting with 22 (Invalid argument)
>>>>>
>>>>>
>>>>> On 18/03/2012, at 12:17 AM, Nathan Rutman wrote:
>>>>>
>>>>>> Take them all down again, use tunefs.lustre --force --fsname.
>>>>>>
>>>>>>
>>>>>> On Mar 17, 2012, at 2:10 AM, "Stu Midgley" wrote:
>>>>>>
>>>>>>> Afternoon
>>>>>>>
>>>>>>> We have a rather severe problem with our lustre file system. We had a
>>>>>>> full config log and the advice was to rewrite it with a new one.
>>>>>>> So, we unmounted our lustre file system off all clients, unmounted all
>>>>>>> the OSTs and then unmounted the mds. I then did
>>>>>>>
>>>>>>> mds:
>>>>>>> tunefs.lustre --writeconf --erase-params /dev/md2
>>>>>>>
>>>>>>> oss:
>>>>>>> tunefs.lustre --writeconf --erase-params --mgsnode=mds001 /dev/md2
>>>>>>>
>>>>>>>
>>>>>>> After the tunefs.lustre on the mds I saw
>>>>>>>
>>>>>>> Mar 17 14:33:02 mds001 kernel: Lustre: MGS MGS started
>>>>>>> Mar 17 14:33:02 mds001 kernel: Lustre: MGC172.16.0.251@tcp: Reactivating import
>>>>>>> Mar 17 14:33:02 mds001 kernel: Lustre: MGS: Logs for fs p1 were removed by user request. All servers must be restarted in order to regenerate the logs.
>>>>>>> Mar 17 14:33:02 mds001 kernel: Lustre: Enabling user_xattr
>>>>>>> Mar 17 14:33:02 mds001 kernel: Lustre: p1-MDT0000: new disk, initializing
>>>>>>> Mar 17 14:33:02 mds001 kernel: Lustre: p1-MDT0000: Now serving p1-MDT0000 on /dev/md2 with recovery enabled
>>>>>>>
>>>>>>> which scared me a little...
>>>>>>>
>>>>>>>
>>>>>>> the mds and the oss's mount happily BUT I can't mount the file system
>>>>>>> on my clients... on the mds I see
>>>>>>>
>>>>>>> Mar 17 16:42:11 mds001 kernel: LustreError: 137-5: UUID 'prod_mds_001_UUID' is not available for connect (no target)
>>>>>>>
>>>>>>> On the client I see
>>>>>>>
>>>>>>> Mar 17 16:00:06 host kernel: LustreError: 11-0: an error occurred while communicating with 172.16.0.251@tcp. The mds_connect operation failed with -19
>>>>>>>
>>>>>>> now, it appears the writeconf renamed the UUID of the mds from
>>>>>>> prod_mds_001_UUID to p1-MDT0000_UUID but I can't work out how to get
>>>>>>> it back...
>>>>>>>
>>>>>>> for example I tried
>>>>>>>
>>>>>>> # tunefs.lustre --mgs --mdt --fsname=p1 /dev/md2
>>>>>>> checking for existing Lustre data: found CONFIGS/mountdata
>>>>>>> Reading CONFIGS/mountdata
>>>>>>>
>>>>>>> Read previous values:
>>>>>>> Target: p1-MDT0000
>>>>>>> Index: 0
>>>>>>> UUID: prod_mds_001_UUID
>>>>>>> Lustre FS: p1
>>>>>>> Mount type: ldiskfs
>>>>>>> Flags: 0x405
>>>>>>> (MDT MGS )
>>>>>>> Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
>>>>>>> Parameters:
>>>>>>>
>>>>>>> tunefs.lustre: cannot change the name of a registered target
>>>>>>> tunefs.lustre: exiting with 1 (Operation not permitted)
>>>>>>>
>>>>>>>
>>>>>>> I'm now stuck not being able to mount a 1PB file system... which
>>>>>>> isn't good :(

_______________________________________________
Lustre-discuss mailing list
http://lists.lustre.org/mailman/listinfo/lustre-discuss
