Hello, I have a Lustre system (still 1.6.3) whose MDT was too small and ran out of inodes. We removed files to bring it back from the edge, then unmounted the Lustre disk from the clients and started to replace the MDT on the MGS.
I followed the instructions in Chapter 15 of the Lustre Manual under "Backup and Restore". I had no trouble unmounting the MDT, remounting it as -t ldiskfs, and running the getfattr and tar commands. I unmounted the original disk, mounted a new, larger disk as -t ldiskfs, and proceeded to restore the data to the bigger disk via setfattr and tar extraction of the data I had taken from the original MDT just hours before. (Recall that all clients have had this disk unmounted, so no activity should have occurred on it.)

When I mount the new, larger disk as -t lustre as the MDT I see no mount errors, but the following errors appear in /var/log/messages on the MGS (no client access at this point):

Mar 20 12:39:40 mds1 kernel: Lustre: MDT crew8-MDT0000 now serving dev (f8a0e9b5-c2f1-8297-4ead-e34c9680b3cf) with recovery enabled
Mar 20 12:39:40 mds1 kernel: Lustre: Server crew8-MDT0000 on device /dev/METADATA2/LV2 has started
Mar 20 12:39:40 mds1 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.64....@o2ib. The ost_connect operation failed with -114
Mar 20 12:39:40 mds1 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.64....@o2ib. The ost_connect operation failed with -114
Mar 20 12:39:40 mds1 kernel: LustreError: Skipped 6 previous similar messages
Mar 20 12:39:40 mds1 kernel: LustreError: 3460:0:(llog_lvfs.c:597:llog_lvfs_create()) error looking up logfile 0xa65662:0x9c30d2f6: rc -2
Mar 20 12:39:40 mds1 kernel: LustreError: 3460:0:(osc_request.c:3446:osc_llog_init()) failed LLOG_MDS_OST_ORIG_CTXT
Mar 20 12:39:40 mds1 kernel: LustreError: 3460:0:(osc_request.c:3457:osc_llog_init()) osc 'crew8-OST0000-osc' tgt 'crew8-MDT0000' cnt 1 catid ffffc200050f8000 rc=-2
Mar 20 12:39:40 mds1 kernel: LustreError: 3460:0:(osc_request.c:3459:osc_llog_init()) logid 0xa65662:0x9c30d2f6
Mar 20 12:39:40 mds1 kernel: LustreError: 3460:0:(lov_log.c:214:lov_llog_init()) error osc_llog_init idx 0 osc 'crew8-OST0000-osc' tgt 'crew8-MDT0000' (rc=-2)
Mar 20 12:39:40 mds1 kernel: LustreError: 3460:0:(mds_log.c:207:mds_llog_init()) lov_llog_init err -2
Mar 20 12:39:40 mds1 kernel: LustreError: 3460:0:(llog_obd.c:392:llog_cat_initialize()) rc: -2
Mar 20 12:40:05 mds1 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.64....@o2ib. The ost_connect operation failed with -114
Mar 20 12:40:05 mds1 kernel: LustreError: Skipped 4 previous similar messages

The df shows the volume (crew8-MDT0000) mounted, as is the other disk (crew2-MDT0000).

/dev/METADATA1/LV1  204G  4.7G  190G  3%  /srv/lustre/mds/crew2-MDT0000
/dev/METADATA2/LV2  204G  5.3G  187G  3%  /srv/lustre/mds/crew8-MDT0000

The lctl dl shows all of the disks as being up:

[r...@mds1 ~]# lctl
lctl > dl
  0 UP mgs MGS MGS 5
  1 UP mgc mgc192.168.64....@o2ib b09fab05-c2ad-8ebb-553e-0e35f2fba17a 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov crew2-mdtlov crew2-mdtlov_UUID 4
  4 UP mds crew2-MDT0000 crew2mds_UUID 9
  5 UP osc crew2-OST0000-osc crew2-mdtlov_UUID 5
  6 UP osc crew2-OST0001-osc crew2-mdtlov_UUID 5
  7 UP osc crew2-OST0002-osc crew2-mdtlov_UUID 5
  8 UP lov crew8-mdtlov crew8-mdtlov_UUID 4
  9 UP mds crew8-MDT0000 crew8-MDT0000_UUID 15
 10 UP osc crew8-OST0000-osc crew8-mdtlov_UUID 5
 11 UP osc crew8-OST0001-osc crew8-mdtlov_UUID 5
 12 UP osc crew8-OST0002-osc crew8-mdtlov_UUID 5
 13 UP osc crew8-OST0003-osc crew8-mdtlov_UUID 5
 14 UP osc crew8-OST0004-osc crew8-mdtlov_UUID 5
 15 UP osc crew8-OST0005-osc crew8-mdtlov_UUID 5
 16 UP osc crew8-OST0006-osc crew8-mdtlov_UUID 5
 17 UP osc crew8-OST0007-osc crew8-mdtlov_UUID 5
 18 UP osc crew8-OST0008-osc crew8-mdtlov_UUID 5
 19 UP osc crew8-OST0009-osc crew8-mdtlov_UUID 5
 20 UP osc crew8-OST000a-osc crew8-mdtlov_UUID 5
 21 UP osc crew8-OST000b-osc crew8-mdtlov_UUID 5

Does this have anything to do with "Remove the recovery logs (now invalid), run 'rm OBJECTS/* CATALOGS'"? Should I just copy or rsync specific files from the smaller crew8-MDT0000 to the new, larger crew8-MDT0000?

megan

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
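
A reading aid for the kernel messages quoted in this post: the numeric return codes (-114, -2) look like negated Linux errno values, which is the usual convention for kernel code. The Python sketch below pulls those codes out of a log line and names them. The helper name decode_return_codes, the regular expression, and the small hard-coded table are illustrative assumptions, not part of Lustre or its tools, and the sketch only names the codes; it does not diagnose the cause.

```python
import re

# Linux errno names for the two codes that appear in the quoted log.
# Hard-coded on purpose: errno numbers are platform-specific, and the
# log comes from a Linux kernel. (Assumption: the codes are -errno.)
LINUX_ERRNO = {
    2: ("ENOENT", "No such file or directory"),
    114: ("EALREADY", "Operation already in progress"),
}

# Matches the forms seen in the log: "failed with -114", "rc -2",
# "rc=-2", "rc: -2" and "err -2".
RC_PATTERN = re.compile(r"\b(?:failed with|rc[=:]?|err)\s*-(\d+)")


def decode_return_codes(line):
    """Return (code, name, description) for each negative code found in line."""
    decoded = []
    for match in RC_PATTERN.finditer(line):
        number = int(match.group(1))
        name, text = LINUX_ERRNO.get(number, ("UNKNOWN", "not in table"))
        decoded.append((-number, name, text))
    return decoded


if __name__ == "__main__":
    # Message fragments copied from the log quoted in the post.
    samples = [
        "The ost_connect operation failed with -114",
        "(llog_lvfs.c:597:llog_lvfs_create()) error looking up logfile 0xa65662:0x9c30d2f6: rc -2",
        "(mds_log.c:207:mds_llog_init()) lov_llog_init err -2",
    ]
    for sample in samples:
        print(sample, "->", decode_return_codes(sample))
```

Read this way, the ost_connect failures report "operation already in progress" and the llog lookups report "no such file or directory" for logfile 0xa65662:0x9c30d2f6.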
