I am having trouble replacing OST0000 with a new disk and would appreciate any help with fixing this problem.
All the data on OST0000 was moved off this OST before decommissioning, and "lfs find" against this OST returned 0 files. I was able to bring up the new OST under the earlier version, 1.6.6, and everything worked as expected in 1.6.6. I have just updated Lustre from 1.6.6 to 1.6.7 on the servers using the patched kernel. At this point the OST is automatically marked as inactive in 1.6.7.

Note: Quota was not enabled in the older version, and I was trying to enable quota in the newer version.

I tried the following without any success:

    mkfs.lustre --fsname=lqcdproj --ost --mgsnode=iblust...@tcp1 --mkfsoptions="-m 0" --index=0000 --reformat /dev/md2
    tunefs.lustre --erase-params --ost --mgsnode=iblust...@tcp1 --param ost.quota_type=ug --writeconf /dev/md2
    mount -t lustre /dev/md2 /mnt/ost0

I received these error messages when I tried to mount it for the first time:

    Mar 3 16:19:53 lustre1 kernel: Lustre: MGS: Regenerating lqcdproj-OST0000 log by user request.
    Mar 3 16:19:53 lustre1 kernel: Lustre: Skipped 1 previous similar message
    Mar 3 16:19:53 lustre1 kernel: Lustre: Setting parameter lqcdproj-OST0000.ost.quota_type in log lqcdproj-OST0000
    Mar 3 16:19:53 lustre1 kernel: Lustre: Skipped 2 previous similar messages
    Mar 3 16:19:53 lustre1 kernel: Lustre: Filtering OBD driver; http://www.lustre.org/
    Mar 3 16:19:53 lustre1 kernel: Lustre: lqcdproj-OST0000: new disk, initializing
    Mar 3 16:19:53 lustre1 kernel: Lustre: OST lqcdproj-OST0000 now serving dev (lqcdproj-OST0000/a968f0cc-a66b-bbf7-458f-9b8759c60ef5) with recovery enabled
    Mar 3 16:19:53 lustre1 kernel: Lustre: lqcdproj-OST0000.ost: set parameter quota_type=ug
    Mar 3 16:19:53 lustre1 kernel: Lustre: Server lqcdproj-OST0000 on device /dev/md2 has started
    Mar 3 16:19:56 lustre1 kernel: Lustre: lqcdproj-OST0000: received MDS connection from 0...@lo
    Mar 3 16:19:56 lustre1 kernel: Lustre: MDS lqcdproj-MDT0000: lqcdproj-OST0000_UUID now active, resetting orphans
    Mar 3 16:19:56 lustre1 kernel: Lustre: Skipped 2 previous similar messages
    Mar 3 16:19:58 lustre1 kernel: LustreError: 6359:0:(filter.c:3138:filter_precreate()) create failed rc = -28
    Mar 3 16:19:58 lustre1 kernel: LustreError: 6631:0:(lov_obd.c:1048:lov_clear_orphans()) error in orphan recovery on OST idx 0/13: rc = -28
    Mar 3 16:19:58 lustre1 kernel: LustreError: 6631:0:(mds_lov.c:951:__mds_lov_synchronize()) lqcdproj-OST0000_UUID failed at mds_lov_clear_orphans: -28
    Mar 3 16:19:58 lustre1 kernel: LustreError: 6631:0:(mds_lov.c:960:__mds_lov_synchronize()) lqcdproj-OST0000_UUID sync failed -28, deactivating

On the second attempt, I ran erase-params and writeconf on the MGS, the MDT and all the OST partitions, and still got the following errors:

    Mar 3 16:23:44 lustre1 kernel: Lustre: MGS: Regenerating lqcdproj-OST0000 log by user request.
    Mar 3 16:23:44 lustre1 kernel: LustreError: 6909:0:(llog_lvfs.c:577:llog_filp_open()) logfile creation CONFIGS/lqcdproj-OST0000T: -28
    Mar 3 16:23:44 lustre1 kernel: LustreError: 6909:0:(mgc_request.c:1080:mgc_copy_llog()) Failed to copy remote log lqcdproj-OST0000 (-28)
    Mar 3 16:23:44 lustre1 kernel: Lustre: OST lqcdproj-OST0000 now serving dev (lqcdproj-OST0000/a968f0cc-a66b-bbf7-458f-9b8759c60ef5) with recovery enabled
    Mar 3 16:23:44 lustre1 kernel: Lustre: lqcdproj-OST0000.ost: set parameter quota_type=ug
    Mar 3 16:23:44 lustre1 kernel: Lustre: Skipped 1 previous similar message
    Mar 3 16:23:48 lustre1 kernel: Lustre: 6011:0:(quota_master.c:1642:mds_quota_recovery()) Not all osts are active, abort quota recovery
    Mar 3 16:23:48 lustre1 kernel: Lustre: lqcdproj-OST0000: received MDS connection from 0...@lo
    Mar 3 16:23:48 lustre1 kernel: Lustre: MDS lqcdproj-MDT0000: lqcdproj-OST0000_UUID now active, resetting orphans
    Mar 3 16:23:48 lustre1 kernel: LustreError: 6915:0:(filter.c:3138:filter_precreate()) create failed rc = -28
    Mar 3 16:23:48 lustre1 kernel: LustreError: 7184:0:(lov_obd.c:1048:lov_clear_orphans()) error in orphan recovery on OST idx 0/2: rc = -28
    Mar 3 16:23:48 lustre1 kernel: LustreError: 7184:0:(mds_lov.c:951:__mds_lov_synchronize()) lqcdproj-OST0000_UUID failed at mds_lov_clear_orphans: -28
    Mar 3 16:23:48 lustre1 kernel: LustreError: 7184:0:(mds_lov.c:960:__mds_lov_synchronize()) lqcdproj-OST0000_UUID sync failed -28, deactivating

Thanks
Nirmal
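Every failure in these logs reports the same code, -28. That is a negated Linux errno value, and 28 is ENOSPC ("No space left on device"). The POSIX shell sketch below is for triaging logs like these. It is not a Lustre tool. The function names errno_name and summarize_rc are hypothetical. The small lookup table assumes Linux errno numbering (asm-generic/errno-base.h) and covers only a few common codes.

```shell
#!/bin/sh
# Hypothetical helper, not part of Lustre: summarize negative return codes
# found in Lustre kernel log lines. Assumes Linux errno numbering; only a
# handful of common codes are mapped, anything else is reported as unknown.

# Print the symbolic name and message for a positive errno number ($1).
errno_name() {
    case "$1" in
        2)  echo "ENOENT (No such file or directory)" ;;
        5)  echo "EIO (Input/output error)" ;;
        12) echo "ENOMEM (Cannot allocate memory)" ;;
        17) echo "EEXIST (File exists)" ;;
        19) echo "ENODEV (No such device)" ;;
        22) echo "EINVAL (Invalid argument)" ;;
        28) echo "ENOSPC (No space left on device)" ;;
        30) echo "EROFS (Read-only file system)" ;;
        *)  echo "errno $1 (not in this table)" ;;
    esac
}

# Read log lines on stdin and print each distinct negative code once.
# A code is a "-N" preceded by a space or "(", as in "rc = -28",
# "sync failed -28" or "(-28)"; hyphens inside names such as
# "lqcdproj-OST0000" are not matched because a letter precedes them.
summarize_rc() {
    grep -oE '[ (]-[0-9]+' |
        grep -oE '[0-9]+$' |
        sort -n -u |
        while read -r code; do
            printf '%s\t%s\n' "-$code" "$(errno_name "$code")"
        done
}
```

For example, `grep Lustre /var/log/messages | summarize_rc` would list -28 once, followed by its name. This assumes syslog writes kernel messages to /var/log/messages on these servers. The pattern only recognizes the few message shapes seen above.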
