Dear Angelos,

On Fri, Mar 05, 2021 at 12:15:19PM +0800, Angelos Ching via lustre-discuss wrote:
> Hi TH,
>
> I think you'll have to set max_create_count=20000 after step 7 unless you
> unmount and remount your MDT.

Yes, you are right. We have to set max_create_count=20000 for the replaced
OST; otherwise it will not accept newly created files.

> And for step 4, I used conf_param instead of set_param during my drill and I
> noticed this might be more resilient if you are using a HA pair for the MDT
> because the MDS might try to activate the inactive OST during failover as
> set_param is only changing run time option?
>
> Regards,
> Angelos

I am concerned that replacing the OST may sometimes take a long time. In the
meantime we may encounter other events that require rebooting the MDT
servers. I am only sure that we can deactivate / reactivate the OST with
conf_param when the MDT server is not rebooted. Once the MDT server has been
rebooted after the OST was deactivated with conf_param (active=0), I am not
sure whether the OST can be reactivated or not.

So I probably missed a step. Between steps 6 and 7, do we need to reactivate
the old OST index before mounting the new OST?

6. Prepare the new OST for replacement by mkfs.lustre with --replace
   option, and set the index to the old OST index (e.g., 0x8):
   ....

6.5. Reactivate the old OST index:

   lctl set_param osc.chome-OST0008-osc-MDT0000.active=1

7. Mount the new OST (run in the new OST server).

8. Release the new OST for accepting new objects:

   lctl set_param osc.chome-OST0008-osc-MDT0000.max_create_count=20000

Cheers,

T.H.Hsieh

> On 05/03/2021 11:48, Tung-Han Hsieh via lustre-discuss wrote:
> > Dear Hans,
> >
> > Thank you very much. Replacing the OST is new to me and very very
> > useful. We will try it next time.
> >
> > So, according to the description of the manual, to replace the OST
> > we probably need to:
> >
> > 1. Lock the old OST (e.g., chome-OST0008) such that it will not
> >    create new files (run in the MDT server):
> >
> >    lctl set_param osc.chome-OST0008-osc-MDT0000.max_create_count=0
> >
> > 2. Locate the list of files in the old OST (e.g., chome-OST0008)
> >    (run in the client):
> >
> >    lfs find --obd chome-OST0008_UUID /home > /tmp/OST0008.txt
> >
> > 3. Migrate the listed files in /tmp/OST0008.txt out of the old OST
> >    (run in the client).
> >
> > 4. Remove the old OST temporarily (run in the MDT server):
> >
> >    lctl set_param osc.chome-OST0008-osc-MDT0000.active=0
> >
> >    (Note: should use "set_param" instead of "conf_param")
> >
> > 5. Unmount the old OST partition (run in the old OST server)
> >
> > 6. Prepare the new OST for replacement by mkfs.lustre with --replace
> >    option, and set the index to the old OST index (e.g., 0x8)
> >    (run in the new OST server):
> >
> >    mkfs.lustre --ost --mgsnode=XXXXXX --index=0x8 --replace <device_name>
> >
> > 7. Mount the new OST (run in the new OST server).
> >
> >
> > Best Regards,
> >
> > T.H.Hsieh
> >
> >
> > On Thu, Mar 04, 2021 at 04:59:54PM +0100, Hans Henrik Happe via
> > lustre-discuss wrote:
> > > Hi,
> > >
> > > The manual describes this:
> > >
> > > https://doc.lustre.org/lustre_manual.xhtml#lustremaint.remove_ost
> > >
> > > There is a note telling you that it will still be there, but can be
> > > replaced.
> > >
> > > Hope you migrated your data away from the OST also. Otherwise you would
> > > have lost it.
> > >
> > > Cheers,
> > > Hans Henrik
> > >
> > > On 03.03.2021 11.22, Tung-Han Hsieh via lustre-discuss wrote:
> > > > Dear All,
> > > >
> > > > Here is a question about how to remove an OST completely without
> > > > restarting the Lustre file system. Our Lustre version is 2.12.6.
> > > >
> > > > We did the following steps to remove the OST:
> > > >
> > > > 1. Lock the OST (e.g., chome-OST0008) such that it will not create
> > > >    new files (run in the MDT server):
> > > >
> > > >    lctl set_param osc.chome-OST0008-osc-MDT0000.max_create_count=0
> > > >
> > > > 2. Locate the list of files in the target OST (e.g., chome-OST0008)
> > > >    (run in the client):
> > > >
> > > >    lfs find --obd chome-OST0008_UUID /home
> > > >
> > > > 3. Remove OST (run in the MDT server):
> > > >
> > > >    lctl conf_param osc.chome-OST0008-osc-MDT0000.active=0
> > > >
> > > > 4. Unmount the OST partition (run in the OST server)
> > > >
> > > > After that, the total size of the Lustre file system decreased, and
> > > > everything looks fine. However, without restarting (i.e., rebooting
> > > > Lustre MDT / OST servers), we still feel that the removed OST
> > > > still exists. For example, in MDT:
> > > >
> > > > # lctl get_param osc.*.active
> > > > osc.chome-OST0000-osc-MDT0000.active=1
> > > > osc.chome-OST0001-osc-MDT0000.active=1
> > > > osc.chome-OST0002-osc-MDT0000.active=1
> > > > osc.chome-OST0003-osc-MDT0000.active=1
> > > > osc.chome-OST0008-osc-MDT0000.active=0
> > > > osc.chome-OST0010-osc-MDT0000.active=1
> > > > osc.chome-OST0011-osc-MDT0000.active=1
> > > > osc.chome-OST0012-osc-MDT0000.active=1
> > > > osc.chome-OST0013-osc-MDT0000.active=1
> > > > osc.chome-OST0014-osc-MDT0000.active=1
> > > >
> > > > We still see chome-OST0008. And in dmesg of MDT, we see a lot of:
> > > >
> > > > LustreError: 4313:0:(osp_object.c:594:osp_attr_get())
> > > > chome-OST0008-osc-MDT0000:osp_attr_get update error
> > > > [0x100080000:0x10a54c:0x0]: rc = -108
> > > >
> > > > In addition, when running LFSCK in the MDT server:
> > > >
> > > > lctl lfsck_start -A
> > > >
> > > > even after all the work on the MDT and OSTs has completed, we still
> > > > see that (run in MDT server):
> > > >
> > > > lctl get_param mdd.*.lfsck_layout
> > > >
> > > > the status is not completed:
> > > >
> > > > mdd.chome-MDT0000.lfsck_layout=
> > > > name: lfsck_layout
> > > > magic: 0xb1732fed
> > > > version: 2
> > > > status: partial
> > > > flags: incomplete
> > > > param: all_targets
> > > > last_completed_time: 1614762495
> > > > time_since_last_completed: 4325 seconds
> > > > ....
> > > >
> > > > We suspect that the "incomplete" part might be due to the already
> > > > removed chome-OST0008.
> > > >
> > > > Is there any way to completely remove chome-OST0008 from the Lustre
> > > > file system, since that OST device has already been reformatted for
> > > > other usage?
> > > >
> > > > Thanks very much.
> > > >
> > > > T.H.Hsieh

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
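
For reference, the replacement steps discussed in this thread can be
sketched as a dry-run shell script. This is only a sketch under stated
assumptions: the helper names (ost_name, osc_param, print_replace_plan) are
hypothetical, the script prints the commands and executes nothing, the
migrate / mount / unmount steps are left as comments because the thread does
not spell out those commands, and step 6.5 is still an open question rather
than a confirmed part of the procedure.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands from this thread; it executes nothing.
# All function names below are hypothetical helpers, not Lustre tools.

# ost_name FSNAME INDEX  ->  e.g. "chome 0x8" gives chome-OST0008
ost_name() {
    printf '%s-OST%04x\n' "$1" "$2"
}

# osc_param FSNAME INDEX MDT FIELD  ->  osc.<fs>-OSTxxxx-osc-<MDT>.<FIELD>
osc_param() {
    printf 'osc.%s-osc-%s.%s\n' "$(ost_name "$1" "$2")" "$3" "$4"
}

# print_replace_plan FSNAME INDEX MDT
# Prints one line per step, tagged with the node it should run on.
print_replace_plan() {
    ost=$(ost_name "$1" "$2")
    echo "[MDS]    lctl set_param $(osc_param "$1" "$2" "$3" max_create_count)=0"
    echo "[client] lfs find --obd ${ost}_UUID /home"
    echo "[client] # migrate the listed files off ${ost}"
    echo "[MDS]    lctl set_param $(osc_param "$1" "$2" "$3" active)=0"
    echo "[OSS]    # unmount the old OST"
    echo "[OSS]    mkfs.lustre --ost --mgsnode=XXXXXX --index=$2 --replace <device_name>"
    echo "[MDS]    # proposed step 6.5 (unconfirmed): set $(osc_param "$1" "$2" "$3" active)=1"
    echo "[OSS]    # mount the new OST"
    echo "[MDS]    lctl set_param $(osc_param "$1" "$2" "$3" max_create_count)=20000"
}

# Example with the names used in this thread.
print_replace_plan chome 0x8 MDT0000
```

Printing the plan first makes it easy to review which command belongs on
which node before anything is typed on a production server.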

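
The "lctl get_param osc.*.active" listing quoted in this thread can also be
filtered to show only the deactivated targets. The following is a minimal
sketch, assuming the output keeps the "name=value" form shown in the quoted
message; list_inactive is a hypothetical helper name, and the sample input
is a subset of the lines from the thread.

```shell
#!/bin/sh
# list_inactive: reads "osc.<device>.active=<0|1>" lines on stdin and
# prints the device name of every entry whose value is 0.
list_inactive() {
    awk -F= '$1 ~ /^osc\..*\.active$/ && $2 == "0" {
        sub(/^osc\./, "", $1)      # drop the leading "osc."
        sub(/\.active$/, "", $1)   # drop the trailing ".active"
        print $1
    }'
}

# Sample input taken from the listing quoted in this thread.
# On an MDS one would pipe the real command instead:
#   lctl get_param osc.*.active | list_inactive
list_inactive <<'EOF'
osc.chome-OST0003-osc-MDT0000.active=1
osc.chome-OST0008-osc-MDT0000.active=0
osc.chome-OST0010-osc-MDT0000.active=1
EOF
```

With the sample above this prints chome-OST0008-osc-MDT0000, the one
target that was deactivated.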