Dear Etienne, thanks a lot for the detailed explanation! I will try out the patch at the next opportunity.
@Tung-Han Hsieh: I think the issue that indices of old OSTs remain until --writeconf is used is solved by the new command lctl del_ost or the older lctl llog_cancel. Both remove entries from the configuration log.

But I would also be interested if someone could comment on whether or not it is a good idea to reuse old indices of removed OSTs. We have done that a few times in the meantime, but as Thomas Roth pointed out, the LAD22 talk about del_ost mentioned that old indices were not reused in that particular case.

I think there was a mail on the mailing list a few months ago where someone asked if gaps in the OST indices are a problem. I haven't found this mail again, but I think that Andreas Dilger answered that gaps are not a problem but untested. Do I remember that correctly? Could someone comment on that question?

Best regards,
Robert

> On 26.10.2022 at 11:15, Etienne Aujames <[email protected]> wrote:
>
> Hi,
>
> "mkfs.lustre --replace" is used to replace an existing OST in the MGS
> configurations (CONFIGS/*-{client,MDT*}). It will read the existing
> configuration on the MGS for the given index and copy it locally. Then it
> will negotiate LAST_IDs (last object id for each sequence) with the MDTs
> (the OST should update the last object ids with those registered on the
> MDTs to avoid overlaps with existing objects).
>
> In your case, if you follow the procedure to permanently remove an OST
> via llog_cancel or "lctl del_ost", you should not have any trace of the
> old OST in your configuration (as if it never existed). So you should
> not use "mkfs.lustre --replace".
>
> With LU-15000, the local copy of the MDT configuration is not
> (correctly) updated with the MGS one. This is because you canceled
> indexes in the configuration and those canceled records were not copied
> to the local one.
> This messes up the llog indexes between the MGS and the local MDT copies.
>
> When you add an OST, the MDT configurations on the MGS are updated (new
> records are added to declare the new osp and new connections for the OST).
> The MDTs then try to read only the new indexes in the MGS configuration, but
> the last llog indexes of the two configurations are no longer the same:
> the MDT tries to read and apply older MGS records.
>
> So you have to apply the patch on every server.
>
> Etienne
>
> On Wed, 2022-10-26 at 05:40 +0000, Redl, Robert wrote:
>> Dear Etienne,
>>
>> thanks a lot! We do actually not have MDS crashes as described in
>> LU-15000, but we do of course have several index gaps caused by
>> llog_cancel.
>>
>> Is it necessary to have this patch on all servers, or is only the MGS
>> affected?
>>
>> About mkfs.lustre --replace: why is --replace required if all
>> traces of the old OST have been removed from the config log? Are
>> indices that have been used before stored somewhere else?
>>
>> Best regards,
>> Robert
>>
>>> On 25.10.2022 at 14:15, Etienne Aujames <[email protected]> wrote:
>>>
>>> Hello,
>>>
>>> I think you hit the following bug:
>>> https://jira.whamcloud.com/browse/LU-15000
>>> MDS crashes with
>>> (osp_dev.c:1404:osp_obd_connect()) ASSERTION( osp->opd_connects == 1 ) failed
>>>
>>> Stephane Thiell reported this issue and fixed it by patching his 2.12.7
>>> version with
>>> https://review.whamcloud.com/46552
>>> (2.15 backport: https://review.whamcloud.com/47515).
>>>
>>> A backport has been submitted for the b2_15 branch but has not yet landed:
>>> https://review.whamcloud.com/c/fs/lustre-release/+/48898
>>>
>>> You could also check his LAD presentation about removing OSTs (lctl del_ost):
>>> "A filesystem coming of age: live hardware upgrade practices at
>>> Stanford Research Computing"
>>> (https://www.eofs.eu/_media/events/lad22/2.5-stanfordrc_s_thiell.pdf)
>>>
>>> Etienne AUJAMES
>>>
>>> On Tue, 2022-10-25 at 10:12 +0000, Redl, Robert wrote:
>>>> Dear Lustre Experts,
>>>>
>>>> some time ago we removed an OST. We followed the instructions from
>>>> the documentation
>>>> (https://doc.lustre.org/lustre_manual.xhtml#lustremaint.remove_ost),
>>>> including cleaning up the logs from all related entries using
>>>> llog_cancel. After the removal the system worked normally.
>>>>
>>>> Now we are trying to add a new OST reusing the same index. If the OST
>>>> is created with mkfs.lustre --replace, then it is possible to mount
>>>> the OST, but it is not possible to mount the whole filesystem
>>>> anymore. A client would see the following error message:
>>>>
>>>> kernel: LustreError: 70451:0:(obd_config.c:1499:class_process_config()) no device for: project-OST0007-osc-ffff914108c2e800
>>>> kernel: LustreError: 70451:0:(obd_config.c:2001:class_config_llog_handler()) MGC10.163.52.14@tcp: cfg command failed: rc = -22
>>>> kernel: Lustre: cmd=cf00b 0:project-OST0007-osc 1: 10.163.52.20@tcp
>>>> kernel: LustreError: 1760:0:(mgc_request.c:612:do_requeue()) failed processing log: -22
>>>>
>>>> In order to make the filesystem mountable again, all log entries
>>>> created by mounting the OST must be removed using llog_cancel.
>>>>
>>>> If the OST is created using mkfs.lustre without --replace, then the
>>>> OST itself is not mountable.
>>>> The following error message is shown:
>>>>
>>>> kernel: LustreError: 140-5: Server project-OST0007 requested index 7, but that index is already in use. Use --writeconf to force
>>>> kernel: LustreError: 7302:0:(mgs_handler.c:503:mgs_target_reg()) Failed to write project-OST0007 log (-98)
>>>>
>>>> Given that the --writeconf suggested in the error message requires a
>>>> full shutdown of the system, we would like to avoid that.
>>>>
>>>> I wonder if we maybe overlooked something when the OST was removed.
>>>> The logs for project-client, project-MDT0000, and project-MDT0001 are
>>>> not showing any traces of the old OST anymore. Is there anything more
>>>> that needs to be done to make Lustre forget that an OST with a given
>>>> index existed at some point?
>>>>
>>>> Lustre Version: 2.15.1, ZFS-backend.
>>>>
>>>> Thanks a lot!
>>>> Robert
>>>>
>>>> _______________________________________________
>>>> lustre-discuss mailing list
>>>> [email protected]
>>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
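To picture the index mismatch that Etienne describes in the quoted thread, here is a small toy model in Python. It is only a sketch under stated assumptions, not Lustre code: the `Record` class, the two copy functions and the record payloads are all hypothetical, and the model simply assumes that the faulty local copy drops canceled records and renumbers the rest, so that its last index falls behind the one on the MGS.

```python
from dataclasses import dataclass


@dataclass
class Record:
    index: int          # position of the record in the log
    payload: str        # hypothetical description of the config change
    canceled: bool = False


def copy_skipping_canceled(mgs_log):
    """Toy version of the faulty copy: canceled records are dropped
    and the surviving records are renumbered from 1 (an assumption)."""
    live = [r for r in mgs_log if not r.canceled]
    return [Record(i, r.payload) for i, r in enumerate(live, start=1)]


def copy_keeping_indexes(mgs_log):
    """Toy version of a faithful copy: indexes match the MGS log."""
    return [Record(r.index, r.payload, r.canceled) for r in mgs_log]


def records_to_apply(mgs_log, local_last_index):
    """The MDT reads only MGS records newer than its own last index."""
    return [r for r in mgs_log
            if r.index > local_last_index and not r.canceled]


# MGS log after an OST was removed with llog_cancel (record 3 canceled).
mgs_log = [
    Record(1, "setup MDT0000"),
    Record(2, "add OST0006"),
    Record(3, "add OST0007", canceled=True),
    Record(4, "add OST0008"),
]

buggy_last = copy_skipping_canceled(mgs_log)[-1].index
faithful_last = copy_keeping_indexes(mgs_log)[-1].index

# A new OST is added: the MGS appends a new record.
mgs_log.append(Record(5, "add OST0007 (new)"))

replayed_buggy = records_to_apply(mgs_log, buggy_last)
replayed_ok = records_to_apply(mgs_log, faithful_last)

print("faulty copy applies:  ", [r.payload for r in replayed_buggy])
print("faithful copy applies:", [r.payload for r in replayed_ok])
```

In this model the faulty copy ends at index 3 while the MGS log ends at index 4, so after the new OST is added the MDT also re-applies the older "add OST0008" record. That mirrors the "tries to read and apply older MGS records" behaviour described above, but the real on-disk llog format and copy logic are not modelled here.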
