A little late to the party here, but I just ran into this myself. I worked around it without having to regenerate everything with --writeconf, which I realize isn't helpful four months after the fact, but I figured I'd post here to help anyone else who runs into this issue in the future.
In my case, I had removed all the llog entries for the OSTs except the conf_param entries setting osc.active=0, assuming for whatever reason that those should be retained. This is incorrect: you'll want to remove those too for each relevant OST. I've opened an issue in LUDOC with some suggestions on how the phrasing might be improved.

On Tue, 2023-07-18 at 23:55 +0000, Andreas Dilger via lustre-discuss wrote:
> Brian,
> Please file a ticket in LUDOC with details of how the manual should be
> updated. Ideally, including a patch. :-)
>
> Cheers, Andreas
>
> > On Jul 11, 2023, at 15:39, Brad Merchant <[email protected]> wrote:
> >
> > We recreated the issue in a test cluster, and it was definitely the
> > llog_cancel steps that caused it. Clients couldn't process the llog
> > properly on new mounts and would fail. We had to completely clear the
> > llog and --writeconf every target to regenerate it from scratch.
> >
> > The cluster is up and running now, but I would certainly recommend at
> > least revising that section of the manual.
> >
> > On Mon, Jul 10, 2023 at 5:22 PM Brad Merchant <[email protected]> wrote:
> > > We deactivated half of 32 OSTs after draining them, following the
> > > steps in section 14.9.3 of the Lustre manual:
> > >
> > > https://doc.lustre.org/lustre_manual.xhtml#lustremaint.remove_ost
> > >
> > > After running the steps under "3. Deactivate the OST." on
> > > OST0010-OST001f, new client mounts fail with the log messages below.
> > > Existing client mounts seem to function correctly, but they're a bit
> > > of a ticking time bomb because they are configured with autofs.
> > >
> > > The llog_cancel steps are new to me, and the issues seemed to appear
> > > after those commands were issued (can't say that 100% definitively,
> > > however).
> > > Servers are running 2.12.5 and clients are on 2.14.x.
> > >
> > > Jul 10 15:22:40 adm-sup1 kernel: LustreError:
> > > 26814:0:(obd_config.c:1514:class_process_config()) no device for:
> > > hydra-OST0010-osc-ffff8be5340c2000
> > > Jul 10 15:22:40 adm-sup1 kernel: LustreError:
> > > 26814:0:(obd_config.c:2038:class_config_llog_handler())
> > > MGC172.16.100.101@o2ib: cfg command failed: rc = -22
> > > Jul 10 15:22:40 adm-sup1 kernel: Lustre: cmd=cf00f 0:hydra-OST0010-osc
> > > 1:osc.active=0
> > > Jul 10 15:22:40 adm-sup1 kernel: LustreError: 15b-f:
> > > MGC172.16.100.101@o2ib: Configuration from log hydra-client failed from
> > > MGS -22. Check client and MGS are on compatible version.
> > > Jul 10 15:22:40 adm-sup1 kernel: Lustre: hydra: root_squash is set to
> > > 99:99
> > > Jul 10 15:22:40 adm-sup1 systemd-udevd[26823]: Process '/usr/sbin/lctl
> > > set_param 'llite.hydra-ffff8be5340c2000.nosquash_nids=192.168.80.84@tcp
> > > 192.168.80.122@tcp 192.168.80.21@tcp 172.16.90.11@o2ib
> > > 172.16.100.211@o2ib 172.16.100.212@o2ib 172.16.100.213@o2ib
> > > 172.16.100.214@o2ib 172.16.100.215@o2ib 172.16.90.51@o2ib'' failed
> > > with exit code 2.
> > > Jul 10 15:22:40 adm-sup1 kernel: Lustre: Unmounted hydra-client
> > > Jul 10 15:22:40 adm-sup1 kernel: LustreError:
> > > 26803:0:(obd_mount.c:1680:lustre_fill_super()) Unable to mount (-22)
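For anyone who finds this later, here's a rough sketch of the cleanup I'm describing at the top of this message. Treat it as an outline under assumptions, not a tested procedure: it assumes the fsname "hydra" (taken from the logs in the thread), and the exact llog_print/llog_cancel syntax varies between Lustre versions, so check `lctl help` and the manual on your release before cancelling anything -- editing the MGS config logs carelessly is exactly what broke things here.

```shell
# Hedged sketch only -- verify against section 14.9.3 of the manual and
# your lctl version before running the commented lctl commands.
FSNAME=hydra            # assumption: filesystem name from the thread's logs
LOG=${FSNAME}-client    # the client config llog held by the MGS

# 1. Dump the MGS copy of the client config llog:
#      lctl --device MGS llog_print $LOG
#
# 2. From that output, collect the record indexes that mention a removed
#    OST -- *including* the conf_param records that set osc.active=0,
#    which (per this thread) must be cancelled as well:
indexes_for_ost() {     # usage: indexes_for_ost OST0010 < llog_print_output
    grep "$1" | sed -n 's/.*index: *\([0-9]*\).*/\1/p'
}

# 3. Cancel each such record (destructive -- double-check indexes first):
#      for idx in $(lctl --device MGS llog_print $LOG | indexes_for_ost OST0010); do
#          lctl --device MGS llog_cancel $LOG --log_idx="$idx"
#      done
```

The helper just scrapes indexes out of llog_print's "index: N" fields; if the record format on your version differs, adjust the sed pattern accordingly.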
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
