Alejandro, is your MGS located on the same node as your primary MDT (a combined MGS/MDT node)?
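Context for the question: per the error output quoted below, conf_param "must be run on the MGS node," and the same holds for lctl set_param -P, so where the MGS lives determines where the fix has to be issued. A minimal, hedged way to check from the MDS, assuming the standard server-side tools are installed:

  # an "MGS" device in the list means the MGS is running on this node
  lctl dl | grep -i mgs

  # the Lustre targets mounted on this node (MGT and MDT, if combined)
  mount -t lustre

If the MGS is on a separate node, the commands sketched at the end of this message would be run there instead.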
--Jeff

On Wed, Aug 9, 2023 at 9:46 AM Alejandro Sierra via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:

> Hello,
>
> In 2018 we deployed a Lustre 2.10.5 system with 20 OSTs on two OSSs
> with 4 JBODs, each box holding 24 disks of 12 TB, for a total of
> nearly 1 PB. In all that time we have had power failures and failed
> RAID controller cards, all of which made us adjust the configuration.
> Since the last failure, the system keeps sending error messages about
> OSTs that are no longer in the system. On the MDS I run
>
> # lctl dl
>
> and I get the 20 currently active OSTs:
>
> oss01.lanot.unam.mx - OST00 /dev/disk/by-label/lustre-OST0000
> oss01.lanot.unam.mx - OST01 /dev/disk/by-label/lustre-OST0001
> oss01.lanot.unam.mx - OST02 /dev/disk/by-label/lustre-OST0002
> oss01.lanot.unam.mx - OST03 /dev/disk/by-label/lustre-OST0003
> oss01.lanot.unam.mx - OST04 /dev/disk/by-label/lustre-OST0004
> oss01.lanot.unam.mx - OST05 /dev/disk/by-label/lustre-OST0005
> oss01.lanot.unam.mx - OST06 /dev/disk/by-label/lustre-OST0006
> oss01.lanot.unam.mx - OST07 /dev/disk/by-label/lustre-OST0007
> oss01.lanot.unam.mx - OST08 /dev/disk/by-label/lustre-OST0008
> oss01.lanot.unam.mx - OST09 /dev/disk/by-label/lustre-OST0009
> oss02.lanot.unam.mx - OST15 /dev/disk/by-label/lustre-OST000f
> oss02.lanot.unam.mx - OST16 /dev/disk/by-label/lustre-OST0010
> oss02.lanot.unam.mx - OST17 /dev/disk/by-label/lustre-OST0011
> oss02.lanot.unam.mx - OST18 /dev/disk/by-label/lustre-OST0012
> oss02.lanot.unam.mx - OST19 /dev/disk/by-label/lustre-OST0013
> oss02.lanot.unam.mx - OST25 /dev/disk/by-label/lustre-OST0019
> oss02.lanot.unam.mx - OST26 /dev/disk/by-label/lustre-OST001a
> oss02.lanot.unam.mx - OST27 /dev/disk/by-label/lustre-OST001b
> oss02.lanot.unam.mx - OST28 /dev/disk/by-label/lustre-OST001c
> oss02.lanot.unam.mx - OST29 /dev/disk/by-label/lustre-OST001d
>
> but I also get 5 that are not currently active and in fact no longer exist:
>
> 28 IN osp lustre-OST0014-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
> 29 UP osp lustre-OST0015-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
> 30 UP osp lustre-OST0016-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
> 31 UP osp lustre-OST0017-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
> 32 UP osp lustre-OST0018-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
>
> When I try to remove them with
>
> lctl conf_param -P osp.lustre-OST0015-osc-MDT0000.active=0
>
> I get the error
>
> conf_param: invalid option -- 'P'
> set a permanent config parameter.
> This command must be run on the MGS node
> usage: conf_param [-d] <target.keyword=val>
>   -d  Remove the permanent setting.
>
> If I do
>
> lctl --device 28 deactivate
>
> I don't get an error, but nothing changes.
>
> What can I do?
>
> Thank you in advance for any help.
>
> --
> Alejandro Aguilar Sierra
> LANOT, ICAyCC, UNAM

--
------------------------------
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117
High-Performance Computing / Lustre Filesystems / Scale-out Storage
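For reference, a hedged sketch of the syntax the error message above points at, assuming Lustre 2.10.x and that lustre-OST0014 through lustre-OST0018 are the stale entries: the -P flag belongs to lctl set_param rather than lctl conf_param, and both permanent forms are meant to be run on the MGS node (hence the question about a combined MGS/MDT node):

  # permanently deactivate one stale OSP device on MDT0000 (repeat per OST)
  mgs# lctl set_param -P osp.lustre-OST0015-osc-MDT0000.active=0

  # older conf_param form; deactivates the OST on the MDS and on all clients
  mgs# lctl conf_param lustre-OST0015.osc.active=0

  # per-device deactivation on the running MDS is temporary only and does
  # not persist across a remount of the MDT
  mds# lctl --device 29 deactivate

Note that device 28 (lustre-OST0014) already shows IN (inactive) in the listing above, which may be why deactivating it appeared to change nothing; the UP entries 29-32 are the ones a deactivation would visibly affect.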