Alejandro,

Is your MGS located on the same node as your primary MDT? (combined MGS/MDT
node)
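
If it is, one thing jumps out from your message: the -P flag belongs to
"lctl set_param", not to "conf_param", which is why you got that usage
error. As a rough sketch (please double-check the "Removing an OST"
section of the Lustre manual before running anything), the permanent
deactivation is normally done on the MGS with something like

mgs# lctl conf_param lustre-OST0015.osc.active=0

or, since you are on 2.10, the set_param form should also work:

mgs# lctl set_param -P osp.lustre-OST0015-osc-MDT0000.active=0

Either way it has to be issued on the MGS node, hence my question.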

--Jeff

On Wed, Aug 9, 2023 at 9:46 AM Alejandro Sierra via lustre-discuss <
lustre-discuss@lists.lustre.org> wrote:

> Hello,
>
> In 2018 we deployed a Lustre 2.10.5 system with 20 OSTs on two OSSs
> with 4 JBODs, each box holding 24 disks of 12 TB, for a total of
> nearly 1 PB. Over that time we have had power failures and failed RAID
> controller cards, all of which forced us to adjust the configuration.
> Since the last failure, the system keeps sending error messages about
> OSTs that are no longer in the system. On the MDS I run
>
> # lctl dl
>
> and I get the 20 currently active OSTs
>
> oss01.lanot.unam.mx     -       OST00   /dev/disk/by-label/lustre-OST0000
> oss01.lanot.unam.mx     -       OST01   /dev/disk/by-label/lustre-OST0001
> oss01.lanot.unam.mx     -       OST02   /dev/disk/by-label/lustre-OST0002
> oss01.lanot.unam.mx     -       OST03   /dev/disk/by-label/lustre-OST0003
> oss01.lanot.unam.mx     -       OST04   /dev/disk/by-label/lustre-OST0004
> oss01.lanot.unam.mx     -       OST05   /dev/disk/by-label/lustre-OST0005
> oss01.lanot.unam.mx     -       OST06   /dev/disk/by-label/lustre-OST0006
> oss01.lanot.unam.mx     -       OST07   /dev/disk/by-label/lustre-OST0007
> oss01.lanot.unam.mx     -       OST08   /dev/disk/by-label/lustre-OST0008
> oss01.lanot.unam.mx     -       OST09   /dev/disk/by-label/lustre-OST0009
> oss02.lanot.unam.mx     -       OST15   /dev/disk/by-label/lustre-OST000f
> oss02.lanot.unam.mx     -       OST16   /dev/disk/by-label/lustre-OST0010
> oss02.lanot.unam.mx     -       OST17   /dev/disk/by-label/lustre-OST0011
> oss02.lanot.unam.mx     -       OST18   /dev/disk/by-label/lustre-OST0012
> oss02.lanot.unam.mx     -       OST19   /dev/disk/by-label/lustre-OST0013
> oss02.lanot.unam.mx     -       OST25   /dev/disk/by-label/lustre-OST0019
> oss02.lanot.unam.mx     -       OST26   /dev/disk/by-label/lustre-OST001a
> oss02.lanot.unam.mx     -       OST27   /dev/disk/by-label/lustre-OST001b
> oss02.lanot.unam.mx     -       OST28   /dev/disk/by-label/lustre-OST001c
> oss02.lanot.unam.mx     -       OST29   /dev/disk/by-label/lustre-OST001d
>
> but I also get 5 that are not currently active and in fact no longer
> exist:
>
>  28 IN osp lustre-OST0014-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
>  29 UP osp lustre-OST0015-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
>  30 UP osp lustre-OST0016-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
>  31 UP osp lustre-OST0017-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
>  32 UP osp lustre-OST0018-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
>
> When I try to remove them with
>
> lctl conf_param -P osp.lustre-OST0015-osc-MDT0000.active=0
>
> I get the error
>
> conf_param: invalid option -- 'P'
> set a permanent config parameter.
> This command must be run on the MGS node
> usage: conf_param [-d] <target.keyword=val>
>   -d  Remove the permanent setting.
>
> If I do
>
> lctl --device 28 deactivate
>
> I don't get an error, but nothing changes
>
> What can I do?
>
> Thank you in advance for any help.
>
> --
> Alejandro Aguilar Sierra
> LANOT, ICAyCC, UNAM
>
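
On the "lctl --device 28 deactivate" attempt: in your lctl dl listing
device 28 is already marked IN, so it is already inactive at runtime.
That command only changes the running state on the node where you issue
it and is not written to the configuration logs, which is why nothing
seems to change. If deactivating the stale OSTs via conf_param on the
MGS is not enough, the heavier option is to regenerate the configuration
logs with a writeconf. Roughly, from memory of the "Regenerating Lustre
Configuration Logs" procedure in the manual (please read that section
first; it also erases any conf_param settings, which you would have to
reapply), with the whole filesystem stopped you would run:

mds# tunefs.lustre --writeconf /dev/<mdt-device>
oss# tunefs.lustre --writeconf /dev/<ost-device>    (repeat for each OST)

and then remount the MDT first, followed by the OSTs. The removed OSTs
should no longer show up in the regenerated logs.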


-- 
------------------------------
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
