It seems the error message could be improved in this case. Could you file an LU ticket for that with the reproducer below, ideally along with a patch?
Cheers, Andreas

> On Jan 10, 2024, at 11:37, Jeff Johnson <[email protected]> wrote:
>
> Man am I an idiot. Been up all night too many nights in a row and not
> enough coffee. It helps if you use the correct --net designation. I
> was typing ib0 instead of o2ib0. Declaring as o2ib0 works fine.
>
> (cleanup from previous)
> lctl net down && lustre_rmmod
>
> (new attempt)
> modprobe lnet -v
> lnetctl lnet configure
> lnetctl net add --if enp1s0np0 --net o2ib0
> lnetctl net show
> net:
>     - net type: lo
>       local NI(s):
>         - nid: 0@lo
>           status: up
>     - net type: o2ib
>       local NI(s):
>         - nid: 10.0.50.27@o2ib
>           status: up
>           interfaces:
>               0: enp1s0np0
>
> Lots more to test and verify but the original mailing list submission
> was total pilot error on my part. Apologies to all who spent cycles
> pondering this nothingburger.
>
>> On Tue, Jan 9, 2024 at 7:45 PM Jeff Johnson
>> <[email protected]> wrote:
>>
>> Howdy intrepid Lustrefarians,
>>
>> While starting down the debug rabbit hole I thought I'd raise my hand
>> and see if anyone has a few magic beans to spare.
>>
>> I cannot get lnet (via lnetctl) to init a o2iblnd interface on a
>> RoCEv2 interface.
>>
>> Running `lnetctl net add --net ib0 --if enp1s0np0` results in
>> net:
>>     errno: -1
>>     descr: cannot parse net '<255:65535>'
>>
>> Nothing in dmesg to indicate why. Search engines aren't coughing up
>> much here either.
>>
>> Env: Rocky 8.9 x86_64, MOFED 5.8-4.1.5.0, Lustre 2.15.4
>>
>> I'm able to run mpi over the RoCEv2 interface. Utils like ibstatus and
>> ibdev2netdev report it correctly. ibv_rc_pingpong works fine between
>> nodes.
>>
>> Configuring as socklnd works fine.
>> `lnetctl net add --net tcp0 --if enp1s0np0 && lnetctl net show`
>> [root@r2u11n3 ~]# lnetctl net show
>> net:
>>     - net type: lo
>>       local NI(s):
>>         - nid: 0@lo
>>           status: up
>>     - net type: tcp
>>       local NI(s):
>>         - nid: 10.0.50.27@tcp
>>           status: up
>>           interfaces:
>>               0: enp1s0np0
>>
>> I verified the RoCEv2 interface using nVidia's `cma_roce_mode` as well
>> as sysfs references
>>
>> [root@r2u11n3 ~]# cma_roce_mode -d mlx5_0 -p 1
>> RoCE v2
>>
>> Ideas? Suggestions? Incense?
>>
>> Thanks,
>>
>> --Jeff
>
> --
> ------------------------------
> Jeff Johnson
> Co-Founder
> Aeon Computing
>
> [email protected]
> www.aeoncomputing.com
> t: 858-412-3810 x1001   f: 858-412-3845
> m: 619-204-9061
>
> 4170 Morena Boulevard, Suite C - San Diego, CA 92117
>
> High-Performance Computing / Lustre Filesystems / Scale-out Storage
> _______________________________________________
> lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
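
Until the error message itself is improved, a wrapper script could catch this kind of typo (ib0 instead of o2ib0) before it ever reaches lnetctl. The following is only a rough sketch: `valid_lnet_name` is a hypothetical helper, not part of Lustre, and the list of network type prefixes it accepts (tcp, o2ib, gni) is an assumption that is deliberately not exhaustive, so adjust it to the LNDs actually built on your system.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): check that an LNet network
# name is a known type prefix followed by an optional instance number,
# e.g. "o2ib0" or "tcp", before passing it to `lnetctl net add --net`.
# ASSUMPTION: the prefix list below is illustrative and incomplete.
valid_lnet_name() {
    name=$1
    for prefix in tcp o2ib gni; do
        case "$name" in
            "$prefix"*)
                # Whatever follows the prefix must be empty or all digits.
                suffix=${name#"$prefix"}
                case "$suffix" in
                    *[!0-9]*) ;;       # non-digit found, try next prefix
                    *) return 0 ;;     # empty or numeric suffix: accept
                esac
                ;;
        esac
    done
    return 1
}

# Example use in a wrapper (commented out so the sketch has no side effects):
# if valid_lnet_name "$net"; then
#     lnetctl net add --net "$net" --if "$ifname"
# else
#     echo "unrecognized LNet network name: $net" >&2
# fi
```

With this check, `ib0` is rejected up front while `o2ib0` and `tcp0` pass, which would have pointed at the typo directly instead of the `cannot parse net '<255:65535>'` message.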
