Man, am I an idiot. I've been up all night too many nights in a row
without enough coffee. It helps if you use the correct --net designation:
I was typing ib0 instead of o2ib0. Declaring the net as o2ib0 works fine.

(cleanup from previous)
lctl net down && lustre_rmmod

(new attempt)
modprobe lnet -v
lnetctl lnet configure
lnetctl net add --if enp1s0np0 --net o2ib0
lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: o2ib
      local NI(s):
        - nid: 10.0.50.27@o2ib
          status: up
          interfaces:
              0: enp1s0np0
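
For anyone else who trips over this, here is a rough sketch of a guard that catches the typo before lnetctl sees it. The function name is hypothetical, and the list of accepted prefixes (tcp, o2ib, gni) is only my assumption of the common LND types, not an exhaustive list; adjust it for whatever your build supports.

```shell
#!/bin/sh
# Hypothetical helper: check that a net name looks like <lnd-type><number>.
# The accepted prefixes below are an assumed, non-exhaustive list.
is_valid_lnet_name() {
    printf '%s\n' "$1" | grep -Eq '^(tcp|o2ib|gni)[0-9]*$'
}
```

Usage would be along the lines of `is_valid_lnet_name o2ib0 && lnetctl net add --if enp1s0np0 --net o2ib0`, so that a name like ib0 is rejected up front instead of coming back as a "cannot parse net" error.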

There is lots more to test and verify, but the original mailing list
submission was total pilot error on my part. Apologies to all who spent
cycles pondering this nothingburger.




On Tue, Jan 9, 2024 at 7:45 PM Jeff Johnson
<jeff.john...@aeoncomputing.com> wrote:
>
> Howdy intrepid Lustrefarians,
>
> While starting down the debug rabbit hole I thought I'd raise my hand
> and see if anyone has a few magic beans to spare.
>
> I cannot get lnet (via lnetctl) to init an o2iblnd interface on a
> RoCEv2 interface.
>
> Running `lnetctl net add --net ib0 --if enp1s0np0` results in
>  net:
>           errno: -1
>           descr: cannot parse net '<255:65535>'
>
> Nothing in dmesg to indicate why. Search engines aren't coughing up
> much here either.
>
> Env: Rocky 8.9 x86_64, MOFED 5.8-4.1.5.0, Lustre 2.15.4
>
> I'm able to run MPI over the RoCEv2 interface. Utils like ibstatus and
> ibdev2netdev report it correctly. ibv_rc_pingpong works fine between
> nodes.
>
> Configuring as socklnd works fine:
> `lnetctl net add --net tcp0 --if enp1s0np0 && lnetctl net show`
> [root@r2u11n3 ~]# lnetctl net show
> net:
>     - net type: lo
>       local NI(s):
>         - nid: 0@lo
>           status: up
>     - net type: tcp
>       local NI(s):
>         - nid: 10.0.50.27@tcp
>           status: up
>           interfaces:
>               0: enp1s0np0
>
> I verified the RoCEv2 interface using NVIDIA's `cma_roce_mode` as well
> as sysfs references.
>
> [root@r2u11n3 ~]# cma_roce_mode -d mlx5_0 -p 1
> RoCE v2
>
> Ideas? Suggestions? Incense?
>
> Thanks,
>
> --Jeff



-- 
------------------------------
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org