It seems the error message could be improved in this case. Could you
file an LU ticket for that with the reproducer below, ideally along with a
patch?
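
For illustration, here is a minimal Python sketch of the kind of check that
could give a friendlier message than "cannot parse net '<255:65535>'". It is
not how lnetctl parses the name. The function name check_net_name and the
short list of LND types are assumptions made up for this example, and the
list is deliberately not exhaustive.

```python
import difflib

# Assumption: an illustrative, non-exhaustive set of LND type prefixes.
# Check the list against your Lustre version before relying on it.
KNOWN_LND_TYPES = ("lo", "tcp", "o2ib", "gni")


def check_net_name(name):
    """Return (ok, message) for an LNet network name such as 'o2ib0'.

    The name is split into an LND type and an optional trailing network
    number, so 'o2ib0' becomes ('o2ib', '0') and 'tcp' becomes ('tcp', '').
    """
    lnd = name.rstrip("0123456789")   # strip the trailing network number
    num = name[len(lnd):]

    if lnd in KNOWN_LND_TYPES:
        return True, f"'{name}' looks like a valid LNet network name"

    message = f"unknown LNet network type '{lnd}' in '{name}'"
    # Offer the closest known type, if any, keeping the user's number.
    close = difflib.get_close_matches(lnd, KNOWN_LND_TYPES, n=1)
    if close:
        message += f"; did you mean '{close[0]}{num}'?"
    return False, message


if __name__ == "__main__":
    for candidate in ("ib0", "o2ib0", "tcp0"):
        print(candidate, "->", check_net_name(candidate)[1])
```

Run against the names from the reproducer, the sketch rejects ib0 with a
suggestion and accepts o2ib0 and tcp0. Something equivalent in the real
parser would have pointed at the typo directly.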

Cheers, Andreas

> On Jan 10, 2024, at 11:37, Jeff Johnson <[email protected]> 
> wrote:
> 
> Man am I an idiot. Been up all night too many nights in a row and not
> enough coffee. It helps if you use the correct --net designation. I
> was typing ib0 instead of o2ib0. Declaring as o2ib0 works fine.
> 
> (cleanup from previous)
> lctl net down && lustre_rmmod
> 
> (new attempt)
> modprobe lnet -v
> lnetctl lnet configure
> lnetctl net add --if enp1s0np0 --net o2ib0
> lnetctl net show
> net:
>    - net type: lo
>      local NI(s):
>        - nid: 0@lo
>          status: up
>    - net type: o2ib
>      local NI(s):
>        - nid: 10.0.50.27@o2ib
>          status: up
>          interfaces:
>              0: enp1s0np0
> 
> Lots more to test and verify but the original mailing list submission
> was total pilot error on my part. Apologies to all who spent cycles
> pondering this nothingburger.
> 
>> On Tue, Jan 9, 2024 at 7:45 PM Jeff Johnson
>> <[email protected]> wrote:
>> 
>> Howdy intrepid Lustrefarians,
>> 
>> While starting down the debug rabbit hole I thought I'd raise my hand
>> and see if anyone has a few magic beans to spare.
>> 
>> I cannot get lnet (via lnetctl) to init an o2iblnd interface on a
>> RoCEv2 interface.
>> 
>> Running `lnetctl net add --net ib0 --if enp1s0np0` results in
>> net:
>>          errno: -1
>>          descr: cannot parse net '<255:65535>'
>> 
>> Nothing in dmesg to indicate why. Search engines aren't coughing up
>> much here either.
>> 
>> Env: Rocky 8.9 x86_64, MOFED 5.8-4.1.5.0, Lustre 2.15.4
>> 
>> I'm able to run MPI over the RoCEv2 interface. Utils like ibstatus and
>> ibdev2netdev report it correctly. ibv_rc_pingpong works fine between
>> nodes.
>> 
>> Configuring as socklnd works fine. `lnetctl net add --net tcp0 --if
>> enp1s0np0 && lnetctl net show`
>> [root@r2u11n3 ~]# lnetctl net show
>> net:
>>    - net type: lo
>>      local NI(s):
>>        - nid: 0@lo
>>          status: up
>>    - net type: tcp
>>      local NI(s):
>>        - nid: 10.0.50.27@tcp
>>          status: up
>>          interfaces:
>>              0: enp1s0np0
>> 
>> I verified the RoCEv2 interface using NVIDIA's `cma_roce_mode` as well
>> as sysfs references:
>> 
>> [root@r2u11n3 ~]# cma_roce_mode -d mlx5_0 -p 1
>> RoCE v2
>> 
>> Ideas? Suggestions? Incense?
>> 
>> Thanks,
>> 
>> --Jeff
> 
> 
> 
> --
> ------------------------------
> Jeff Johnson
> Co-Founder
> Aeon Computing
> 
> [email protected]
> www.aeoncomputing.com
> t: 858-412-3810 x1001   f: 858-412-3845
> m: 619-204-9061
> 
> 4170 Morena Boulevard, Suite C - San Diego, CA 92117
> 
> High-Performance Computing / Lustre Filesystems / Scale-out Storage
> _______________________________________________
> lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
