Ko2iblnd tuning depends on the specific hardware and overall LNet configuration. I 
would recommend using the default values unless you find performance or 
reliability issues.

FWIW, DDN wants to update the default values for 
peer_credits/peer_credits_hiw/concurrent_sends - 
https://review.whamcloud.com/c/fs/lustre-release/+/41140

Chris Horn

From: lustre-discuss <[email protected]> on behalf of 
Andreas Dilger via lustre-discuss <[email protected]>
Date: Friday, April 12, 2024 at 4:01 PM
To: Daniel Szkola <[email protected]>
Cc: lustre <[email protected]>
Subject: Re: [lustre-discuss] ko2iblnd.conf
The ko2iblnd-opa settings are only used if you have Intel OPA instead of 
Mellanox cards (this depends on the ko2iblnd-probe script).  You should still have 
a ko2iblnd line in the server config that is used for MLX cards, in order to set 
the values to match on both sides.

As for the actual settings, someone with more LNet IB experience should chime 
in on what is best to use.  All I know is that they have to be the same on both 
sides or they get unhappy, and the usable values depend on the card type and 
MOFED/OFED version.  As a starting point I would just copy the client ko2iblnd 
options to the server and see if it works.
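One way to confirm that both sides actually ended up with matching values is to read the live parameters from sysfs on each node. This is only a sketch: it assumes a Linux node with the module loaded at the standard /sys/module path, and the parameter names are the ones discussed in this thread.

```shell
# Sketch: print the ko2iblnd parameters that must agree between peers.
# Prints "unloaded" for any parameter file that is not present
# (e.g. the module is not loaded on this node).
show_ko2iblnd_params() {
  for p in peer_credits peer_credits_hiw concurrent_sends map_on_demand; do
    f=/sys/module/ko2iblnd/parameters/$p
    if [ -r "$f" ]; then
      echo "$p=$(cat "$f")"
    else
      echo "$p=unloaded"
    fi
  done
}
show_ko2iblnd_params
```

Running this on a client and a server and diffing the output is a quick way to spot a mismatch before it shows up as connection errors.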

Cheers, Andreas


On Apr 11, 2024, at 12:02, Daniel Szkola 
<[email protected]<mailto:[email protected]>> wrote:

On the server node(s):

options ko2iblnd-opa peer_credits=32 peer_credits_hiw=16 credits=1024 
concurrent_sends=64 ntx=2048 map_on_demand=256 fmr_pool_size=2048 
fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4

On clients:

options ko2iblnd peer_credits=128 peer_credits_hiw=64 credits=1024 
concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 
fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4

My concern isn’t so much the mismatch, since I know that’s an issue, but rather 
what numbers we should settle on with a recent Lustre build. I also see the 
ko2iblnd-opa line in the server config; since the server is actually loading 
ko2iblnd, does that mean the defaults are being used?
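One way to check this (a sketch, not specific to any Lustre version) is to ask modprobe to dump its parsed configuration and look for the ko2iblnd lines. An "options ko2iblnd-opa ..." line only applies when the module is loaded under that alias (via ko2iblnd-probe), so if the only line present is the -opa one, a plain ko2iblnd load gets the built-in defaults:

```shell
# Sketch: list the options/alias lines modprobe knows about for ko2iblnd.
# Lines keyed to "ko2iblnd-opa" are ignored when the module is loaded
# under the plain "ko2iblnd" name.
list_ko2iblnd_options() {
  modprobe -c 2>/dev/null | grep -i 'ko2iblnd' \
    || echo "no ko2iblnd lines found in modprobe config"
}
list_ko2iblnd_options
```

Comparing that output with the sysfs parameter values after the module is loaded shows whether the options line was actually applied.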

What made me look was that we were seeing lots of:
LNetError: 2961324:0:(o2iblnd_cb.c:2612:kiblnd_passive_connect()) Can't accept 
conn from xxx.xxx.xxx.xxx@o2ib2, queue depth too large:  42 (<=32 wanted)

—
Dan Szkola
FNAL



On Apr 11, 2024, at 12:36 PM, Andreas Dilger 
<[email protected]<mailto:[email protected]>> wrote:



On Apr 11, 2024, at 09:56, Daniel Szkola via lustre-discuss 
<[email protected]<mailto:[email protected]>> wrote:


Hello all,

I recently discovered some mismatches in our /etc/modprobe.d/ko2iblnd.conf 
files between our clients and servers.

Is it now recommended to keep the defaults on this module and run without a 
config file, or are there recommended numbers for lustre-2.15.X?

The only thing I’ve seen that provides any guidance is the Lustre wiki and an 
HP/Cray doc:

https://www.hpe.com/psnow/resources/ebooks/a00113867en_us_v2/Lustre_Server_Recommended_Tuning_Parameters_4.x.html

Does anyone have any sage advice on what the ko2iblnd.conf (and possibly 
ko2iblnd-opa.conf and hfi1.conf as well) should contain on modern systems?

It would be useful to know what specific settings are mismatched.  Definitely 
some of them need to be consistent between peers, others depend on your system.

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
