these are the settings in the manual, which i tried. i'll check the conns_per_peer setting though, i'm not sure what mine was set to
lnd tunables:
    peercredits_hiw: 64
    map_on_demand: 32
    concurrent_sends: 256
    fmr_pool_size: 2048
    fmr_flush_trigger: 512
    fmr_cache: 1
tunables:
    peer_timeout: 180
    peer_credits: 128
    peer_buffer_credits: 0
    credits: 1024
CPT: "[0,0,0,0]"

On Sat, Jul 6, 2024 at 6:07 AM Andreas Dilger <[email protected]> wrote:
>
>> On Jul 5, 2024, at 11:37, Michael DiDomenico via lustre-discuss <[email protected]> wrote:
>>
>> i could use a little help with lustre clients over omni path. when i
>> run ib_write_bw tests between two compute nodes i get +10GB/sec.
>> compute nodes are rhel9.4 with rhel hw drivers
>>
>> however, when i run lnet_selftest between the same two compute nodes
>>
>>   1m i/o size
>>   16 concurrency
>>
>> node1-node3
>>   read  1m i/o ~7.1GB/sec
>>   write 1m i/o ~4.7GB/sec
>>
>> node3-node1
>>   read  1m i/o ~6.6GB/sec
>>   write 1m i/o ~4.9GB/sec
>>
>> varying the i/o size and concurrency changes the numbers, but not
>> dramatically. i've gone through the tuning guide for omnipath and my
>> lnd tunables all match, but i can't seem to drive the bandwidth any
>> higher between nodes.
>
> Please provide the actual tuning parameters in use.
>
> Even when we were part of Intel, the OPA tuning parameters suggested by
> the OPA team were not necessarily the best in all cases. There was some
> kind of memory registration they kept suggesting, but it was always worse
> in practice than in theory.
>
> The biggest win was from setting conns_per_peer=4 or so, because OPA
> needs more CPU resources for good performance than IB.
>
> That said, it has been several years since I've had to deal with it, so
> I can't say if your current performance is good or bad.
>
>> can anyone suggest where i might be dropping some performance or is
>> this the end? i feel like there should be more performance here, but
>> since we recently retooled from rhel7 to rhel9, i'm unsure if there's
>> a tunable not tuned.
>> (unfortunately i don't have/can't seem to find previous numbers to
>> compare)
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Whamcloud

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
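[For anyone finding this thread later: conns_per_peer is a ko2iblnd module parameter, so one way to pin it down is `cat /sys/module/ko2iblnd/parameters/conns_per_peer` on a node with the module loaded, or `modinfo ko2iblnd` to see what your build supports. A persistent setting would be a modprobe options line; the sketch below is illustrative only, combining the tunables Michael posted with Andreas's conns_per_peer=4 suggestion, and assumes the module-parameter spelling peer_credits_hiw for what lnetctl displays as peercredits_hiw. Verify names and values against your own Lustre version before using.]

```
# /etc/modprobe.d/ko2iblnd.conf (illustrative sketch, not a recommendation)
options ko2iblnd conns_per_peer=4 peer_credits=128 peer_credits_hiw=64 \
    concurrent_sends=256 map_on_demand=32 fmr_pool_size=2048 \
    fmr_flush_trigger=512 fmr_cache=1
```

[A module reload (or reboot) is needed for the new options to take effect, and the lnet `credits`/`peer_timeout` values shown above are set via the lnet module options rather than ko2iblnd.]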
