I could use a little help with Lustre clients over Omni-Path. When I
run ib_write_bw tests between two compute nodes I get 10+ GB/s. The
compute nodes are RHEL 9.4 with the stock RHEL hardware drivers.

However, when I run lnet_selftest between the same two compute nodes
with 1M I/O size and 16 concurrency:

node1 -> node3
  read  (1M I/O): ~7.1 GB/s
  write (1M I/O): ~4.7 GB/s

node3 -> node1
  read  (1M I/O): ~6.6 GB/s
  write (1M I/O): ~4.9 GB/s
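For anyone wanting to reproduce, the runs were along these lines (a
sketch only; the NIDs, group names, and batch name are placeholders
for my actual nodes):

```shell
# lnet_selftest session: 1M bulk reads, concurrency 16, node1 -> node3
export LST_SESSION=$$                 # lst keys its session off this variable
lst new_session rw_test
lst add_group clients 192.168.1.1@o2ib    # placeholder NID for node1
lst add_group servers 192.168.1.3@o2ib    # placeholder NID for node3
lst add_batch bulk_rw
lst add_test --batch bulk_rw --concurrency 16 \
    --from clients --to servers brw read size=1M
lst run bulk_rw
lst stat clients servers              # Ctrl-C to stop the rolling stats
lst end_session
```

The write test is the same with `brw write` in place of `brw read`.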

Varying the I/O size and concurrency changes the numbers, but not
dramatically. I've gone through the Omni-Path tuning guide and my
LND tunables all match, but I can't seem to drive the bandwidth any
higher between nodes.
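For comparison, this is how I've been dumping the LND tunables on
each node (the specific tunable names you'll see depend on the LND in
use; the grep pattern below is just an illustration):

```shell
# show LNet network interfaces with full detail, including LND tunables
lnetctl net show -v

# pull out the values the Omni-Path tuning guide talks about
lnetctl net show -v | grep -E 'peer_credits|concurrent_sends|map_on_demand'
```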

Can anyone suggest where I might be dropping performance, or is this
simply the ceiling? I feel like there should be more headroom here,
but since we recently retooled from RHEL 7 to RHEL 9, I'm not sure
whether there's a tunable I've missed. (Unfortunately I don't have,
and can't seem to find, previous numbers to compare against.)
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
