I could use a little help with Lustre clients over Omni-Path. When I run ib_write_bw tests between two compute nodes I get >10 GB/s. The compute nodes are RHEL 9.4 with the stock RHEL hardware drivers.
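(For reference, the raw-fabric baseline I'm comparing against is a plain perftest run along these lines; `hfi1_0` as the Omni-Path HFI device name and `node3` as the server hostname are placeholders for my setup:)

```shell
# Server side (node3): listen for the bandwidth test.
# -d selects the RDMA device, -s the message size in bytes,
# -F ignores the CPU-frequency warning.
ib_write_bw -d hfi1_0 -s 1048576 -F

# Client side (node1): connect to the server and run the test.
ib_write_bw -d hfi1_0 -s 1048576 -F node3
```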
However, when I run lnet_selftest between the same two compute nodes (1M I/O size, 16 concurrency) I see:

node1 -> node3: read 1M I/O ~7.1 GB/s, write 1M I/O ~4.7 GB/s
node3 -> node1: read 1M I/O ~6.6 GB/s, write 1M I/O ~4.9 GB/s

Varying the I/O size and concurrency changes the numbers, but not dramatically. I've gone through the Omni-Path tuning guide and my LND tunables all match, but I can't seem to drive the bandwidth any higher between nodes. Can anyone suggest where I might be dropping some performance, or is this as good as it gets? I feel like there should be more performance here, but since we recently retooled from RHEL 7 to RHEL 9, I'm unsure whether there's a tunable left untuned. (Unfortunately I don't have, and can't seem to find, previous numbers to compare against.)

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
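In case it helps, the kind of lst session I'm running looks roughly like the sketch below. The NIDs (`node1-opa@o2ib`, `node3-opa@o2ib`) and group/batch names are placeholders, not my real configuration:

```shell
# Hedged sketch of an lnet_selftest bulk-read run between two nodes.
# Requires the lnet_selftest kernel module loaded on both ends.
export LST_SESSION=$$           # tag this shell's lst session
lst new_session bw_test
lst add_group clients node1-opa@o2ib   # placeholder NIDs
lst add_group servers node3-opa@o2ib
lst add_batch bulk
lst add_test --batch bulk --concurrency 16 \
    --from clients --to servers brw read size=1M
lst run bulk
sleep 30                        # let the test run for a while
lst stat clients servers        # sample bandwidth counters
lst end_session
```

Swapping `read` for `write` in the add_test line gives the write-direction numbers; I've also been checking the LND tunables on both sides with `lnetctl net show -v` to confirm they match the tuning guide.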
