Hello,

I've been running into a strange issue over RoCE with Mellanox MCX516A-CCAT
cards, with every node using o2ib: writes are blazingly fast (5.5 GB/s), but
read performance tanks to roughly 100 MB/s. Under mixed read/write load,
write performance also plummets to below 100 MB/s.

Curiously, when using tcp instead of o2ib these problems disappear, and
throughput hovers around 1.5 GB/s read and 3 GB/s write.

I'm wondering if anyone else has run into this and what the solution might
be.

My setup is:
Debian 10.3, Lustre 2.13.0, and ZFS 0.8.2, with two OSS/OST pairs, a single
MGS/MDT node, and a single client node, all connected over o2ib. Everything
is cabled together via 100G fiber through a Mellanox switch configured for
RoCE and bonding. The nodes all reach about 98 Gb/s to each other in
ib_send_bw, and simple network file-transfer tests over NFSoRDMA did not
show the slowdown that Lustre is seeing.
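
For scale, here is a rough sketch that puts the throughput figures above next
to the measured link rate. It assumes decimal units (8 bits per byte,
1 GB = 1000 MB) and treats the 98 Gb/s ib_send_bw result as the ceiling for a
single client link; the function and variable names are illustrative only.

```python
# Throughput figures quoted in this message, in GB/s.
# Assumption: decimal units, and one client link as the upper bound.
LINK_GBIT_PER_S = 98.0  # ib_send_bw result between nodes


def gbit_to_gbyte(gbit_per_s):
    """Convert gigabits per second to gigabytes per second."""
    return gbit_per_s / 8.0


def percent_of_link(gbyte_per_s, link_gbit_per_s=LINK_GBIT_PER_S):
    """Express a throughput in GB/s as a percentage of the link rate."""
    return 100.0 * gbyte_per_s / gbit_to_gbyte(link_gbit_per_s)


observed_gbyte_per_s = {
    "o2ib write": 5.5,
    "o2ib read": 0.1,  # roughly 100 MB/s
    "tcp write": 3.0,
    "tcp read": 1.5,
}

for name, rate in observed_gbyte_per_s.items():
    print(f"{name}: {rate} GB/s is {percent_of_link(rate):.1f}% of the link")
```

By this estimate, o2ib reads use under 1% of what the link can carry, while
tcp reads on the same hardware reach roughly 12%.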

I'd be happy to provide more diagnostic information if that helps, as well
as trace information if needed.

Best,
Christian

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
