I'm testing RoCE on 40 Gbps Mellanox Ethernet cards and get an mlx4 QP operation error every time the test reaches 132 kB messages. These are aggregate tests: 16 cores on one host run bidirectional ping-pongs with 16 cores on another host across the Mellanox cards.
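For a sense of the aggregate load at the failing size, here is a rough Python sketch. It assumes (my assumption, not something measured) that "132 kB" means 132 * 1024 bytes and that each of the 16 core pairs can have one message outstanding in each direction at once. The function names are hypothetical, and the model only sizes the traffic; it says nothing about what triggers the QP error.

```python
# Back-of-the-envelope load model for the aggregate RoCE test.
# Hypothetical assumptions: "132 kB" = 132 * 1024 bytes, and each core pair
# has at most one message outstanding in each direction at the same time.

KIB = 1024


def bytes_in_flight(pairs: int, msg_bytes: int, bidirectional: bool = True) -> int:
    """Upper bound on payload bytes outstanding between the two hosts at once."""
    directions = 2 if bidirectional else 1
    return pairs * directions * msg_bytes


def min_serialization_seconds(pairs: int, msg_bytes: int, link_gbps: float = 40.0) -> float:
    """Shortest time to push one message per pair through one direction of the
    link at line rate (ignores headers, latency, and per-packet overhead)."""
    return pairs * msg_bytes * 8 / (link_gbps * 1e9)


def per_direction_gbps(pairs: int, msg_bytes: int, one_way_seconds: float) -> float:
    """Aggregate payload throughput in one direction, in Gbit/s, if every pair
    moves one message across the link every one_way_seconds."""
    return pairs * msg_bytes * 8 / one_way_seconds / 1e9


if __name__ == "__main__":
    pairs, msg = 16, 132 * KIB
    print(bytes_in_flight(pairs, msg))            # 4325376 bytes (4.125 MiB)
    print(min_serialization_seconds(pairs, msg))  # roughly 4.3e-4 s
```

Under these assumptions, about 4 MiB of payload can be outstanding at once, and even at full line rate one round of 16 messages takes a few hundred microseconds per direction.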
I've found some old references to similar mlx4 errors dating back to
2009, which lead me to believe this may be a firmware bug. As far as I
know, we're running the most up-to-date firmware version.
Could someone comment on whether these are firmware issues and,
if so, how to report them to Mellanox? I've attached some files with more
detailed information on this problem.
Dave Turner
--
Work: (785) 532-7791
118 Nichols Hall, Manhattan KS 66502
cell: (785) 770-5929
mlx4_error.tar.gz
Description: GNU Zip compressed data
