I'm testing RoCE on 40 Gbps Mellanox Ethernet cards and get an mlx4 QP operation error every time the test reaches 132 kB packets. These are aggregate tests: 16 cores on one host run bi-directional ping-pongs with 16 cores on another host across the Mellanox cards.
I've found some old references to similar mlx4 errors, dating back to 2009, that lead me to believe this may be a firmware error. I believe we're running the most up-to-date version of the firmware. Could someone comment on whether this is a firmware issue and, if so, how to report it to Mellanox? I've attached some files with more detailed information on this problem.

Dave Turner
--
Work: davetur...@ksu.edu (785) 532-7791
118 Nichols Hall, Manhattan KS 66502
Home: drdavetur...@gmail.com cell: (785) 770-5929
Attachment: mlx4_error.tar.gz (GNU Zip compressed data)