Hi Dave,

On 29/01/15 11:31, Dave Turner wrote:
> I've found some old references to similar mlx4 errors dating back to
> 2009 that lead me to believe this may be a firmware error. I believe we're
> running the most up to date version of the firmware.

There was a new version released a few days ago, 2.33.5100:

http://www.mellanox.com/page/firmware_table_ConnectX3ProEN

Release notes are here:

http://www.mellanox.com/pdf/firmware/ConnectX3Pro-FW-2_33_5100-release_notes.pdf

Bug fixes start on page 23. It looks like there are 29 fixes in this
version, and fix 1 is for RoCE (though of course it may not be relevant):
"The first Read response was not treated as implicit ACK" (discovered
in 2.30.8000).

All the best,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci