The SX1012 and SX1036 definitely use the same ASIC series and both suffer from the PHY link problem. I've got seven of them here, and have tested them with 20 different ROACH-2 boards. As Jack points out, some links are worse than others, and swapping cables and ROACH-2 mezzanine cards often sorts the problem out. To be honest, the issue is probably on the ROACH-2 side, as we haven't tuned the PHY's RX parameters. Mellanox claim to have tested against, and to comply with, the IEEE standards.

To get around this, Mellanox have a "patch" for CASPER users, which adjusts the TX drive on the switch so that a bigger signal hits the ROACH-2. This fixes all the problems, and the links are then very reliable. They actually borrowed a ROACH-2 for the purpose of testing and tuning their PHYs. Very kind of them! This tuning was done using their own 3m cables, so you should expect the best performance using those copper links. Another option is to use active optical cables (AOCs), which are cost-effective and lossless.

Lincoln, not to alarm you, but are you actually checking for bit errors on your SX1024 links? I suspect you're just not noticing any faults. Packets don't always get dropped. Oftentimes a few bits in the data are just corrupted, and if you're looking at correlator noise, you might not even notice it! Also, the "RX error" line from the 10G core is hard-coded to zero, so if this is what you're watching, you'd never know. Likewise, the ROACH-2->switch links work fine; it's only the switch->ROACH-2 direction that's problematic, so you'd never see the error counters on the switch climbing. We (SKA-SA) add checksums to our packets to be sure.

Out of the box, we were seeing BER numbers like 10e-10 on most of the Mellanox kit, but a few links were bad, with BER ~10e-6. After patching, it's better than 10e-13.

Jason Manley
CBF Manager
SKA-SA

Cell: +27 82 662 7726
Work: +27 21 506 7300

On 02 Sep 2014, at 18:01, Lincoln Greenhill <lgreenh...@cfa.harvard.edu> wrote:

> Hi Jack (and John),
>
> Interesting report. Difficult to puzzle out
> differences in our mutual experience unless the
> 1012 is unlike the 1024 in subtle ways.
>
> We (LEDA) did not require an upgrade or Mellanox cables
> for the 10/40GbE links. Again - I would look
> forward to discussion people may have with Jonathon
> (Cc me if not conducted via the Casper list).
>
> Best,
> Lincoln
>
> On 9/2/14, 11:54 AM, Jack Hickish wrote:
>> Hi John,
>>
>> As Dan said I've tested (to some extent) the SX1012. I just used one
>> ROACH2 and corner turned data through 8 x 10GbE ports. It worked well
>> basically up to line rate, with no CRC errors after a few hours of
>> operation, but only after
>>
>> - Jason put me in touch with some Mellanox guys who provided updated firmware
>> - using branded Mellanox 3m QSFP -> 4xSFP cables
>>
>> Before the upgrade transmissions from the switch to the ROACH2 would
>> frequently fail.
>>
>> Of course this may or may not be remotely relevant for the SX1024 :)
>>
>> Good luck!
>>
>> J
>>
>> On 2 September 2014 16:46, Dan Werthimer <d...@ssl.berkeley.edu> wrote:
>>>
>>> hi john,
>>>
>>> jack hickish tested a mellanox SX1012
>>> (12x40Gbe, or 48x10Gbe, or mixture),
>>> when he was visiting berkeley.
>>>
>>> the SX1012 worked beautifully on roach2, after jack upgraded
>>> the switch firmware. and it's a great price. ($6K)
>>>
>>> i think jason has also tested this switch.
>>>
>>> best wishes,
>>>
>>> dan
>>>
>>>
>>> On Tue, Sep 2, 2014 at 8:36 AM, John Ford <jf...@nrao.edu> wrote:
>>>>
>>>> Hi all. Has anyone tested the Mellanox SX-1024 series of switches with
>>>> ROACH-2 and 10 and/or 40 gb NICs?
>>>>
>>>> These switches have 48 10 Gbe ports and 12 40 Gbe ports on them.
>>>>
>>>> Thanks!
>>>>
>>>> John
>>>>
>>>
>>
>
> --
> Lincoln J. Greenhill         Harvard-Smithsonian CfA
> Office: 1 617-495-7194       60 Garden St, Mail Stop 42
> Cell:   1 650 722-7798       Cambridge, MA 02138
> FAX:    1 617-495-7345       greenh...@cfa.harvard.edu
> Skype:  ljgreenhill          www.cfa.harvard.edu/~lincoln
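
For anyone wanting to try the checksum idea from the reply at the top of this thread, here is a minimal sketch in Python. It is not the SKA-SA packet format: the 4-byte CRC-32 trailer, the function names, and the 10 Gb/s line rate are all assumptions made for illustration. The last helper only converts a bit error rate into an average time between bit errors on a fully loaded link, which gives a feel for how long a test has to run before a given BER becomes visible.

```python
import struct
import zlib


def add_checksum(payload: bytes) -> bytes:
    """Append a CRC-32 of the payload as a 4-byte big-endian trailer.

    The trailer layout is hypothetical, chosen only for this sketch.
    """
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return payload + struct.pack(">I", crc)


def is_intact(packet: bytes) -> bool:
    """Return True if the trailing CRC-32 matches the received payload."""
    if len(packet) < 4:
        return False
    payload, trailer = packet[:-4], packet[-4:]
    (expected,) = struct.unpack(">I", trailer)
    return (zlib.crc32(payload) & 0xFFFFFFFF) == expected


def seconds_per_bit_error(ber: float, line_rate_bps: float = 10e9) -> float:
    """Average time between bit errors on a saturated link.

    line_rate_bps defaults to 10 Gb/s, an assumption for a 10GbE port.
    """
    return 1.0 / (ber * line_rate_bps)


if __name__ == "__main__":
    packet = add_checksum(b"correlator payload")
    print(is_intact(packet))

    # Flip one bit in the payload to mimic corruption that does not
    # cause the packet to be dropped.
    corrupted = bytearray(packet)
    corrupted[0] ^= 0x01
    print(is_intact(bytes(corrupted)))

    # Example BER values only; substitute your own measurements.
    for ber in (1e-6, 1e-13):
        print(ber, seconds_per_bit_error(ber))
```

The receiver counts packets for which is_intact() is false. Dividing that count by the total number of bits received gives a rough lower bound on the BER, since a packet with several flipped bits is counted only once.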