Hi, Jonathan,

The ROACH2s at GB output over all 8 SFP+ ports very often without problem. Not sure whether this matters, but they are connected via fiber optic transceivers rather than copper cables.

HTH,
Dave

> On Apr 19, 2018, at 11:00, Jonathan Weintroub <[email protected]> wrote:
>
> Dear kind CASPER Colleagues,
>
> To offer a little more feedback on this:
>
> - We reiterate that all advice is appreciated and useful. It may well be
>   relevant to prior weird experiences, however in the current case . . .
>
> - . . . after assorted power cycles removing all inputs, confirmation that
>   the unit has an approved FSP power supply, and swapping in spares both at
>   the LRU and NIC level, we are now convinced that our current issues with
>   one 10GigE port of 8 going down are not ROACH2 hardware related, but
>   rather something to do with the environment in which it is installed
>   (i.e. related to external stimuli). Still investigating.
>
> - One unusual aspect of this application is that we are using all 8 SFP+
>   ports on the ROACH2, though we are not stressing the rates. It is a long
>   shot, but are there any insights into possible stresses or snafus we
>   might run into when fully utilizing the ROACH 10GigE NIC ports?
>
> Thanks again.
>
> Jonathan & crew
>
>> On Apr 18, 2018, at 10:13 AM, Jonathan Weintroub <[email protected]> wrote:
>>
>> Hi Jonathon,
>>
>> Your important input here warrants cc to the mailing list, hereby
>> accomplished.
>>
>> We have switched to the FSP power supplies for new builds, and have
>> repaired older ROACH2s, a number of which have had failing power supplies
>> (mostly XEALs), by replacing same with FSPs. We have, I think, done some
>> prophylactic FSP replacements in offline spare stock. But we’ve ordered
>> and deployed probably over 100 ROACH2s over about a four, perhaps even
>> five year period; they are used at SMA for SWARM, and also distributed all
>> over the world for the EHT. So we have NOT retrofitted every unit out
>> there with FSP power supplies.
>>
>> While the XEALs are known to be unreliable, when a unit is working it's
>> not that straightforward to recall it for a power supply replacement;
>> "ain’t broke, don’t fix" applies.
>>
>> Thanks for your input. Thanks also for input from Dan, Jason, Matt and
>> Mike, which is valuable and relevant advice. I was holding off on
>> responding: we’re at SMA running tests, and don’t yet know the resolution
>> for the units in question.
>>
>> Jonathon’s email triggered this interim response. I’ll let all know the
>> outcome on the lightning damage when we have one.
>>
>> Thanks,
>>
>> Jonathan
>>
>>> On Apr 18, 2018, at 9:44 AM, Jonathon Kocz <[email protected]> wrote:
>>>
>>> Hi Jonathan,
>>>
>>> I think you've already addressed this, but to double check, are these R2s
>>> after you switched to the SP25-60FAG power supply?
>>>
>>> I've had a lot of trouble with R2s using istar/xeal supplies getting into
>>> strange situations that always seem fixable with a new power supply.
>>>
>>> Cheers,
>>> Jonathon
>>>
>>> On 17 April 2018 at 16:22, Jonathan Weintroub <[email protected]> wrote:
>>> Hi CASPERites,
>>>
>>> We have experience with quite a few ROACH2s in the lab and in the field
>>> over some years, and a pattern has emerged which warrants a question to
>>> the ROACH2 experts on this list. The SAO team has seen strange faults
>>> happen on multiple ROACH2 units after power failures, dips and lightning
>>> storms. I’ll list the various weirdnesses below, but the key point is
>>> that a full power cycle, including removing power from the line input,
>>> does not reset and cure the units, whereas an extended power down (like
>>> overnight, or 24 hours, or more) does seem to bring the units back to
>>> life again.
>>> This was discovered serendipitously, and has happened often enough that
>>> the pattern seems repeatable (though controlled experiments aren’t really
>>> possible, as we try not to stress our equipment this way).
>>>
>>> Has anyone else seen this, and does someone perhaps have a suggestion as
>>> to root cause, or some way to accelerate the reset?
>>>
>>> Example faults have included:
>>>
>>> - ADC5G clock not being correctly received, or not being transmitted to
>>>   the FPGA, or being transmitted at incorrect speed.
>>>
>>> - A particular ADC would refuse to calibrate its digital interface to the
>>>   FPGA.
>>>
>>> - QDRs which don’t calibrate.
>>>
>>> - After a lightning storm on Maunakea we have two units with a single
>>>   SFP+ port among 8 failing to transmit packets, though we have yet to
>>>   see if an extended power down will cure this.
>>>
>>> Again, these faults have been distributed across multiple units, and in
>>> all previous cases have eventually been cleared after extended power
>>> down. Which is good, but the pathology worries us.
>>>
>>> Thanks in advance for any light that might be cast on this issue.
>>>
>>> Jonathan and André
>>> EHT/SMA
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "[email protected]" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an
>>> email to [email protected].
>>> To post to this group, send email to [email protected].

