Hi CASPERites,

We have worked with quite a few ROACH2s in the lab and in the field for some years, and a pattern has emerged which warrants a question to the ROACH2 experts on this list. The SAO team has seen strange faults happen on multiple ROACH2 units after power failures, dips and lightning storms. I’ll list the various weirdnesses below, but the key point is that a full power cycle, including removing power from the line input, does not reset and cure the units, whereas an extended power down (overnight, 24 hours, or more) does seem to bring the units back to life again. This was discovered serendipitously, and has happened often enough that the pattern seems repeatable (though controlled experiments aren’t really possible, since we try not to stress our equipment this way).

Has anyone else seen this, and does someone perhaps have a suggestion as to root cause, or some way to accelerate the reset?

Example faults have included:

- ADC5G clock not being correctly received, or not being transmitted to the FPGA, or being transmitted at an incorrect speed.
- A particular ADC refusing to calibrate its digital interface to the FPGA.
- QDRs which don’t calibrate.
- After a lightning storm on Maunakea, two units each have a single SFP+ port among 8 failing to transmit packets, though we have yet to see if an extended power down will cure this.

Again, these faults have been distributed across multiple units, and in all cases have eventually been cleared after an extended power down. Which is good, but the pathology worries us.

Thanks in advance for any light that might be cast on this issue.

Jonathan and André
EHT/SMA