Hi CASPERites,

We have been running quite a few ROACH2s in the lab and in the field for some 
years, and a pattern has emerged which warrants a question to the ROACH2 
experts on this list. The SAO team has seen strange faults happen on multiple 
ROACH2 units after power failures, dips and lightning storms. I’ll list the 
various weirdnesses below, but the key point is that a full power cycle, 
including removing power from the line input, does not reset and cure the 
units, whereas an extended power down (overnight, 24 hours, or more) does seem 
to bring the units back to life again. This was discovered serendipitously, 
and has happened often enough that the pattern seems repeatable (though 
controlled experiments aren’t really possible, as we try not to stress our 
equipment this way).

Has anyone else seen this, and does someone perhaps have a suggestion as to 
root cause, or some way to accelerate the reset?

Example faults have included:

- ADC5G clock not being correctly received, or not being transmitted to the 
FPGA, or being transmitted at an incorrect speed.

- A particular ADC refusing to calibrate its digital interface to the FPGA.

- QDRs which don’t calibrate.

- After a lightning storm on Maunakea we have two units with a single SFP+ port 
among 8 failing to transmit packets, though we have yet to see if an extended 
power down will cure this.

Again, these faults have been distributed across multiple units and, apart 
from the still-untested SFP+ case, have all eventually been cleared by an 
extended power down. That is good, but the pathology worries us.

Thanks in advance for any light that might be cast on this issue.

Jonathan and André
EHT/SMA
