hi jonathan, here's a remote possibility that might explain some of the behaviour you are seeing:
when the power goes down to your roach2's, does the power also go down on the sample clock or 1 PPS distribution?

if the sample clock continues to be fed to the ADC's, then the CMOS adc chips can continue to be powered via the clock, or perhaps via 1 PPS, and because the voltages are low, the adc's can get into a weird mode...

you might need to power off the 1 PPS and sample clock? or, after power is restored, issue a reset to the ADC?

dan

On Tue, Apr 17, 2018 at 4:22 PM, Jonathan Weintroub <[email protected]> wrote:

> Hi CASPERites,
>
> We have had experience with quite a few ROACH2s in the lab and in the
> field for some years, and a pattern has emerged which warrants a question
> to the ROACH2 experts on this list. The SAO team has seen strange faults
> happen on multiple ROACH2 units after power failures, dips and lightning
> storms. I’ll list the various weirdnesses below, but the key point is that
> a full power cycle, including removing power from the line input, does not
> reset and cure the units, while an extended power down (like overnight, or
> 24 hours, or more) does seem to bring the units back to life again. This
> was discovered serendipitously, and has happened often enough that the
> pattern seems repeatable (though controlled experiments aren’t really
> possible, as we try not to stress our equipment this way).
>
> Has anyone else seen this, and does someone perhaps have a suggestion as
> to root cause, or some way to accelerate the reset?
>
> Example faults have included:
>
> - ADC5G clock not being correctly received, or not being transmitted to
>   the FPGA, or being transmitted at incorrect speed.
>
> - A particular ADC would refuse to calibrate its digital interface to the
>   FPGA.
>
> - QDRs which don’t calibrate.
>
> - After a lightning storm on Maunakea we have two units with a single SFP+
>   port among 8 failing to transmit packets, though we have yet to see if an
>   extended power down will cure this.
>
> Again, these faults have been distributed across multiple units, and in all
> cases have eventually been cleared after an extended power down. Which is
> good, but the pathology worries us.
>
> Thanks in advance for any light that might be cast on this issue.
>
> Jonathan and André
> EHT/SMA
>
> --
> You received this message because you are subscribed to the Google Groups
> "[email protected]" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].

