Hi Werner,

Nice job capturing all this!

What about a multi-voltage supervisor for the 1V2, 2V5 and 3V3 rails, such as the STM6179 [1], rather than relying on an unregulated 5V from a wallwart?

And perhaps you should use a 12V wallwart and an on-board 5V switching (pre-)regulator? This would allow looser constraints on the wallwart, but would it create a backwards-compatibility issue?

Cheers,
Ed.

[1] http://www.st.com/internet/com/TECHNICAL_RESOURCES/TECHNICAL_LITERATURE/DATASHEET/CD00060157.pdf

> Date: Fri, 28 Oct 2011 21:39:44 -0300
> From: [email protected]
> To: [email protected]
> Subject: [Milkymist-devel] The dungeons of NORia: Meeting the Balrog
>
> The exploration of the dungeons of NORia has finally led to a
> meeting with the supposed arch-enemy: the power-down behaviour of
> the reset circuit.
>
>
> Background
> ----------
>
> M1rc3 has a special reset chip (U24, [1]) that resets FPGA and NOR
> when powering up and that also holds them in reset when the 3.3 V
> rail drops below 2.63 V. The expectation was that this would
> prevent the NOR corruption. Alas, it didn't.
>
> After poking around for a while, we started to suspect that, when
> powering down, the 3.3 V rail may drop more slowly than some of
> the other rails - particularly any of the power rails supplying
> the FPGA core.
>
> In this case, the FPGA could get confused, send out weird signals,
> which would then be properly amplified by the FPGA's I/O drivers
> (operating at 3.3 V), received by the NOR (also operating at
> 3.3 V), and finally every once in a while producing a valid
> command the NOR may still have enough time to process before it
> also loses power.
>
> Power rails can drop at different speeds because each has its own
> regulator and output buffering. It's not trivial to assure that
> rails come up or down in a specific order and it's also difficult
> to measure the exact order, because it can vary a lot with what
> the system is doing at the time of the power cut.
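The multi-voltage supervisor suggested at the top of this reply sidesteps the ordering question raised in the paragraph above: reset is held whenever any monitored rail is below its own threshold, so it no longer matters which rail sags first. A minimal sketch of that decision logic follows, assuming the usual multi-voltage supervisor behaviour (assert reset if any input is low). The rail names come from the suggestion above; the per-rail thresholds (about 90% of nominal) and the power-down snapshot are hypothetical illustrations, not values from the STM6179 datasheet or from measurements.

```python
from typing import Mapping

# HYPOTHETICAL per-rail reset thresholds in volts (~90% of nominal).
# Placeholders for illustration only, not STM6179 datasheet values.
RAIL_THRESHOLDS = {"1V2": 1.08, "2V5": 2.25, "3V3": 2.97}


def reset_asserted(rails: Mapping[str, float],
                   thresholds: Mapping[str, float] = RAIL_THRESHOLDS) -> bool:
    """Multi-rail supervisor model: reset if ANY monitored rail is low."""
    return any(rails[name] < limit for name, limit in thresholds.items())


def single_rail_reset(rails: Mapping[str, float], limit: float = 2.63) -> bool:
    """Model of the original U24 circuit: only the 3.3 V rail is watched."""
    return rails["3V3"] < limit


# HYPOTHETICAL power-down snapshot: the core rail has already sagged
# while the 3.3 V rail still looks healthy (the suspected failure case).
snapshot = {"1V2": 0.9, "2V5": 2.5, "3V3": 3.2}

print("single-rail reset:", single_rail_reset(snapshot))
print("multi-rail reset: ", reset_asserted(snapshot))
```

In this snapshot the single-rail model leaves the FPGA and NOR out of reset while the multi-rail model holds them in reset, which is exactly the window the quoted message is worried about.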
>
> However, we know that no power rail can drop faster than the power
> input. Because if a rail would drop faster, the regulator could
> simply draw more power from the input to bring the rail back up
> again.
>
> Thus the idea was born to drive the reset chip not from the
> regulated 3.3 V rail but from the filtered but unregulated 5 V
> input. Also, to make sure we cut out in time, the threshold
> voltage of the reset chip should be closer to 5 V.
>
>
> The rework
> ----------
>
> I removed the old reset chip and replaced it with an
> APX803-44SAG-7 [2] which has a threshold voltage of 4.38 V. To
> isolate the input pin from the 3.3 V pad on the PCB, I placed a
> piece of single-sided 0.36 mm FR4 board [3] between chip and pad.
>
> The closest 5 V source I could find is C125, part of the MIDI TX
> circuit.
>
> This is what it looks like:
>
> http://downloads.qi-hardware.com/people/werner/m1/nor/d8/u24-to-5V.jpg
>
>
> M1 behaviour after rework
> -------------------------
>
> Immediately after the rework, the M1 behaved a little odd. It did
> reset and enter standby, but when I tried to get into the BIOS to
> run the CRC test, it just stopped (maybe a spurious reset).
>
> I'm not sure what happened there. Later, I checked the voltages,
> and they're all good: 4.98 V at the DC jack and 4.94 V at U24 pin
> 3.
>
> Eventually, it gave in and behaved properly. I then proceeded to
> run the usual power-cycling loop.
>
>
> Testing
> -------
>
> I ran the power-cycling test for 4284 cycles. It did not report a
> single corruption.
>
> Afterwards, I did a CRC check, which also showed that everything
> was in good health (*). Last but not least, I dumped the lock bits
> and verified that block 0 was indeed unlocked.
>
> This means that the test seems to be valid. If we assume a
> previous corruption probability of 1/500 per cycle, the
> probability of passing 4284 cycles without hitting a single
> corruption would be about 0.02%.
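The 0.02% figure can be reproduced under the same assumption the message makes: independent power cycles, each corrupting with probability 1/500, so the chance of surviving n cycles is (1 - p)^n. The same model also gives how many additional clean cycles make a lucky pass another factor of ten less likely, which is what the "factor of ten every few hours" estimate later in the quoted message rests on. A minimal sketch; the function names are illustrative, and the 1/500 rate is the message's assumption, not a measurement:

```python
import math


def survival_probability(p: float, cycles: int) -> float:
    """Probability of zero corruptions in `cycles` independent power
    cycles, each corrupting with probability `p`."""
    return (1.0 - p) ** cycles


def cycles_per_decade(p: float) -> float:
    """Extra clean cycles that cut the survival probability by a factor
    of ten, i.e. the n solving (1 - p)**n == 1/10."""
    return math.log(10.0) / -math.log1p(-p)


p = 1 / 500   # assumed pre-rework corruption rate per cycle
n = 4284      # cycles run after the rework without a corruption

print(f"P(no corruption in {n} cycles) = {survival_probability(p, n):.2%}")
print(f"cycles per factor of ten: {cycles_per_decade(p):.0f}")
```

With these inputs the first line prints about 0.02%, and a factor of ten corresponds to roughly 1150 cycles, so how many hours that takes depends only on the length of one power cycle in the test loop.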
>
> (*) In case you're checking my log [4]: the rescue BIOS failed the
> CRC check. I think it's the MAC address that causes the CRC to
> fail. I never bothered to fix this, so that failure is normal
> and expected.
>
>
> Conclusion
> ----------
>
> It seems that changing the reset circuit such that it always
> resets FPGA and NOR when power is ramping down does reduce the
> rate of NOR corruptions substantially and may even eliminate the
> problem entirely.
>
> The instabilities observed immediately after the rework need
> further examination. They may have been caused by residues of the
> rework (e.g., flux that hasn't dried completely), but another
> possible explanation would be short voltage drops on the 5 V rail
> during load changes.
>
> We may also consider using a reset chip with a lower threshold
> voltage. E.g., the APX803-40SAG-7 with a nominal threshold of
> 4.0 V should still give the 3.3 V regulator [5] enough room to do
> its work, while being less sensitive to small upsets of the 5 V
> supply.
>
>
> What's next
> -----------
>
> I'll play with my M1 in "regular use" for a bit and watch for
> unexplained resets/hangs/etc.
>
> After that, a longer test run should provide more certainty that
> the corruption is really gone. The probability for that increases
> roughly exponentially with the number of cycles, and each 5-6
> hours add a factor of ten. So a couple of days should be
> sufficient.
>
> Last but not least, this needs testing with the supply voltage at
> its limits, e.g., the 4.75 V to 5.25 V allowed for a USB host.
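The choice between the 4.38 V and 4.0 V parts can be laid out as two margins: how far the supply may dip below the lowest voltage the board should tolerate (4.75 V, the USB host limit mentioned above) before a spurious reset, and how much input voltage is left above the point where the 3.3 V regulator drops out of regulation when reset fires. A minimal sketch comparing the two APX803 variants named in the message, using nominal thresholds only (tolerances ignored). The regulator dropout is a HYPOTHETICAL placeholder, not a value taken from the LP38690 datasheet [5]:

```python
# Nominal reset thresholds of the two APX803 variants discussed (volts).
RESET_THRESHOLDS = {"APX803-44SAG-7": 4.38, "APX803-40SAG-7": 4.0}

V_SUPPLY_MIN = 4.75  # lower end of the USB host range (volts)
V_OUT = 3.3          # regulated rail (volts)
DROPOUT = 0.5        # HYPOTHETICAL regulator dropout (volts); check datasheet


def margins(threshold: float) -> tuple[float, float]:
    """Return (sag margin, regulator headroom) for a reset threshold.

    sag margin:         how far the supply can sag below V_SUPPLY_MIN
                        before the reset chip trips.
    regulator headroom: input voltage above V_OUT + DROPOUT at the moment
                        reset asserts; positive means reset fires while
                        the 3.3 V rail is still in regulation.
    """
    sag_margin = V_SUPPLY_MIN - threshold
    headroom = threshold - (V_OUT + DROPOUT)
    return sag_margin, headroom


for part, vth in RESET_THRESHOLDS.items():
    sag, head = margins(vth)
    print(f"{part}: sag margin {sag:.2f} V, regulator headroom {head:.2f} V")
```

Lowering the threshold roughly doubles the tolerance to supply dips in this model, at the cost of regulator headroom; whether 4.0 V is still safe depends on the real dropout at the actual load current, which is why the dropout is left as an explicit parameter.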
>
>
> [1] http://www.ait-ic.com/uploads//2009-10/21/_1256089836_7ol2c.pdf
> [2] http://www.diodes.com/datasheets/APX803.pdf
> [3] http://search.digikey.com/us/en/products/PC94/PC94-ND/354417
> [4] http://downloads.qi-hardware.com/people/werner/m1/nor/d8/raw.tar.bz2
> [5] http://www.national.com/profile/snip.cgi/openDS=LP38690
>
> - Werner
> _______________________________________________
> http://lists.milkymist.org/listinfo.cgi/devel-milkymist.org
> IRC: #milkymist@Freenode
