The exploration of the dungeons of NORia has finally led to a
meeting with the supposed arch-enemy: the power-down behaviour of
the reset circuit.


Background
----------

M1rc3 has a special reset chip (U24, [1]) that resets FPGA and NOR
when powering up and that also holds them in reset when the 3.3 V
rail drops below 2.63 V. The expectation was that this would
prevent the NOR corruption. Alas, it didn't.

After poking around for a while, we started to suspect that, when
powering down, the 3.3 V rail may drop more slowly than some of
the other rails - particularly any of the power rails supplying
the FPGA core.

In this case, the FPGA could get confused and send out weird
signals, which would then be properly amplified by the FPGA's I/O
drivers (operating at 3.3 V) and received by the NOR (also
operating at 3.3 V). Every once in a while, these signals would
form a valid command that the NOR still has enough time to process
before it also loses power.

Power rails can drop at different speeds because each has its own
regulator and output buffering. It's not trivial to ensure that
rails come up or go down in a specific order and it's also difficult
to measure the exact order, because it can vary a lot with what
the system is doing at the time of the power cut.

However, we know that no power rail can drop faster than the power
input: if a rail dropped faster, the regulator could simply draw
more power from the input to bring the rail back up again.

Thus the idea was born to drive the reset chip not from the
regulated 3.3 V rail but from the filtered but unregulated 5 V
input. Also, to make sure we cut out in time, the threshold
voltage of the reset chip should be closer to 5 V.


The rework
----------

I removed the old reset chip and replaced it with an
APX803-44SAG-7 [2] which has a threshold voltage of 4.38 V. To
isolate the input pin from the 3.3 V pad on the PCB, I placed a 
piece of single-sided 0.36 mm FR4 board [3] between chip and pad.

The closest 5 V source I could find is C125, part of the MIDI TX
circuit.

This is what it looks like:

http://downloads.qi-hardware.com/people/werner/m1/nor/d8/u24-to-5V.jpg


M1 behaviour after rework
-------------------------

Immediately after the rework, the M1 behaved a little oddly. It did
reset and enter standby, but when I tried to get into the BIOS to
run the CRC test, it just stopped (maybe a spurious reset).

I'm not sure what happened there. Later, I checked the voltages,
and they're all good: 4.98 V at the DC jack and 4.94 V at U24 pin
3.

Eventually, it gave in and behaved properly. I then proceeded to
run the usual power-cycling loop.


Testing
-------

I ran the power-cycling test for 4284 cycles. It did not report a
single corruption.

Afterwards, I did a CRC check, which also showed that everything
was in good health (*). Last but not least, I dumped the lock bits
and verified that block 0 was indeed unlocked.

This means that the test seems to be valid: block 0 was not
protected by its lock bit, so a corruption could have occurred.
If we assume a previous corruption probability of 1/500 per cycle,
the probability of passing 4284 cycles without hitting a single
corruption would be about 0.02%.
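
As a rough sanity check of that figure, here is a minimal sketch.
It assumes that each power cycle corrupts the NOR independently and
with the same probability, which is a simplification. The function
name is made up for illustration; only the 1/500 estimate and the
4284 cycles come from the test above.

```python
def survival_probability(p_per_cycle: float, cycles: int) -> float:
    """Chance of seeing no corruption at all in `cycles` power cycles.

    Assumes independent cycles with a constant per-cycle corruption
    probability (an assumption, not something that was measured).
    """
    return (1.0 - p_per_cycle) ** cycles


# Old corruption rate estimate (1/500) and the length of this run.
p = survival_probability(1 / 500, 4284)
print(f"chance of a clean run by luck alone: {p:.2%}")
```

The result is small enough that a clean run of this length is very
unlikely to be a coincidence if the old rate still applied.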

(*) In case you're checking my log [4]: the rescue BIOS failed the
    CRC check. I think it's the MAC address that causes the CRC to
    fail. I never bothered to fix this, so that failure is normal
    and expected.


Conclusion
----------

It seems that changing the reset circuit such that it always
resets FPGA and NOR when power is ramping down does reduce the
rate of NOR corruptions substantially and may even eliminate the
problem entirely.

The instabilities observed immediately after the rework need
further examination. They may have been caused by residues of the
rework (e.g., flux that hasn't dried completely), but another
possible explanation would be short voltage drops on the 5 V rail
during load changes.

We may also consider using a reset chip with a lower threshold
voltage. E.g., the APX803-40SAG-7 with a nominal threshold of
4.0 V should still give the 3.3 V regulator [5] enough room to do
its work, while being less sensitive to small upsets of the 5 V
supply.
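
To put numbers on that trade-off, the sketch below compares the two
nominal thresholds against voltages mentioned in this mail: the
3.3 V rail, the 4.94 V measured at U24 pin 3, and the 4.75 V lower
limit for a USB host. It uses nominal values only and ignores the
threshold tolerance of the reset chips and the dropout voltage of
the regulator, so treat it as a rough illustration. The function
and key names are made up.

```python
RAIL_V = 3.3              # regulated rail that must stay valid
MEASURED_SUPPLY_V = 4.94  # measured at U24 pin 3 after the rework
USB_MIN_V = 4.75          # lower supply limit for a USB host


def margins(threshold_v: float) -> dict:
    """Voltage margins for a reset chip with the given nominal threshold."""
    return {
        # Room left for the 3.3 V regulator when reset asserts.
        "headroom_above_rail": round(threshold_v - RAIL_V, 2),
        # How far the measured supply may sag before reset asserts.
        "margin_below_measured": round(MEASURED_SUPPLY_V - threshold_v, 2),
        # The same, but starting from the lowest allowed supply.
        "margin_below_usb_min": round(USB_MIN_V - threshold_v, 2),
    }


for name, threshold in (("APX803-44SAG-7", 4.38), ("APX803-40SAG-7", 4.0)):
    print(name, margins(threshold))
```

With the lower threshold, the supply can sag further before a reset
is triggered, at the cost of less room above the 3.3 V rail.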


What's next
-----------

I'll play with my M1 in "regular use" for a bit and watch for
unexplained resets/hangs/etc.

After that, a longer test run should provide more certainty that
the corruption is really gone. The chance of a clean run happening
by luck alone shrinks roughly exponentially with the number of
cycles, and every 5-6 hours of testing lowers it by another factor
of ten. So a couple of days should be sufficient.
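
The number of cycles behind each factor of ten can be estimated
with a small sketch. It assumes independent cycles and the old
corruption rate of 1/500 per cycle, and the function name is made
up for illustration.

```python
import math


def cycles_per_factor_of_ten(p_per_cycle: float) -> float:
    """Clean cycles needed to make a lucky pass ten times less likely.

    Assumes independent cycles with a constant per-cycle corruption
    probability, so the chance of n clean cycles is (1 - p) ** n.
    """
    return math.log(10) / -math.log(1.0 - p_per_cycle)


n = cycles_per_factor_of_ten(1 / 500)
print(f"about {n:.0f} clean cycles per factor of ten")
```

With a rate of 1/500, this comes out a little above 1100 cycles per
factor of ten.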

Last but not least, this needs testing with the supply voltage at
its limits, e.g., the 4.75 V to 5.25 V allowed for a USB host.


[1] http://www.ait-ic.com/uploads//2009-10/21/_1256089836_7ol2c.pdf
[2] http://www.diodes.com/datasheets/APX803.pdf
[3] http://search.digikey.com/us/en/products/PC94/PC94-ND/354417
[4] http://downloads.qi-hardware.com/people/werner/m1/nor/d8/raw.tar.bz2
[5] http://www.national.com/profile/snip.cgi/openDS=LP38690

- Werner
