Hi,
I purchased an M1 a couple of weeks ago. I have been following the
progress of M1 development with great interest for over a year, and
joined this mailing list a week or so ago.

Werner, your investigation looks extremely thorough. Have you looked
into using a voltage supervisor to drive the VPEN pin, rather than
connecting it to the 3.3 V VCC?

The datasheet (319942_J3_65_256M_MLC_DS.pdf), page 40, Table 20,
Note 3 says:

"Block erases, programming, and lock-bit configurations are inhibited
when VPEN ≤ VPENLK, and not guaranteed in the range between VPENLK (max)
and VPENH (min), and above VPENH (max)."

Here VPENLK(max) = 2.2 V and VPENH(min) = 2.7 V.

Cheers,
Ed.

> Date: Tue, 25 Oct 2011 08:52:19 -0300
> From: [email protected]
> To: [email protected]
> Subject: [Milkymist-devel] Tales from the dungeons of NORia: the WE# rework
>
> Executive summary:
> - adding a pull-up to WE# works but doesn't reduce NOR corruption
> - tests confirm that block locking does protect the respective
>   blocks from getting corrupted
> - next: CE0 pull-up
>
>
> The rework
> ----------
>
> As promised in
> http://lists.milkymist.org/pipermail/devel-milkymist.org/2011-October/001940.html
> I added a 4.7 kOhm pull-up to the NOR's write enable signal and
> then ran the power-cycling tests. The rework looks like this:
>
> http://downloads.qi-hardware.com/people/werner/m1/nor/d2/nor-we.jpg
>
> This was a bit trickier than it looks because the wire has a
> tendency to twist during soldering, but with enough patience
> and flux with a sufficiently low evaporation rate, also this
> can be mastered.
>
>
> First results
> -------------
>
> The first run of 6159 cycles still produced numerous corruptions.
> It looked as if the pull-up had reduced their frequency a little,
> but this later turned out to be incorrect:
>
> http://downloads.qi-hardware.com/people/werner/m1/nor/d2/dist.png
>
> When doing more testing, I then had a string of X server hangs
> (not caused by the testing), that yielded unusably short runs.
> Finally, I had one that looked normal for a while but then went
> on for more than 20'000 cycles without a single (fatal)
> corruption of the standby partition.
>
> Eventually, the Flickernoise partition was corrupted, preventing
> the M1 from booting, and I then stopped the test and looked for
> an explanation for this unexpectedly good result.
>
>
> Invulnerability debunked
> ------------------------
>
> When I analyzed what had happened, I found that the first block
> for some reason got locked. If the theory is true that undefined
> bus states during power-down are the root cause of all our NOR
> troubles, then this would mean that one such event has actually
> generated the Block Lock command sequence.
>
> Such an event may be - very rough estimate - about 1/200 times as
> likely as a random bus state producing a write command with a
> data pattern that clears bits.
>
> It may also be possible for a Block Unlock command to be
> generated, which - if executed - would unlock the entire device.
> However, given that erasing is a very slow operation, it may well
> be the case that the chip shuts down before such a command can
> produce much damage.
>
>
> More extensive results
> ----------------------
>
> That 20'000 cycles run had me confused for a while, but then I
> finally got a long successful run without unexpected problems.
> This one lasted for 14687 cycles and 33 standby corruptions, and
> ended with the (unprotected) main Flickernoise partition taking a
> hit.
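As a rough cross-check of the figures just quoted (14687 cycles, 33
standby corruptions), the sketch below computes the mean number of
cycles per corruption and an approximate 95% range. The normal
approximation to a Poisson count is my own assumption for illustration,
not the method used in the original test, and the function name
corruption_interval is hypothetical. It only asks whether the
pre-rework figure of 1/478 quoted further down is plausibly the same
rate.

```python
import math


def corruption_interval(cycles, corruptions, z=1.96):
    """Return (mean, low, high) cycles per corruption.

    Assumes the corruption count is roughly Poisson and uses a normal
    approximation (count +/- z * sqrt(count)). This is an illustrative
    assumption, not the analysis from the original message.
    """
    spread = z * math.sqrt(corruptions)
    mean = cycles / corruptions
    # More corruptions in the same run means fewer cycles per corruption.
    low = cycles / (corruptions + spread)
    if corruptions > spread:
        high = cycles / (corruptions - spread)
    else:
        high = math.inf  # too few events to bound the interval from above
    return mean, low, high


mean, low, high = corruption_interval(14687, 33)
print(f"mean: one corruption per {mean:.0f} cycles")
print(f"rough 95% range: one per {low:.0f} to one per {high:.0f} cycles")
print("1/478 inside range:", low < 478 < high)
```

With only 33 events the range is wide, so this kind of check can show
that two rates are compatible but cannot show that they are identical.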
>
> There's the graph to prove it:
>
> http://downloads.qi-hardware.com/people/werner/m1/nor/d6/dist.png
>
> The measured rate of 1/445 is close enough to the 1/478 I got
> before the rework that they can be considered equivalent. In
> other words, the rework had no effect on the rate at which NOR
> corruption occurs.
>
> The correlation of adjacent intervals doesn't show anything
> suspicious either:
>
> http://downloads.qi-hardware.com/people/werner/m1/nor/d6/corr.png
>
> The pattern analysis yields this:
>
> 00000 ____________________ | 00000000 00000000 | d6/10531-corrupt.bin
>                            | 00000000 00000000 | d6/13288-corrupt.bin
>                            | 00000000 00000000 | d6/14686-corrupt.bin
>                            | 11001101 01000000 | d6/2209-corrupt.bin
>                            | 00000000 00000000 | d6/4292-corrupt.bin
>                            | 00000000 00000000 | d6/4389-corrupt.bin
>                            | 00000000 00000000 | d6/4492-corrupt.bin
>                            | 10011011 11110000 | d6/6091-corrupt.bin
>                            | 10101010 00001011 | d6/7700-corrupt.bin
>                            | 00000000 00000000 | d6/8332-corrupt.bin
>                            | 00000000 00000000 | d6/9423-corrupt.bin
> 00002 __________________1_ | 00000010 10111101 | d6/2209-corrupt.bin
>                            | 00000000 00000000 | d6/7700-corrupt.bin
> 00004 _________________1__ | 00000000 00000000 | d6/13288-corrupt.bin
>                            | 00000000 00000000 | d6/14505-corrupt.bin
> 00014 _______________1_1__ | ____00__ 0____0_0 | d6/14505-corrupt.bin 1/2
> 00020 ______________1_____ | 0_0001__ ________ | d6/14517-corrupt.bin 1/1
>                            | 0_0000__ ________ | d6/3187-corrupt.bin 1/1
>                            | 1_1001__ ________ | d6/9423-corrupt.bin 1/1
> 00040 _____________1______ | _____0__ ________ | d6/13288-corrupt.bin 1/2
> 00050 _____________1_1____ | _____0__ ________ | d6/5320-corrupt.bin 1/2
> 00082 ____________1_____1_ | _0__00__ 0_____00 | d6/4094-corrupt.bin 1/1
> 00086 ____________1____11_ | _0__00__ 0____111 | d6/11961-corrupt.bin 1/1
> 0008a ____________1___1_1_ | 00__10__ 0____0_0 | d6/4492-corrupt.bin 1/1
> 000a0 ____________1_1_____ | ________ 0_______ | d6/319-corrupt.bin 1/1
> 000a2 ____________1_1___1_ | ____1_1_ _____00_ | d6/6528-corrupt.bin 1/1
> 00152 ___________1_1_1__1_ | 00__10__ __0__00_ | d6/11690-corrupt.bin 1/1
> 0017e ___________1_111111_ | ________ 0_______ | d6/4292-corrupt.bin 1/1
> 00180 ___________11_______ | ________ _0______ | d6/6313-corrupt.bin 1/1
> 001d0 ___________111_1____ | ________ 000_____ | d6/5732-corrupt.bin 1/1
> 00202 __________1_______1_ | 00__00__ __0__00_ | d6/10722-corrupt.bin 1/1
> 00440 _________1___1______ | ________ 0___0___ | d6/11565-corrupt.bin 1/1
>                            | ________ 0___0___ | d6/9622-corrupt.bin 1/1
> 00800 ________1___________ | ________ ____0___ | d6/10531-corrupt.bin 1/1
> 0080e ________1_______111_ | ________ __0_____ | d6/13288-corrupt.bin 2/2
> 00830 ________1_____11____ | ________ 00__0___ | d6/8332-corrupt.bin 1/1
> 00840 ________1____1______ | ________ __0_0___ | d6/11745-corrupt.bin 1/1
> 00880 ________1___1_______ | ________ ___00___ | d6/3531-corrupt.bin 1/1
> 008a2 ________1___1_1___1_ | 11__01__ __0__10_ | d6/4389-corrupt.bin 1/1
> 008f0 ________1___1111____ | ________ _00_____ | d6/14505-corrupt.bin 2/2
> 009ec ________1__1111_11__ | ____10__ _1___0__ | d6/5965-corrupt.bin 1/1
> 00c20 ________11____1_____ | ________ 0_0_____ | d6/3120-corrupt.bin 1/1
> 01062 _______1_____11___1_ | 00__00__ __0__00_ | d6/14686-corrupt.bin 1/1
> 01200 _______1__1_________ | ________ 0_0_0___ | d6/13807-corrupt.bin 1/1
> 018c0 _______11___11______ | ________ _001____ | d6/6091-corrupt.bin 1/1
> 01942 _______11__1_1____1_ | 00__00__ __0__00_ | d6/11608-corrupt.bin 1/1
> 02442 ______1__1___1____1_ | 00__00__ __0__00_ | d6/2209-corrupt.bin 1/1
> 02832 ______1_1_____11__1_ | 00__00__ __0__00_ | d6/14206-corrupt.bin 1/1
> 02aa0 ______1_1_1_1_1_____ | ________ 0_0_____ | d6/5320-corrupt.bin 2/2
> 02ffe ______1_11111111111_ | ________ _0000___ | d6/7700-corrupt.bin 1/1
> 0409e _____1______1__1111_ | ________ __000___ | d6/2678-corrupt.bin 1/1
>
> Also this looks similar to the previous result.
> There were fewer corruptions that left a 1 bit intact somewhere
> (indicated by "1" in the pattern data field), though.
>
> The test did not reveal any damage to locked partitions, further
> strengthening our hypothesis that locking does indeed avert NOR
> corruption.
>
>
> Conclusion
> ----------
>
> The bad news is that the WE# pull-up didn't help to prevent NOR
> corruption.
>
> The good news is that it didn't introduce new problems. But we
> wouldn't have expected such things anyway.
>
> Furthermore, it looks as if locking partitions does indeed
> protect them against NOR corruption, or at least makes this
> corruption so unlikely that an M1 will have died of other causes
> long before such corruption would happen.
>
>
> What's next
> -----------
>
> I'll now try to add a pull-up to FLASH_CE_N/CE0 as well, and see
> how things go.
>
> - Werner
> _______________________________________________
> http://lists.milkymist.org/listinfo.cgi/devel-milkymist.org
> IRC: #milkymist@Freenode
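To make the VPEN question at the top of this message concrete, here is
a minimal sketch that classifies a VPEN voltage, reading the quoted
datasheet note literally. Only VPENLK(max) = 2.2 V and
VPENH(min) = 2.7 V come from the message. VPENH(max) is not quoted, so
it is an optional caller-supplied parameter, and the function name
vpen_state and its return strings are hypothetical.

```python
VPENLK_MAX = 2.2  # V, VPENLK(max) as quoted from the datasheet note
VPENH_MIN = 2.7   # V, VPENH(min) as quoted from the datasheet note


def vpen_state(vpen, vpenh_max=None):
    """Classify a VPEN voltage according to the quoted note.

    vpenh_max is not given in the message; pass the datasheet value
    if the upper bound should be checked too (assumption).
    """
    if vpen <= VPENLK_MAX:
        # Erase, program and lock-bit operations are inhibited.
        return "inhibited"
    if vpen < VPENH_MIN:
        # Between VPENLK(max) and VPENH(min): behaviour not guaranteed.
        return "not guaranteed"
    if vpenh_max is not None and vpen > vpenh_max:
        # Above VPENH(max): behaviour not guaranteed either.
        return "not guaranteed"
    return "allowed"


for volts in (0.0, 2.2, 2.5, 3.3):
    print(f"VPEN = {volts:.1f} V: {vpen_state(volts)}")
```

Under this reading, a supervisor would have to bring VPEN to 2.2 V or
below before the bus signals become undefined during power-down.
Whether that timing can be achieved on the M1 is not established here.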
