Hi,

I purchased an M1 a couple of weeks ago. I have been following the progress
of M1 development with great interest for over a year now, and joined this
mailing list a week or so ago.

Werner, your investigation looks extremely thorough.

Have you looked into driving the VPEN pin from a voltage supervisor rather
than tying it to the 3.3 V VCC?

Looking through the datasheet (319942_J3_65_256M_MLC_DS.pdf), page 40,
Table 20, Note 3 says:
"Block erases, programming, and lock-bit configurations are inhibited when VPEN 
≤ VPENLK, and not guaranteed in the range between VPENLK (max) and VPENH (min), 
and above VPENH (max)."
Here VPENLK(max) = 2.2 V and VPENH(min) = 2.7 V.
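
To make the note concrete, here is a small sketch that sorts a VPEN voltage
into the regions the note describes. Only VPENLK(max) and VPENH(min) come
from the note. VPENH(max) is not quoted above, so the function takes it as a
parameter. The names VpenRegion and vpen_region are my own, and the 3.6 V in
the example is a placeholder, not a datasheet value.

```python
from enum import Enum

# Limits quoted from Table 20, Note 3 of 319942_J3_65_256M_MLC_DS.pdf.
VPENLK_MAX = 2.2  # volts: at or below this, erase/program/lock-bit changes are inhibited
VPENH_MIN = 2.7   # volts: lower bound of the normal VPEN operating range


class VpenRegion(Enum):
    INHIBITED = "block erase, programming and lock-bit configuration inhibited"
    NOT_GUARANTEED = "behaviour not guaranteed"
    ENABLED = "block erase, programming and lock-bit configuration allowed"


def vpen_region(vpen: float, vpenh_max: float) -> VpenRegion:
    """Classify a VPEN voltage (in volts) according to the datasheet note.

    vpenh_max is NOT given in the quoted note; take it from the datasheet.
    """
    if vpen <= VPENLK_MAX:
        return VpenRegion.INHIBITED
    if vpen < VPENH_MIN or vpen > vpenh_max:
        return VpenRegion.NOT_GUARANTEED
    return VpenRegion.ENABLED


# Hypothetical power-down sweep; 3.6 V is a placeholder, not a datasheet value.
for v in (3.3, 2.5, 1.0):
    print(f"{v:.1f} V: {vpen_region(v, vpenh_max=3.6).name}")
```

My reading (untested) is that with VPEN tied to VCC, VPEN passes through the
2.2 V to 2.7 V window on every power-down, while the bus may be undefined. A
supervisor that pulls VPEN low early would keep it in the inhibited region.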

Cheers,
Ed.



> Date: Tue, 25 Oct 2011 08:52:19 -0300
> From: [email protected]
> To: [email protected]
> Subject: [Milkymist-devel] Tales from the dungeons of NORia: the WE# rework
> 
> Executive summary:
> - adding a pull-up to WE# works but doesn't reduce NOR corruption
> - tests confirm that block locking does protect the respective
>   blocks from getting corrupted
> - next: CE0 pull-up
> 
> 
> The rework
> ----------
> 
> As promised in
> http://lists.milkymist.org/pipermail/devel-milkymist.org/2011-October/001940.html
> I added a 4.7 kOhm pull-up to the NOR's write enable signal and
> then ran the power-cycling tests. The rework looks like this:
> 
> http://downloads.qi-hardware.com/people/werner/m1/nor/d2/nor-we.jpg
> 
> This was a bit trickier than it looks because the wire has a
> tendency to twist during soldering, but with enough patience
> and a flux with a sufficiently low evaporation rate, even this
> can be mastered.
> 
> 
> First results
> -------------
> 
> The first run of 6159 cycles still produced numerous corruptions.
> It looked as if the pull-up had reduced their frequency a little,
> but this later turned out to be incorrect:
> 
> http://downloads.qi-hardware.com/people/werner/m1/nor/d2/dist.png
> 
> During further testing, I had a string of X server hangs (not
> caused by the testing) that cut the runs unusably short.
> Finally, I had one run that looked normal for a while but then
> went on for more than 20'000 cycles without a single (fatal)
> corruption of the standby partition.
> 
> Eventually, the Flickernoise partition was corrupted, preventing
> the M1 from booting, and I then stopped the test and looked for
> an explanation for this unexpectedly good result.
> 
> 
> Invulnerability debunked
> ------------------------
> 
> When I analyzed what had happened, I found that the first block
> had, for some reason, become locked. If the theory is true that
> undefined bus states during power-down are the root cause of all
> our NOR troubles, then one such event must actually have
> generated the Block Lock command sequence.
> 
> Such an event may be - very rough estimate - about 1/200 times as
> likely as a random bus state producing a write command with a
> data pattern that clears bits.
> 
> It may also be possible for a Block Unlock command to be
> generated, which - if executed - would unlock the entire device.
> However, given that erasing is a very slow operation, it may well
> be the case that the chip shuts down before such a command can
> produce much damage.
> 
> 
> More extensive results
> ----------------------
> 
> That 20'000-cycle run had me confused for a while, but I
> finally got a long successful run without unexpected problems.
> It lasted 14687 cycles, produced 33 standby corruptions, and
> ended with the (unprotected) main Flickernoise partition taking
> a hit.
> 
> Here's the graph to prove it:
> 
> http://downloads.qi-hardware.com/people/werner/m1/nor/d6/dist.png
> 
> The measured rate of 1/445 is close enough to the 1/478 I got
> before the rework that they can be considered equivalent. In
> other words, the rework had no effect on the rate at which NOR
> corruption occurs.
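
The 1/445 figure is just cycles divided by standby corruptions (14687 / 33).
As a rough check of "equivalent", here is a sketch that adds a Poisson error
estimate. The error model is my own assumption (independent corruption
events), not Werner's method, and rate_with_poisson_error is a made-up name.

```python
import math


def rate_with_poisson_error(cycles: int, corruptions: int) -> tuple[float, float]:
    """Return (mean cycles per corruption, rough 1-sigma uncertainty).

    Assumes corruptions are independent events, so their count is
    Poisson-distributed and its relative error is 1/sqrt(count).
    """
    mean = cycles / corruptions
    return mean, mean / math.sqrt(corruptions)


mean, sigma = rate_with_poisson_error(14687, 33)
print(f"1/{mean:.0f} +/- {sigma:.0f}")  # 1/445 +/- 77
```

Under that assumption the pre-rework 1/478 lies well within one sigma of the
new measurement.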
> 
> The correlation of adjacent intervals doesn't show anything
> suspicious either:
> 
> http://downloads.qi-hardware.com/people/werner/m1/nor/d6/corr.png
> 
> The pattern analysis yields this:
> 
> 00000 ____________________ | 00000000 00000000 | d6/10531-corrupt.bin
>                            | 00000000 00000000 | d6/13288-corrupt.bin
>                            | 00000000 00000000 | d6/14686-corrupt.bin
>                            | 11001101 01000000 | d6/2209-corrupt.bin
>                            | 00000000 00000000 | d6/4292-corrupt.bin
>                            | 00000000 00000000 | d6/4389-corrupt.bin
>                            | 00000000 00000000 | d6/4492-corrupt.bin
>                            | 10011011 11110000 | d6/6091-corrupt.bin
>                            | 10101010 00001011 | d6/7700-corrupt.bin
>                            | 00000000 00000000 | d6/8332-corrupt.bin
>                            | 00000000 00000000 | d6/9423-corrupt.bin
> 00002 __________________1_ | 00000010 10111101 | d6/2209-corrupt.bin
>                            | 00000000 00000000 | d6/7700-corrupt.bin
> 00004 _________________1__ | 00000000 00000000 | d6/13288-corrupt.bin
>                            | 00000000 00000000 | d6/14505-corrupt.bin
> 00014 _______________1_1__ | ____00__ 0____0_0 | d6/14505-corrupt.bin 1/2
> 00020 ______________1_____ | 0_0001__ ________ | d6/14517-corrupt.bin 1/1
>                            | 0_0000__ ________ | d6/3187-corrupt.bin 1/1
>                            | 1_1001__ ________ | d6/9423-corrupt.bin 1/1
> 00040 _____________1______ | _____0__ ________ | d6/13288-corrupt.bin 1/2
> 00050 _____________1_1____ | _____0__ ________ | d6/5320-corrupt.bin 1/2
> 00082 ____________1_____1_ | _0__00__ 0_____00 | d6/4094-corrupt.bin 1/1
> 00086 ____________1____11_ | _0__00__ 0____111 | d6/11961-corrupt.bin 1/1
> 0008a ____________1___1_1_ | 00__10__ 0____0_0 | d6/4492-corrupt.bin 1/1
> 000a0 ____________1_1_____ | ________ 0_______ | d6/319-corrupt.bin 1/1
> 000a2 ____________1_1___1_ | ____1_1_ _____00_ | d6/6528-corrupt.bin 1/1
> 00152 ___________1_1_1__1_ | 00__10__ __0__00_ | d6/11690-corrupt.bin 1/1
> 0017e ___________1_111111_ | ________ 0_______ | d6/4292-corrupt.bin 1/1
> 00180 ___________11_______ | ________ _0______ | d6/6313-corrupt.bin 1/1
> 001d0 ___________111_1____ | ________ 000_____ | d6/5732-corrupt.bin 1/1
> 00202 __________1_______1_ | 00__00__ __0__00_ | d6/10722-corrupt.bin 1/1
> 00440 _________1___1______ | ________ 0___0___ | d6/11565-corrupt.bin 1/1
>                            | ________ 0___0___ | d6/9622-corrupt.bin 1/1
> 00800 ________1___________ | ________ ____0___ | d6/10531-corrupt.bin 1/1
> 0080e ________1_______111_ | ________ __0_____ | d6/13288-corrupt.bin 2/2
> 00830 ________1_____11____ | ________ 00__0___ | d6/8332-corrupt.bin 1/1
> 00840 ________1____1______ | ________ __0_0___ | d6/11745-corrupt.bin 1/1
> 00880 ________1___1_______ | ________ ___00___ | d6/3531-corrupt.bin 1/1
> 008a2 ________1___1_1___1_ | 11__01__ __0__10_ | d6/4389-corrupt.bin 1/1
> 008f0 ________1___1111____ | ________ _00_____ | d6/14505-corrupt.bin 2/2
> 009ec ________1__1111_11__ | ____10__ _1___0__ | d6/5965-corrupt.bin 1/1
> 00c20 ________11____1_____ | ________ 0_0_____ | d6/3120-corrupt.bin 1/1
> 01062 _______1_____11___1_ | 00__00__ __0__00_ | d6/14686-corrupt.bin 1/1
> 01200 _______1__1_________ | ________ 0_0_0___ | d6/13807-corrupt.bin 1/1
> 018c0 _______11___11______ | ________ _001____ | d6/6091-corrupt.bin 1/1
> 01942 _______11__1_1____1_ | 00__00__ __0__00_ | d6/11608-corrupt.bin 1/1
> 02442 ______1__1___1____1_ | 00__00__ __0__00_ | d6/2209-corrupt.bin 1/1
> 02832 ______1_1_____11__1_ | 00__00__ __0__00_ | d6/14206-corrupt.bin 1/1
> 02aa0 ______1_1_1_1_1_____ | ________ 0_0_____ | d6/5320-corrupt.bin 2/2
> 02ffe ______1_11111111111_ | ________ _0000___ | d6/7700-corrupt.bin 1/1
> 0409e _____1______1__1111_ | ________ __000___ | d6/2678-corrupt.bin 1/1
> 
> This also looks similar to the previous result, though there
> were fewer corruptions that left a 1 bit intact somewhere
> (indicated by "1" in the pattern data field).
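
In the table's address column, the hex offset appears to be followed by its
20-bit binary form, with "_" standing for a 0 bit. I inferred this from the
table; the mail doesn't say so. The helper below, address_pattern, is
hypothetical and reproduces that column under this reading.

```python
def address_pattern(addr: int, width: int = 20) -> str:
    """Show addr as a width-bit binary string with '_' for 0 bits."""
    if not 0 <= addr < (1 << width):
        raise ValueError(f"address {addr:#x} does not fit in {width} bits")
    return format(addr, f"0{width}b").replace("0", "_")


for addr in (0x00002, 0x0409e):
    print(f"{addr:05x} {address_pattern(addr)}")
# 00002 __________________1_
# 0409e _____1______1__1111_
```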
> 
> The test did not reveal any damage to locked partitions, further
> strengthening our hypothesis that locking does indeed avert NOR
> corruption.
> 
> 
> Conclusion
> ----------
> 
> The bad news is that the WE# pull-up didn't help to prevent NOR
> corruption.
> 
> The good news is that it didn't introduce new problems, though
> we wouldn't have expected any anyway.
> 
> Furthermore, it looks as if locking partitions does indeed
> protect them against NOR corruption, or at least makes this
> corruption so unlikely that an M1 will have died of other causes
> long before such corruption would happen.
> 
> 
> What's next
> -----------
> 
> I'll now try to add a pull-up to FLASH_CE_N/CE0 as well, and see
> how things go.
> 
> - Werner
> _______________________________________________
> http://lists.milkymist.org/listinfo.cgi/devel-milkymist.org
> IRC: #milkymist@Freenode
                                          