In my previous posting [1] I briefly described some of the locking mechanisms found in NOR Flash chips.
We can now have a closer look at the locking in the NOR used in M1.

We use locking to protect NOR partitions used for bringup and recovery,
to prevent spurious corruption from affecting them. (More about the
ongoing NOR corruption research later.)

A NOR partition is simply a number of consecutive blocks. The size and
location of partitions are a convention of how the memory is used. The
NOR chip doesn't know what partitions are - all it sees are operations
on blocks, words, etc.

The firmware of M1 itself does not know about locking. This means that
it never writes to locked partitions and that partitions it does write
to must be unlocked. All the lock management of M1 is thus handled
through UrJTAG.

"Intel" locking
---------------

UrJTAG assumes that chips that identify themselves as "Intel" NOR are
from the P30/P33/G18/etc. families and thus have software locks that
can be manipulated on a per-block basis.

If a device applies additional write-protection, such as lock-down or
disabling write/erase for the whole device, setting up the
corresponding hardware signals would be outside the scope of UrJTAG
itself.

Since this kind of NOR comes out of reset with all blocks locked, it is
necessary to unlock them before they can be changed. The programming
algorithm of UrJTAG is thus

    for each block B {
        unlock B;
        erase B;
        write new content to B;
    }

So far, so good.

Intel J3 differences
--------------------

The JS28F256J3F105A [2] NOR memory used in M1 is not the common Intel
type but has persistent lock bits instead of volatile ones.
The commands for locking and unlocking are the same as for other
Intel-type chips, but their effect is different:

Command         Intel J3                other "Intel" NOR
--------------- ----------------------- -----------------------
Block Lock      set per-block lock bit  set per-block lock bit
                in NOR                  in RAM
Block Unlock    erase entire NOR block  clear per-block lock
                containing lock bits    bit in RAM

In other words, a Block Unlock on J3 clears the lock bits of all blocks
at once. UrJTAG does not know of this difference and treats J3 family
NOR just as if it were regular "Intel" NOR.

What really happens ...
-----------------------

The programming loop thus becomes

    for each block B {
        unlock the entire device;
        erase B;
        write new content to B;
    }

The reflash_m1.sh script that we normally use to initialize M1 NOR used
to lock all the read-only partitions after writing them, and then
proceeded with the partitions that need no locking. As I've indicated
in [3], this simply left the entire device unlocked, since the first
unlock that followed cleared all the lock bits again.

The problem has since been fixed, and reflash_m1.sh does the locking at
the very end, after the last unlock.

A happy ending ?
----------------

Can we now consider this simply an unfortunate misunderstanding of how
UrJTAG's flash programming algorithm works, maybe document the issue
somewhere, and move on ?

Well, there is one more issue: Flash wear. Since the lock bits in J3
NOR are probably also implemented as NOR cells, their lifespan,
specified in erase cycles, is limited.

If a block is updated in each of N flashing sessions, its data cells
experience N erase cycles, one for each update. However, with the above
algorithm, the cells containing the lock bits see one erase cycle for
each block unlocked, a total of N*(W+E), where W is the number of
blocks erased and then written in each session, and E is the number of
blocks just erased.

In the case of M1, a run of reflash_m1.sh --release should yield about
W = 70 and E = 105. In other words, the lock bits in M1 wear out up to
175 times faster than necessary.

Are we doomed ?
---------------

Not quite yet.
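
As a quick sanity check of the N*(W+E) estimate, here is a small Python
sketch that replays the per-block unlock/erase loop against a toy model
of the two lock types. The NorModel class, its method names, the
initial all-locked state and N = 10 are made up for illustration; only
W, E and the 100k rating are figures quoted in this posting. It is a
counting model under those assumptions, not a description of what
UrJTAG or the chip does internally.

```python
class NorModel:
    """Toy NOR chip that only counts erase cycles (hypothetical model)."""

    def __init__(self, num_blocks, persistent_locks):
        self.persistent_locks = persistent_locks  # True models J3 behaviour
        self.locked = [True] * num_blocks         # assumed initial state
        self.data_erases = [0] * num_blocks       # per-block data cell wear
        self.lock_cell_erases = 0                 # wear of lock bit cells

    def unlock(self, block):
        if self.persistent_locks:
            # J3: Block Unlock erases the cells holding *all* lock bits.
            self.locked = [False] * len(self.locked)
            self.lock_cell_erases += 1
        else:
            # Other "Intel" NOR: only a RAM bit changes, nothing wears.
            self.locked[block] = False

    def erase(self, block):
        if self.locked[block]:
            raise RuntimeError(f"block {block} is locked")
        self.data_erases[block] += 1


def flash_session(nor, blocks):
    """Per-block loop as described in the text: unlock, erase (write omitted)."""
    for block in blocks:
        nor.unlock(block)
        nor.erase(block)


W, E = 70, 105          # blocks written / just erased per reflash_m1.sh run
N = 10                  # number of flashing sessions (arbitrary)
RATED_CYCLES = 100_000  # "100k or more" erase cycles

j3 = NorModel(W + E, persistent_locks=True)
other = NorModel(W + E, persistent_locks=False)
for _ in range(N):
    flash_session(j3, range(W + E))
    flash_session(other, range(W + E))

print(j3.lock_cell_erases)      # 1750, i.e. N * (W + E)
print(max(j3.data_erases))      # 10, i.e. N
print(other.lock_cell_erases)   # 0
print(RATED_CYCLES // (W + E))  # 571 full sessions before the rating is hit
```

Under these assumptions the lock bit cells would reach the rated cycle
count after roughly 570 full reflash runs, while each data block has
seen only one erase per run.
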
The J3 NOR is specified to last 100k or more erase cycles. With the way
we use the NOR in M1, it would take fairly determined torture-testing
to reach this point. It may be possible to reach such wear levels in
long-running automated tests, but it's not something a customer unit is
likely to be exposed to.

However, there may be usage scenarios for UrJTAG where the increased
wear of lock bits could become an issue.

Where's the patch ?
-------------------

Unfortunately, I don't have one. Not only because I'm too lazy but also
because I'm not quite sure how to best detect that the NOR is of the J3
type. And I don't know the NORs of the world nearly well enough to be
sure that an algorithm change wouldn't cause trouble elsewhere.

[1] http://lists.milkymist.org/pipermail/devel-milkymist.org/2011-October/001978.html
[2] http://www.micron.com/get-document/?documentId=6062&file=319942_J3_65_256M_MLC_DS.pdf
[3] http://lists.milkymist.org/pipermail/devel-milkymist.org/2011-October/001939.html

- Werner
_______________________________________________
http://lists.milkymist.org/listinfo.cgi/devel-milkymist.org
IRC: #milkymist@Freenode