[Milkymist-devel] Locking 2/2: paradigm mixup (and UrJTAG)

Werner Almesberger Mon, 24 Oct 2011 12:14:55 -0700

In my previous posting [1] I briefly described some of the
locking mechanisms found in NOR Flash chips.


We can now have a closer look at the locking in the NOR used in
M1. We use locking to protect NOR partitions used for bringup and
recovery, to prevent spurious corruption from affecting them.
(More about the on-going NOR corruption research later.)

A NOR partition is simply a number of consecutive blocks. The
size and location of partitions are a convention of how the memory
is used. The NOR chip doesn't know what partitions are - all it
sees are operations on blocks, words, etc.

The firmware of M1 itself does not know about locking. This means
that it never writes to locked partitions and that partitions it
does write to must be unlocked.

All the lock management of M1 is thus handled through UrJTAG.


"Intel" locking
---------------

UrJTAG assumes that chips that identify themselves as "Intel" NOR
are from the P30/P33/G18/etc. families and thus have software
locks that can be manipulated on a per-block basis.

If a device applies additional write-protection, such as
lock-down or disabling write/erase for the whole device, setting
up the corresponding hardware signals would be outside the scope
of UrJTAG itself.

Since this kind of NOR comes out of reset with all blocks locked,
it is necessary to unlock them before they can be changed. The
programming algorithm of UrJTAG is thus

for each block B {
        unlock B;
        erase B;
        write new content to B;
}

So far, so good.
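
As a rough illustration, the loop above can be modelled in a few
lines of Python. This is only a toy sketch, not UrJTAG code: the
class VolatileLockNor, the function program() and the four-block
device are names and sizes made up for this example.

```python
class VolatileLockNor:
    """Toy model of "Intel"-type NOR with volatile per-block lock bits."""

    def __init__(self, num_blocks):
        # All blocks come out of reset locked.
        self.locked = [True] * num_blocks
        self.data = [None] * num_blocks

    def unlock(self, block):
        # Clears the lock bit of this one block only.
        self.locked[block] = False

    def erase(self, block):
        if self.locked[block]:
            raise PermissionError(f"block {block} is locked")
        self.data[block] = None

    def write(self, block, content):
        if self.locked[block]:
            raise PermissionError(f"block {block} is locked")
        self.data[block] = content


def program(nor, image):
    """Apply the unlock/erase/write loop; image maps block -> content."""
    for block, content in image.items():
        nor.unlock(block)
        nor.erase(block)
        nor.write(block, content)


nor = VolatileLockNor(4)
program(nor, {0: "a", 1: "b"})
print(nor.locked)
```

In this model, only the blocks that were programmed end up
unlocked; the untouched blocks keep their reset state.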


Intel J3 differences
--------------------

The JS28F256J3F105A [2] NOR memory used in M1 is not the common
Intel type but has persistent lock bits instead of volatile ones.
The commands for locking and unlocking are the same as for other
Intel-type chips, but their effect is different:

Command         Intel J3                other "Intel" NOR
--------------- ----------------------- -----------------------
Block Lock      set per-block lock bit  set per-block lock bit
                in NOR                  in RAM
Block Unlock    erase entire NOR block  clear per-block lock
                containing lock bits    bit in RAM

UrJTAG does not know of this difference and treats J3 family NOR
just as if it were regular "Intel" NOR.
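
The effect of the table above on the lock bits can be sketched as
follows. This is a simplified model that only tracks which blocks
are locked; it ignores where the bits are stored (NOR or RAM). The
function names and the j3 flag are invented for the illustration.

```python
def block_lock(lock_bits, block):
    """Block Lock: sets one per-block lock bit on both chip types."""
    bits = list(lock_bits)
    bits[block] = True
    return bits


def block_unlock(lock_bits, block, j3):
    """Block Unlock: per-block on other "Intel" NOR, device-wide on J3."""
    bits = list(lock_bits)
    if j3:
        # J3: the lock bits are erased together, so every block
        # ends up unlocked, whichever block was addressed.
        return [False] * len(bits)
    bits[block] = False
    return bits


start = [True, True, True, True]
other = block_unlock(start, 2, j3=False)
j3 = block_unlock(start, 2, j3=True)
print(other)
print(j3)
```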


What really happens ...
-----------------------

The programming loop thus becomes

for each block B {
        unlock the entire device;
        erase B;
        write new content to B;
}

The reflash_m1.sh script that we normally use to initialize M1
NOR used to lock all the read-only partitions after writing them,
and then proceeded with the partitions that need no locking.

As I've indicated in [3], this simply left the entire device
unlocked. The problem has since been fixed, and reflash_m1.sh
does the locking at the very end, after the last unlock.
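
The difference between the old and the fixed ordering can be
illustrated with a small model. The partition layout (blocks 0-1
read-only, blocks 2-3 writable) and the helper names are assumptions
made up for this sketch; they are not taken from reflash_m1.sh.

```python
def urjtag_program(lock_bits, blocks):
    """UrJTAG's loop on J3: each per-block unlock clears every lock bit."""
    for _ in blocks:
        for i in range(len(lock_bits)):
            lock_bits[i] = False
        # Erasing and writing the block itself is omitted here.


def lock(lock_bits, blocks):
    """Set the lock bit of each listed block."""
    for b in blocks:
        lock_bits[b] = True


READ_ONLY = [0, 1]  # hypothetical protected partition
WRITABLE = [2, 3]   # hypothetical partition that needs no locking

# Old order: lock the read-only blocks, then program the rest.
old = [False] * 4
urjtag_program(old, READ_ONLY)
lock(old, READ_ONLY)
urjtag_program(old, WRITABLE)

# Fixed order: program everything, then lock at the very end.
new = [False] * 4
urjtag_program(new, READ_ONLY)
urjtag_program(new, WRITABLE)
lock(new, READ_ONLY)

print(old)
print(new)
```

With the old order, the unlocks issued while programming the
writable blocks undo the earlier locking; with the fixed order, the
locks are set after the last unlock and therefore stay in place.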


A happy ending ?
----------------

Can we now consider this simply an unfortunate misunderstanding
of how UrJTAG's flash programming algorithm works, maybe document
the issue somewhere, and move on ?

Well, there is one more issue: Flash wear. Since the lock bits in
J3 NOR are probably also implemented as NOR cells, their
lifespan, specified in erase cycles, is limited.

If a block is updated N times, its data cells experience N erase
cycles, one for each update. With the above algorithm, however,
the cells containing the lock bits are erased once for every
block unlocked: a total of N*(W+E) erase cycles after N sessions,
where W is the number of blocks erased and then written in each
session, and E is the number of blocks that are only erased.

In the case of M1, a run of reflash_m1.sh --release should yield
about W = 70 and E = 105. In other words, the lock bits in M1
wear out up to 175 times faster than necessary.
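
The arithmetic behind this factor, spelled out as a small sketch
(the function name is made up; W and E are the approximate figures
given above):

```python
W = 70   # blocks erased and then written per session (approximate)
E = 105  # blocks only erased per session (approximate)


def lock_bit_erase_cycles(sessions, w=W, e=E):
    """Erase cycles seen by the lock-bit cells: one per block unlocked."""
    return sessions * (w + e)


def data_erase_cycles(sessions):
    """Erase cycles seen by the data cells of any one block."""
    return sessions


wear_factor = lock_bit_erase_cycles(1) // data_erase_cycles(1)
print(wear_factor)
```

Per session, the data cells of any given block are erased at most
once, while the lock-bit cells are erased W+E = 175 times.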


Are we doomed ?
---------------

Not quite yet. The J3 NOR is specified to last 100k or more erase
cycles. With the way we use the NOR in M1, it would take fairly
determined torture-testing to reach this point. It may be
possible to reach such wear levels in long-running automated
tests, but it's not something a customer unit is likely to be
exposed to.
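
For a rough sense of scale, the following back-of-the-envelope
sketch estimates how many full reflash sessions it would take for
the lock bits to see 100k erase cycles. It assumes the approximate
figures of 70 written and 105 erased blocks per session, and treats
100k as a hard limit although the specification gives it as a
minimum.

```python
ENDURANCE = 100_000           # specified erase cycles (a lower bound)
UNLOCKS_PER_SESSION = 70 + 105  # one lock-bit erase per block unlocked

sessions_to_limit = ENDURANCE // UNLOCKS_PER_SESSION
print(sessions_to_limit)
```

This comes out at several hundred complete reflash runs on the same
unit.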

However, there may be usage scenarios for UrJTAG where the
increased wear of lock bits could become an issue.


Where's the patch ?
-------------------

Unfortunately, I don't have one. Not only because I'm too lazy
but also because I'm not quite sure how to best detect that the
NOR is of the J3 type. And I don't know the NORs of the world
nearly well enough to be sure that an algorithm change wouldn't
cause trouble elsewhere.


[1] http://lists.milkymist.org/pipermail/devel-milkymist.org/2011-October/001978.html
[2] http://www.micron.com/get-document/?documentId=6062&file=319942_J3_65_256M_MLC_DS.pdf
[3] http://lists.milkymist.org/pipermail/devel-milkymist.org/2011-October/001939.html

- Werner
