I ran the test loop for 10516 cycles or a bit more than 55 hours this
time. The objective was to gather more data on corruption of the
"standby" partition.

I locked the "read-only" partitions after "standby" with
http://projects.qi-hardware.com/index.php/p/wernermisc/source/tree/master/m1rc3/norruption/2/lockmost

The test produced a total of 22 standby failures. They occurred in the
following cycles:

369 786 1730 1844 2356 2565 3703 3760 4356 5500 5725 6091 6390 6811
6953 7022 7877 8186 9744 9752 9980 10218

This is the probability distribution for the number of cycles between
corruptions:
http://downloads.qi-hardware.com/people/werner/m1/nor/d1/dist.png

The distribution is basically the same as in the previous test round.

Below is a compact representation of where corruptions happened and
what form they had:

Address                    | Corruption pattern
      _ = 0, 1 = 1         | 0 = 1->0, 0 = 0 unchanged, 1 = 1 unchanged
------------------------------------------------------------------------
00000 ____________________ | 00001011 01000000 | d1/2356-corrupt.bin
                           | 00000000 00000000 | d1/4356-corrupt.bin
                           | 00000000 00000100 | d1/7877-corrupt.bin
                           | 10000110 01000000 | d1/8186-corrupt.bin
0000a ________________1_1_ | 11100110 00001100 | d1/9744-corrupt.bin
0000c ________________11__ | 00000000 00010000 | d1/2356-corrupt.bin
00010 _______________1____ | _1_1_1_1 1__01__1 | d1/1730-corrupt.bin 1/1
                           | _0_0_0_0 0__00__0 | d1/9980-corrupt.bin 1/1
00020 ______________1_____ | 0_0000__ ________ | d1/5500-corrupt.bin 1/2
                           | 0_0000__ ________ | d1/7877-corrupt.bin 1/1
00040 _____________1______ | _____0__ ________ | d1/3703-corrupt.bin 1/2
                           | _____0__ ________ | d1/5500-corrupt.bin 2/2
00050 _____________1_1____ | _____0__ ________ | d1/2356-corrupt.bin 1/2
00066 _____________11__11_ | 0___10__ 1____111 | d1/8186-corrupt.bin 1/1
00082 ____________1_____1_ | _0__11__ 1_____00 | d1/3703-corrupt.bin 2/2
                           | _0__00__ 0_____11 | d1/3760-corrupt.bin 1/1
                           | _0__00__ 0_____01 | d1/9752-corrupt.bin 1/1
00086 ____________1____11_ | _0__11__ 0____000 | d1/6390-corrupt.bin 1/1
000a0 ____________1_1_____ | ________ 0_______ | d1/6091-corrupt.bin 1/1
00310 __________11___1____ | ________ __000___ | d1/369-corrupt.bin 1/1
00480 _________1__1_______ | ________ 0_0_0___ | d1/10218-corrupt.bin 1/1
0049e _________1__1__1111_ | ________ _0______ | d1/4356-corrupt.bin 1/1
00840 ________1____1______ | ________ __0_0___ | d1/786-corrupt.bin 1/1
00850 ________1____1_1____ | ________ 1_0_0___ | d1/7022-corrupt.bin 1/1
00862 ________1____11___1_ | 00__00__ __0__00_ | d1/6811-corrupt.bin 1/1
00c10 ________11_____1____ | ________ __0_____ | d1/2356-corrupt.bin 2/2
018d0 _______11___11_1____ | ________ 0100____ | d1/1844-corrupt.bin 1/1
03880 ______111___1_______ | ________ 1___0___ | d1/2565-corrupt.bin 1/1
03ed0 ______11111_11_1____ | _____0__ ________ | d1/5725-corrupt.bin 1/2
04402 _____1___1________1_ | 01010101 00010000 | d1/5725-corrupt.bin 2/2
20080 __1_________1_______ | 01__11__ __1__10_ | d1/9744-corrupt.bin 1/1
200e0 __1_________111_____ | 11__11__ __0__00_ | d1/6953-corrupt.bin 1/1

Corruptions at addresses below 0x10 don't affect the boot process.
(Or at least not all of them do.)

I don't see any surprising pattern in the above. The general trend
towards having more zeroes than ones could have many causes and does
not point to any specific underlying mechanism.

One new insight is that multiple corruptions (at least up to two) can
occur within a single cycle. In the previous experiment, they could
also have been the result of an accumulation over nearby cycles while
I - literally - wasn't watching.

Getting about the same average interval between corruptions as in the
previous test indicates that the corruption is linked to the number
of cycles and not to the overall run time.

I also looked for anything unexpected in the correlation of adjacent
intervals between corruptions:
http://downloads.qi-hardware.com/people/werner/m1/nor/d1/corr.png

This distribution looks reasonable, as far as I can tell with so few
samples.
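
As a rough cross-check of the interval figures, the list of failing
cycles given earlier can be turned into intervals and adjacent-interval
pairs with a few lines of Python. This is only a sketch and not one of
the scripts from the repository: it assumes that intervals are counted
between consecutive failing cycles (the stretches before cycle 369 and
after cycle 10218 are ignored), and all function names are made up for
illustration.

```python
# Sketch only: not one of the original experiment scripts.
# Cycle numbers of the 22 standby failures, copied from the post.
FAILURE_CYCLES = [
    369, 786, 1730, 1844, 2356, 2565, 3703, 3760, 4356, 5500, 5725,
    6091, 6390, 6811, 6953, 7022, 7877, 8186, 9744, 9752, 9980, 10218,
]
TOTAL_CYCLES = 10516  # length of the test run, in cycles


def intervals(cycles):
    """Return the cycle counts between consecutive failures."""
    return [b - a for a, b in zip(cycles, cycles[1:])]


def adjacent_pairs(values):
    """Pair each interval with the one that follows it."""
    return list(zip(values, values[1:]))


def pearson(pairs):
    """Plain Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    gaps = intervals(FAILURE_CYCLES)
    pairs = adjacent_pairs(gaps)
    print("cycles per failure:", TOTAL_CYCLES / len(FAILURE_CYCLES))
    print("mean interval:", sum(gaps) / len(gaps))
    print("shortest / longest interval:", min(gaps), max(gaps))
    print("lag-1 correlation:", pearson(pairs))
```

With these numbers, the total cycle count divided by the number of
failures comes out at 478 cycles per failure. The adjacent pairs are
the kind of data one would scatter-plot to look for correlation between
neighbouring intervals; with only 20 pairs the coefficient should be
read with caution.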

For comparison, here is a simulation of 100 exponentially-distributed
samples with lambda = 1/478:
http://downloads.qi-hardware.com/people/werner/m1/nor/d1/corr-sim.png

The various scripts used for this experiment live here:
http://projects.qi-hardware.com/index.php/p/wernermisc/source/tree/master/m1rc3/norruption/2/

A tarball of raw results (console log and standby partition dumps)
is here:
http://downloads.qi-hardware.com/people/werner/m1/nor/d1/raw.tar.bz2

Next: a subtle gotcha.

- Werner
