Thanks, Paul and Noel, for the detailed responses as usual!
> On Jan 20, 2019, at 6:55 AM, Noel Chiappa wrote:
>
> What is [MAINDEC ZQMC] complaining about?
Looks like a few more flaky bits in a couple of additional banks. For those
reading along who may be unfamiliar with the MS11-L, it is laid out as 8
physical banks, each containing 18 16K x 1 DRAMs (16 data bits + 2 parity bits
per word). So a flaky bit in a physical bank implicates one particular chip.
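To make that bookkeeping concrete, here is a small Python sketch that turns a
failing address and data bit into a suspect chip. It assumes, without having
checked the MS11-L manual, that the 8 banks are contiguous, non-interleaved
16K-word blocks starting at the board's base address; `locate_chip` and its
parameters are hypothetical names, and the two parity chips are left out.

```python
from typing import Tuple

WORDS_PER_BANK = 16 * 1024  # each bank: 18 x (16K x 1) DRAMs
NUM_BANKS = 8
DATA_BITS = 16


def locate_chip(byte_addr: int, bit: int, board_base: int = 0) -> Tuple[int, int]:
    """Return (bank, data bit) naming the suspect DRAM.

    ASSUMPTION: banks are contiguous, non-interleaved 16K-word blocks
    starting at board_base; the 2 parity chips per bank are not handled.
    """
    if byte_addr % 2:
        raise ValueError("expected an even (word) byte address")
    if not 0 <= bit < DATA_BITS:
        raise ValueError("data bit must be 0..15")
    word_offset = (byte_addr - board_base) // 2  # PDP-11 addresses are byte addresses
    bank = word_offset // WORDS_PER_BANK
    if not 0 <= bank < NUM_BANKS:
        raise ValueError("address is not on this board")
    return bank, bit
```

For example, under those assumptions `locate_chip(0o200000, 3)` gives
`(2, 3)`: byte address 200000 (octal) is word offset 32768, which falls in
bank 2, so the suspect would be the bit-3 chip of that bank.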
> Would it be possible to put [ZQMC] on a disk and boot it from there?
I have thought about that. The most efficient way, I think, would be to write
a simple LDA (absolute loader format) loader that fits in a boot sector and
loads a diagnostic from contiguous sectors starting at the second sector. It
would then be easy to blast down just the boot sector and a single desired
diagnostic without imaging an entire pack.
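The parsing such a boot-sector loader would have to do can be prototyped on
the host first. Below is a Python sketch based on my understanding of the
absolute-loader block format: a 001 000 header, a 16-bit little-endian byte
count that includes the 6 header bytes, a 16-bit load address, the data, and a
checksum byte that makes the whole block sum to zero mod 256, with a data-less
6-byte block carrying the start address. The 512-byte sector size and all
function names here are assumptions for illustration.

```python
from typing import Dict, Tuple

SECTOR_BYTES = 512  # assumption: RK05-style 256-word sectors


def parse_lda(data: bytes) -> Tuple[Dict[int, int], int]:
    """Parse absolute-loader blocks into {byte address: byte} plus the start
    address taken from the final 6-byte (no data) transfer block."""
    memory: Dict[int, int] = {}
    i = 0
    while i < len(data):
        # Skip leader/filler bytes until a 001 000 block header.
        if data[i] != 1 or i + 1 >= len(data) or data[i + 1] != 0:
            i += 1
            continue
        if i + 6 > len(data):
            raise ValueError("truncated block header")
        count = data[i + 2] | (data[i + 3] << 8)  # includes the 6 header bytes
        load_addr = data[i + 4] | (data[i + 5] << 8)
        end = i + count  # index of the checksum byte
        if count < 6 or end >= len(data):
            raise ValueError(f"bad block at offset {i}")
        if sum(data[i:end + 1]) & 0xFF:  # whole block must sum to 0 mod 256
            raise ValueError(f"checksum error at offset {i}")
        if count == 6:  # transfer block: no data, just the start address
            return memory, load_addr
        for k, byte in enumerate(data[i + 6:end]):
            memory[load_addr + k] = byte
        i = end + 1
    raise ValueError("no transfer block found")


def _pad(blob: bytes) -> bytes:
    """Pad to a whole number of sectors."""
    return blob + bytes(-len(blob) % SECTOR_BYTES)


def build_image(boot_sector: bytes, diagnostic: bytes) -> bytes:
    """Sector 0 holds the boot loader; the diagnostic (e.g. the raw .LDA
    bytes) follows in contiguous sectors starting at sector 1."""
    if len(boot_sector) > SECTOR_BYTES:
        raise ValueError("boot loader does not fit in one sector")
    return boot_sector.ljust(SECTOR_BYTES, b"\x00") + _pad(diagnostic)
```

Keeping the diagnostic in its .LDA form on disk means the same file used for
paper tape or a serial loader could be dropped in unchanged; the boot loader
would then do on the PDP-11 what `parse_lda` does here.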
> One of the first things to add [to custom diagnostic] is to store each
> location's address in it during a set-up pass, and check to see that it's
> still there during the checking pass.
I did this last night, actually. I also added a "random" data test that uses
the program image itself as the source sequence of words to write and compare.
The good news is that my enhanced diagnostics now detect failures in the same
physical banks, and on the same bits, as those flagged by the MAINDEC
diagnostic. Lesson learned: all ones / all zeros is definitely not good enough
for checking this sort of thing!
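Why the address test catches things that ones/zeros cannot can be shown with a
toy host-side simulation (not real MS11-L behavior). `FaultyMemory` below is a
hypothetical memory with one address line stuck at 0, so pairs of locations
alias. Ones/zeros passes it cleanly, because every location holds the same
pattern anyway, while the address-in-location test flags the aliased words.

```python
from typing import List, Optional

MASK16 = 0xFFFF


class FaultyMemory:
    """Toy 16-bit word memory; optionally one address bit is stuck at 0,
    so every location with that bit set aliases onto its partner."""

    def __init__(self, words: int, stuck_addr_bit: Optional[int] = None):
        self.cells = [0] * words
        self.stuck = stuck_addr_bit

    def _decode(self, addr: int) -> int:
        if self.stuck is not None:
            addr &= ~(1 << self.stuck)  # simulated broken decoder line
        return addr

    def write(self, addr: int, value: int) -> None:
        self.cells[self._decode(addr)] = value & MASK16

    def read(self, addr: int) -> int:
        return self.cells[self._decode(addr)]


def ones_zeros_test(mem: FaultyMemory, words: int) -> List[int]:
    """Fill with 0x0000 and verify, then 0xFFFF and verify. Returns failing addresses."""
    bad: List[int] = []
    for pattern in (0x0000, 0xFFFF):
        for a in range(words):
            mem.write(a, pattern)
        bad += [a for a in range(words) if mem.read(a) != pattern]
    return bad


def address_test(mem: FaultyMemory, words: int) -> List[int]:
    """Set-up pass stores each word's own address (low 16 bits only);
    check pass verifies it is still there. Returns failing addresses."""
    for a in range(words):
        mem.write(a, a & MASK16)
    return [a for a in range(words) if mem.read(a) != (a & MASK16)]
```

One caveat with storing the address itself: a 16-bit word only holds the low
16 bits of the word address, so two locations exactly 64K words apart get the
same value, and a decode fault that aliases them would slip through unless the
high address bits are folded into the pattern somehow.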
Another interesting finding, though, is that the "random" test *also* found a
malfunctioning bit that the address test had missed. So ones/zeros plus an
address test isn't really good enough, either.
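A stand-in for the program-image "random" test: the Python sketch below swaps
in a 16-bit Galois LFSR (taps 0xB400) as the pseudo-random source, so the
compare pass can regenerate the same sequence from a seed. The fault in
`PatternFaultMemory` is hypothetical: in one cell, bit 1 reads back as a copy
of bit 0. Ones/zeros never sees it (the two bits always match), nor does the
address test when the victim's address has bits 0 and 1 equal, but a random
word whose two low bits differ exposes it.

```python
from typing import List


def lfsr16_step(state: int) -> int:
    """One step of a 16-bit Galois LFSR, taps 0xB400 (maximal length)."""
    lsb = state & 1
    state >>= 1
    return state ^ 0xB400 if lsb else state


def lfsr16_sequence(seed: int, n: int) -> List[int]:
    """First n LFSR states, starting with the seed (a zero seed would lock up)."""
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("seed must be nonzero")
    out: List[int] = []
    for _ in range(n):
        out.append(state)
        state = lfsr16_step(state)
    return out


class PatternFaultMemory:
    """Toy 16-bit memory: in cell `victim`, bit 1 reads back as a copy of bit 0."""

    def __init__(self, words: int, victim: int):
        self.cells = [0] * words
        self.victim = victim

    def write(self, addr: int, value: int) -> None:
        self.cells[addr] = value & 0xFFFF

    def read(self, addr: int) -> int:
        value = self.cells[addr]
        if addr == self.victim:
            value = (value & ~0b10) | ((value & 1) << 1)  # bit 1 echoes bit 0
        return value


def address_test(mem: PatternFaultMemory, words: int) -> List[int]:
    for a in range(words):
        mem.write(a, a & 0xFFFF)
    return [a for a in range(words) if mem.read(a) != (a & 0xFFFF)]


def random_data_test(mem: PatternFaultMemory, words: int, seed: int = 0xACE1) -> List[int]:
    """Write a pseudo-random word to every location, then compare.
    On the real machine the compare pass would regenerate the sequence
    from the seed instead of keeping a copy in memory."""
    expected = lfsr16_sequence(seed, words)
    for a, v in enumerate(expected):
        mem.write(a, v)
    return [a for a, v in enumerate(expected) if mem.read(a) != v]
```

With seed 0xACE1 the word written to the faulty cell 0 has bit 0 set and bit 1
clear, so this toy run flags cell 0. Whether a single pass happens to hit the
triggering pattern depends on the seed, which argues for several passes with
different seeds.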
I'm technically curious now about the failure modes of these sorts of DRAMs.
I guess that in addition to stuck bits there are also potential decode failures
(which show up on the address test but not on ones/zeros), and some errors with
history dependence, perhaps from internal latches (which show up on the random
data test but not on the address or ones/zeros tests). I'd guess there might
also be crosstalk, noise, and "fading bit" type issues. After the next round of
repairs, I'll have to see whether the MAINDEC still flags problems that my
simplistic diagnostic isn't shaking out.
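On the failure-mode question, the memory-test literature uses standard fault
models (stuck-at, transition, address-decoder, coupling, and data-retention
faults) and "march" tests designed around them. As a swapped-in illustration,
not something MAINDEC ZQMC is known to do, here is a Python sketch of a
word-oriented March C- run against a hypothetical `CouplingFaultMemory`, in
which a 0-to-1 transition on one bit of an aggressor word forces the same bit
of a victim word to 1. Plain ones/zeros misses that fault in either address
order; March C- catches it in element 1 when the aggressor sits below the
victim and in element 3 when it sits above. Data-retention ("fading bit")
faults would also need a pause before the read passes, which is not modeled.

```python
from typing import List, Sequence, Tuple

ZERO, ONES = 0x0000, 0xFFFF


class CouplingFaultMemory:
    """Toy 16-bit memory with one idempotent coupling fault: a 0->1
    transition on `bit` of word `aggressor` forces that bit of `victim` to 1."""

    def __init__(self, words: int, aggressor: int, victim: int, bit: int):
        self.cells = [0] * words
        self.aggressor, self.victim, self.mask = aggressor, victim, 1 << bit

    def write(self, addr: int, value: int) -> None:
        old = self.cells[addr]
        self.cells[addr] = value & 0xFFFF
        if addr == self.aggressor and (~old & value & self.mask):
            self.cells[self.victim] |= self.mask

    def read(self, addr: int) -> int:
        return self.cells[addr]


def ones_zeros(mem: CouplingFaultMemory, words: int) -> List[int]:
    bad: List[int] = []
    for pattern in (ZERO, ONES):
        for a in range(words):
            mem.write(a, pattern)
        bad += [a for a in range(words) if mem.read(a) != pattern]
    return bad


def march_c_minus(mem: CouplingFaultMemory, words: int) -> List[Tuple[int, int]]:
    """Word-oriented March C-: {any(w0); up(r0,w1); up(r1,w0);
    down(r0,w1); down(r1,w0); any(r0)}, with 0 = 0x0000 and 1 = 0xFFFF.
    Returns (element index, address) for every failed read."""
    up: Sequence[int] = range(words)
    down: Sequence[int] = range(words - 1, -1, -1)
    elements = [
        (up, [("w", ZERO)]),
        (up, [("r", ZERO), ("w", ONES)]),
        (up, [("r", ONES), ("w", ZERO)]),
        (down, [("r", ZERO), ("w", ONES)]),
        (down, [("r", ONES), ("w", ZERO)]),
        (up, [("r", ZERO)]),
    ]
    failures: List[Tuple[int, int]] = []
    for index, (order, ops) in enumerate(elements):
        for a in order:
            for op, value in ops:
                if op == "w":
                    mem.write(a, value)
                elif mem.read(a) != value:
                    failures.append((index, a))
    return failures
```

Using whole-word 0x0000/0xFFFF backgrounds keeps the sketch short but means
coupling between bits of the same word goes untested; word-oriented march
tests are usually repeated with several data backgrounds for that reason.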
I've also been somewhat surprised by the amount of repair this memory board has
needed. So far I've seen 6 failed 4116s out of an array of 144, a failure rate
of about 4%. Is this typical for vintage 4116s, or did somebody leave my poor
MS11 out in a lightning storm? :-)
> Starting the CPU (i.e. 'START' switch) or an INIT instruction will clear
> the 'trap enable' bit in the MS11-L CSR.
D'oh! Yes, thanks; I may very well have mucked that up. I'll give it another
try with a little more care later today.
> Which memory has this [parity halt vs trap] feature?
Hmm, I saw this at least once when researching the variety of CSR formats
yesterday morning; I'll have to see if I can dig it up again today. Might it be
just a fastbus thing? It's also hinted at in paragraph 7.7.7 of the older
KB11-A maintenance manual (NOT the later edition that covers both the KB11-A
and KB11-D):
"The semiconductor memory control EHA and EHB (enable halt) flip-flops may be
set under program control to assert SMCB PE HALT L if a parity error is
detected. This input also asserts UBCB PARITY ERR SET L, which set the console
flag and halts the CPU."
This passage does not appear in the later KB11-A,D maintenance manual, and the
description there seems to imply that all reported parity conditions trap
directly through vector 114. But that section gives no details concerning
processor revision or version.
The logic design around all this is a bit complicated, and the apparent
discrepancies between the texts, the available prints, and the actual M8106
boards I have on hand are not heartening!
> The M8106 board layout drawing (a couple of pages back from UBCB) does show
> W1 - upper left corner of the board, next to E84.
Yup. And, surprisingly, neither of my M8106 boards has a jumper or the
indicated pull-up at that location! I'll try to send a pic later. The fact that
W1 exists on the M8119 is interesting; maybe the prints are for later
revisions, and my actual M8106 boards are earlier ones? My /45 is a very early
one: serial 154!
cheers,
--FritzM.