Happy weekend, all! Latest updates on this issue:
Identified and replaced a faulty 4116 DRAM (E204) on my MS11-L. After this, my
small hand-rolled standalone diagnostic passes the full 256K. I'll post my
diagnostic source over on my blog soon.
After this repair, tried MAINDEC ZQMC, called out as the appropriate diagnostic
by the MS11-L docs. This was interesting... First, it would barely run at all
unless I disabled parity checking with front panel switch settings. Second, it
flagged a bunch of memory locations that weren't reported by my much simpler
diagnostic (which, at this point, only does all-ones/all-zeros passes looking
for stuck bits).
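As a rough illustration of how a more thorough test can flag locations that an all-ones/all-zeros pass misses, here is a small Python simulation. Everything in it is hypothetical (the `AliasedMemory` class, the fault model, the function names). It is neither the diagnostic described above nor a description of what ZQMC does. It models an addressing fault in which one address bit is ignored, so pairs of addresses share a cell: uniform patterns still read back correctly, but writing each location's own address as data exposes the aliasing.

```python
class AliasedMemory:
    """Simulated 16-bit word memory; optionally ignores one address bit
    (a hypothetical addressing fault, chosen only for illustration)."""

    def __init__(self, size, dead_bit=None):
        self.size = size
        self.dead_bit = dead_bit
        self.cells = [0] * size

    def _decode(self, addr):
        # With a dead address bit, addr and addr-with-that-bit-cleared
        # select the same physical cell.
        if self.dead_bit is None:
            return addr
        return addr & ~(1 << self.dead_bit)

    def write(self, addr, value):
        self.cells[self._decode(addr)] = value & 0xFFFF

    def read(self, addr):
        return self.cells[self._decode(addr)]


def uniform_pattern_test(mem, pattern):
    """Fill memory with one pattern, then return addresses that read back wrong."""
    for addr in range(mem.size):
        mem.write(addr, pattern)
    return [a for a in range(mem.size) if mem.read(a) != pattern]


def address_pattern_test(mem):
    """Write each address as its own data, then return addresses that read back wrong."""
    for addr in range(mem.size):
        mem.write(addr, addr & 0xFFFF)
    return [a for a in range(mem.size) if mem.read(a) != (a & 0xFFFF)]


faulty = AliasedMemory(size=64, dead_bit=3)
stuck_bit_failures = (uniform_pattern_test(faulty, 0xFFFF)
                      + uniform_pattern_test(faulty, 0x0000))
address_failures = address_pattern_test(faulty)
print("uniform-pattern failures:", len(stuck_bit_failures))
print("address-pattern failures:", len(address_failures))
```

The uniform passes cannot see this kind of fault because every cell holds the same value regardless of which cell an address really selects. A test needs location-dependent data to catch it.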
The MAINDEC memory diagnostic is bulky and complicated, and it takes several
minutes to re-download it after a power cycle, so it's not exactly convenient
to use while troubleshooting. I'll probably be beefing up my smaller
diagnostic with a few more tests (including parity).
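One thing worth noting when adding parity coverage: assuming one parity bit per 8-bit byte, an all-zeros byte and an all-ones byte both contain an even number of one bits, so they generate the same parity bit. A test that uses only those two patterns therefore never toggles the parity cells. The Python sketch below checks this arithmetic. The `parity_bit` helper is hypothetical, and the odd/even convention is left as a parameter rather than asserted for any particular memory board.

```python
def parity_bit(byte, odd=True):
    """Return the parity bit stored alongside an 8-bit byte.

    odd=True  -> bit chosen so the total number of ones (data + parity) is odd
    odd=False -> bit chosen so the total number of ones is even
    The convention used by real hardware is an assumption left to the caller.
    """
    ones = bin(byte & 0xFF).count("1")
    even_parity_bit = ones % 2
    return even_parity_bit ^ 1 if odd else even_parity_bit


# All-zeros and all-ones bytes yield the same parity bit under either convention.
same_under_both = all(
    parity_bit(0x00, odd) == parity_bit(0xFF, odd) for odd in (True, False)
)

# Patterns with an odd number of one bits flip the parity bit relative to 0x00.
candidates = (0x01, 0x7F, 0x80, 0xFE)
patterns_that_flip = [p for p in candidates if parity_bit(p) != parity_bit(0x00)]

print("0x00 and 0xFF share a parity bit:", same_under_both)
print("patterns that toggle it:", [hex(p) for p in patterns_that_flip])
```

So a parity pass would want at least one data pattern per byte with an odd number of one bits, in addition to the all-ones/all-zeros patterns.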
Went ahead and tried both RSTS and Unix again after the above repair, and saw
the same fault behaviors from both (sadness). Oh well, not there yet...
So, smokiest gun I have right now is the parity issue. Could be I still have a
bad DRAM on my MS11 in one of the parity banks... I tried enabling trap on
parity error in the MS11 CSR before running my diagnostic, but it didn't trap,
even though it did flag parity error(s) in the CSR. So maybe I *also* have a
bug I haven't yet addressed in parity handling within the CPU. I realized there is
a MAINDEC specifically for this (CKBR) which I had previously overlooked. May
give that a look today. Also, parity is one significant difference between
SIMH and my real hardware: SIMH emulates a memory system with no parity
hardware.
Looking into the parity issue some last night has raised a few questions:
- There is a lot of inconsistent and incomplete information in the
documentation about memory CSRs. They appear to come in different flavors
depending on memory hardware; some of the earlier ones support setting a bit to
determine whether parity errors will halt or trap the CPU, while some of the
later ones (like my MS11-L) simply have "enable" and don't distinguish between
halt and trap. I'm curious how OS init code sniffs out what memory CSRs there
are, determines their specific flavors and, in a heterogeneous system,
determines how much address space is under the auspices of each CSR? Maybe Paul
and Noel can comment here wrt. RSTS and Unix respectively?
- The 11/45 prints show a jumper (W1, lower left of sheet UBCB) that looks like
it would entirely disable Unibus parity error detection if removed. This was
an obvious thing to check, but when I pulled and examined my UBC board (and
also looked over my spare) no such jumper or any associated pads were anywhere
to be found! So maybe this jumper was added to, or removed from, later etches
of the UBC? Anybody know more on this?
My UBC has required three separate repairs so far in the course of restoring
this machine, in order to address various independent issues. We may now
be coming up on #4... Based also on the rat's nest of green wires on these
boards and the frustrated-looking engineer scrawl *all* over this page of the
prints, the UBC really is the heart of darkness of the KB11-A :-)
cheers,
--FritzM.