Happy weekend, all!  Latest updates on this issue:

Identified and replaced a faulty 4116 DRAM (E204) on my MS11-L.  After this, my 
small hand-rolled standalone diagnostic passes the full 256K.  I'll post my 
diagnostic source over on my blog soon.

After this repair, tried MAINDEC ZQMC, called out as the appropriate diagnostic 
by the MS11-L docs.  This was interesting...  First, it would barely run at all 
unless I disabled parity checking with front panel switch settings.  Second, it 
flagged a bunch of memory locations that weren't reported by my much simpler 
diagnostic (which, at this point, only does all-ones/all-zeros passes looking 
for stuck bits).
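
For reference, the all-ones/all-zeros approach amounts to something like the following.  This is only a host-side Python sketch against a simulated memory, not the actual PDP-11 diagnostic source; `FaultyMemory`, `stuck_bit_test` and the injected fault are all hypothetical names and values made up for illustration.

```python
class FaultyMemory:
    """Simulated 16-bit word memory; 'stuck' injects hypothetical faults."""

    def __init__(self, words, stuck=None):
        self.cells = [0] * words
        # {address: (bit_mask, level)}: bits in bit_mask always read as level
        self.stuck = stuck or {}

    def write(self, addr, value):
        self.cells[addr] = value & 0xFFFF

    def read(self, addr):
        value = self.cells[addr]
        if addr in self.stuck:
            mask, level = self.stuck[addr]
            value = (value | mask) if level else (value & ~mask)
        return value & 0xFFFF


def stuck_bit_test(mem, words):
    """Write all-ones then all-zeros; return (addr, expected, got) mismatches."""
    failures = []
    for pattern in (0xFFFF, 0x0000):
        for addr in range(words):       # fill the whole range first...
            mem.write(addr, pattern)
        for addr in range(words):       # ...then read it all back
            got = mem.read(addr)
            if got != pattern:
                failures.append((addr, pattern, got))
    return failures


# Inject one made-up fault: bit 4 of word 0o100 stuck at 0.
mem = FaultyMemory(1024, stuck={0o100: (0o000020, 0)})
failures = stuck_bit_test(mem, 1024)
for addr, expected, got in failures:
    print(f"addr {addr:06o}: wrote {expected:06o}, read {got:06o}")
```

A test like this only catches cells that can't hold a 1 or a 0.  It won't notice addressing faults or pattern-sensitive failures, which could be one reason a more thorough diagnostic flags additional locations.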

The MAINDEC memory diagnostic is bulky and complicated, and it takes several 
minutes to re-download it after a power cycle, so it's not exactly convenient 
to use while troubleshooting.  I'll probably be beefing up my smaller 
diagnostic with a few more tests (including parity).

Went ahead and tried both RSTS and Unix again after the above repair, and saw 
the same fault behaviors from both (sadness).  Oh well, not there yet...

So, smokiest gun I have right now is the parity issue.  Could be I still have a 
bad DRAM on my MS11 in one of the parity banks...  I tried enabling trap on 
parity error in the MS11 CSR before running my diagnostic, but it didn't trap, 
even though it did flag parity error(s) in the CSR.  So maybe I *also* have a 
bug I haven't yet addressed in parity handling within the CPU.  I realized there is 
a MAINDEC specifically for this (CKBR) which I had previously overlooked. May 
give that a look today.  Also, parity is one significant difference between 
SIMH and my real hardware: SIMH emulates a memory system with no parity 
hardware.
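
One thing to keep in mind when adding parity coverage to a simple diagnostic: all-ones and all-zeros bytes both contain an even number of 1 bits, so they store the *same* parity bit value.  A rough Python sketch of that, assuming one parity bit per 8-bit byte (the odd/even convention here is an assumption, and the conclusion holds either way; `parity_bit` and `parity_values_exercised` are hypothetical helper names):

```python
def parity_bit(byte, odd=True):
    """Parity bit stored with one 8-bit byte (convention is an assumption)."""
    ones = bin(byte & 0xFF).count("1")
    even_bit = ones % 2              # value that makes the total count even
    return even_bit ^ 1 if odd else even_bit


def parity_values_exercised(patterns, odd=True):
    """Set of parity-bit values that a list of byte patterns ever stores."""
    return {parity_bit(p, odd) for p in patterns}


# All-ones / all-zeros only ever store one parity value...
stuck_test_coverage = parity_values_exercised([0xFF, 0x00])
# ...while adding odd-weight patterns stores both 0 and 1.
better_coverage = parity_values_exercised([0xFF, 0x00, 0x01, 0xFE])

print("all-ones/all-zeros parity values:", sorted(stuck_test_coverage))
print("with odd-weight patterns added:  ", sorted(better_coverage))
```

So a parity DRAM cell stuck at that one value would never be caught by all-ones/all-zeros passes, even with parity checking enabled; patterns with an odd number of 1 bits per byte are needed to exercise the other state.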

Looking into the parity issue some last night has raised a few questions:

- There is a lot of inconsistent and incomplete information in the 
documentation about memory CSRs.  They appear to come in different flavors 
depending on memory hardware; some of the earlier ones support setting a bit to 
determine whether parity errors will halt or trap the CPU, while some of the 
later ones (like my MS11-L) simply have "enable" and don't distinguish between 
halt and trap.  I'm curious how OS init code sniffs out what memory CSRs there 
are, determines their specific flavors and, in a heterogeneous system, 
determines how much address space is under the auspices of each CSR.  Maybe Paul 
and Noel can comment here wrt. RSTS and Unix respectively?

- The 11/45 prints show a jumper (W1, lower left of sheet UBCB) that looks like 
it would entirely disable Unibus parity error detection if removed.  This was 
an obvious thing to check, but when I pulled and examined my UBC board (and 
also looked over my spare) no such jumper or any associated pads were anywhere 
to be found!  So maybe this jumper was either added to or removed from later 
etches of the UBC?  Anybody know more on this?

My UBC has required three separate repairs so far in the course of restoring 
this machine, in order to address various independent issues.  Now we may 
be coming up on #4...  Based also on the rat's nest of green wires on these 
boards and the frustrated-looking engineer scrawl *all* over this page of the 
prints, the UBC really is the heart of darkness of the KB11-A :-)

  cheers,
    --FritzM.

