SlowBCopy, IA64, PCI bus corruption

John Dennis Thu, 14 Aug 2003 13:03:56 -0700

I'm trying to track down a problem on ia64 (Itanium). It seems to
manifest itself only with the Nvidia (nv) driver, but I don't think
this is an nv driver issue; rather, I think it's a generic issue in
SlowBCopy, which the nv driver happens to invoke.


Symptom:
--------

The first time the nv driver is used for accelerated drawing, the X
server enters an infinite loop that consumes almost all the CPU
cycles. Also, if you scan the PCI bus at this point (e.g. with lspci)
you will discover bogus config values on the nv card (all ones,
i.e. ~0).

Cause:
------

The cause of the infinite loop in the nv driver is a sync function
that polls a device register, waiting for the engine to go idle. At
this point all PCI reads from the nv card return values that are all
ones (~0). These are not valid values, but I believe this is defined
behavior for PCI bridges after a master abort. The nv sync function
never exits its loop because the register it's polling is no longer
being read correctly. This also explains why scanning the PCI bus
(e.g. with lspci) no longer works when the scan gets to the nv card:
it says 7f is not a valid header type and skips the card (note that
all values printed below are all ones).

80:00.0 Class ffff: nVidia Corporation NV25GL [Quadro4 900 XGL] (rev ff) (prog-if ff)
        !!! Unknown header type 7f
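
To illustrate why the sync loop never exits, here is a minimal sketch
of a bounded poll that also treats an all-ones read as a sign the bus
is dead. This is not the nv driver's code: the function names, the
busy-bit position and the callback interface are all assumptions made
up for the example, and on a real device all ones could in principle
be a legitimate register value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; not taken from the nv driver. */
#define ENGINE_BUSY_BIT 0x00000001u  /* assumed "engine busy" flag */
#define PCI_ALL_ONES    0xFFFFFFFFu  /* value seen on every failed read */

typedef uint32_t (*reg_read_fn)(void *ctx);

enum sync_result { SYNC_IDLE = 0, SYNC_TIMEOUT = 1, SYNC_BUS_DEAD = 2 };

/* Poll until the busy bit clears. Give up after max_polls reads, and
 * bail out at once if the read comes back as all ones. */
enum sync_result
wait_engine_idle(reg_read_fn read_status, void *ctx, unsigned long max_polls)
{
    unsigned long i;

    for (i = 0; i < max_polls; i++) {
        uint32_t v = read_status(ctx);

        if (v == PCI_ALL_ONES)
            return SYNC_BUS_DEAD;
        if ((v & ENGINE_BUSY_BIT) == 0)
            return SYNC_IDLE;
    }
    return SYNC_TIMEOUT;
}

/* Simulated register: always all ones, as in the failure above. */
uint32_t read_dead(void *ctx)
{
    (void)ctx;
    return PCI_ALL_ONES;
}

/* Simulated register: busy for *ctx more reads, then idle. */
uint32_t read_countdown(void *ctx)
{
    unsigned *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return ENGINE_BUSY_BIT;
    }
    return 0;
}
```

An unbounded loop that only tests the busy bit spins forever on
read_dead, because all ones always has the busy bit set. A bounded
poll does not fix the underlying bus problem; it only turns the hang
into an error that can be reported.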
 

Root Cause:
-----------

After much investigation I tracked the problem down to VGA font save
and restore, which invokes SlowBCopy. SlowBCopy would appear to be a
hack designed to slow down bus transactions, seemingly used only when
accessing VGA data. There are two basic variants of SlowBCopy. On x86
architectures there is an asm version which basically just inserts
extra machine instructions in the loop that copies the data. For
some non-x86 architectures the copy loop includes:

        outb(0x80, 0x00);

I learned via past correspondence that this outb is meant to be a
no-op: supposedly I/O port 0x80 is not used for anything, so the
write to this port does nothing other than introduce a delay.

Sort of fix:
------------

On April 7th Egbert Eich checked in a fix to SlowBcopy.c (rev 1.6)
that introduced an extra outb delay before entering the copy loop.
xf86SlowBcopy now looks like this:

void
xf86SlowBcopy(unsigned char *src, unsigned char *dst, int len)
{
#if defined(__ia64__)
    /* rev 1.6: one extra dummy port write before the loop on ia64 */
    outb(0x80, 0x00);
#endif
    while(len--)
    {
        *dst++ = *src++;
#if !defined(__sparc__) && !defined(__powerpc__) && !defined(__mips__)
        /* dummy write to port 0x80, used only as a delay */
        outb(0x80, 0x00);
#endif
    }
}

The fix Egbert introduced cured the nv hang we were seeing on HP
ZX2000s and the subsequent PCI bus corruption (i.e. the card only
returns ~0 on all PCI reads). I thought we now had a fix for all nv
cards on all HP ZX systems, but my elation was premature. The exact
same symptoms reared their heads on HP ZX6000s even with the above
fix. As long as SlowBCopy for VGA font save/restore is not called,
things work fine on the ZX6000s.

Not surprised SlowBCopy is not robust:
--------------------------------------

For reasons I don't understand (can somebody explain this to me?)
reads/writes to VGA data are slower than other bus transactions. This
would appear to be why SlowBCopy was introduced originally: to slow
down reads and writes to the VGA data, and hence either prevent the
data from being corrupted and/or prevent the bus from getting into a
bad state when bus transactions start to time out.

Now it seems to me that using extra machine instructions (asm version)
or no-op I/O is inherently a risky solution to this problem. It would
appear there is some interval of time one must wait for individual VGA
bus transactions to complete. The number of extra machine instructions
and/or no-op I/O writes to insert seems to be purely a guess and
highly dependent on the processor and the bus it's sitting on. The
fact that this works on one class of machines and not another does not
surprise me at all.
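
To make the contrast concrete, here is a rough sketch of a copy loop
that waits a fixed wall-clock interval per byte instead of a guessed
number of no-ops. The names timed_slow_bcopy and delay_ns and the
ns_per_byte parameter are hypothetical, the right interval is
unknown, and I am assuming a POSIX system with clock_gettime. Whether
a CPU-side delay is sufficient at all (for example if writes are
posted and complete later on the bus) is not something this sketch
settles.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime */
#endif
#include <stddef.h>
#include <time.h>

/* Busy-wait for at least ns nanoseconds on the monotonic clock. */
static void delay_ns(long ns)
{
    struct timespec start, now;
    long long elapsed;

    clock_gettime(CLOCK_MONOTONIC, &start);
    do {
        clock_gettime(CLOCK_MONOTONIC, &now);
        elapsed = (long long)(now.tv_sec - start.tv_sec) * 1000000000LL
                + (now.tv_nsec - start.tv_nsec);
    } while (elapsed < ns);
}

/* Same argument order as xf86SlowBcopy (src, dst, len), plus the
 * per-byte delay. volatile keeps the compiler from merging or
 * reordering the byte accesses. */
void timed_slow_bcopy(const volatile unsigned char *src,
                      volatile unsigned char *dst,
                      size_t len, long ns_per_byte)
{
    while (len--) {
        *dst++ = *src++;
        delay_ns(ns_per_byte);  /* known interval, not a guess in no-ops */
    }
}
```

The point of the design is only that the delay is expressed in time
units, so it does not change with processor speed. The value to pass
still has to come from a known VGA timing requirement.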

So my real questions are:
-------------------------

1) Why are VGA transactions so slow and is there a known timing value?

2) Is the fact that reads from the nv card return all ones (~0) due
to a PCI master abort as a consequence of timing out on a VGA
transaction, and is the PCI bridge never recovering from the abort? (I
believe it's the bridge that is responsible for returning all
ones.) And if this is true, can/should the bridge be configured not to
stay in this state? (It seems to stay in this state until hw reset.)

OR

Is it not the bridge that is the culprit for returning all ones, but
the card (an nvidia in this instance) that is in some screwed up state
such that it returns all ones till hw reset?

I think this is an important distinction. If this is a PCI bridge
configuration issue we might be able to address it in a more generic
manner. If it's a card issue then that suggests a driver-specific
solution.

3) Can we come up with a scheme that introduces a known timing delay
(e.g. usleep) such that we don't have to make arbitrary guesses as to
how much no-op is needed in the loop on a given system?

4) Is my general analysis correct? If not can you help explain where
I'm missing the mark and what the actual issues are?
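
As a starting point for question 2, the abort bits in the PCI status
registers may help tell the two cases apart. Below is a small sketch
that decodes them from a 16-bit status word. The bit positions are
those of the standard PCI Status register (config offset 0x06); a
PCI-to-PCI bridge has a Secondary Status register at offset 0x1E with
the same abort bits. The struct and function names are hypothetical,
and how the word is read from config space is left out.

```c
#include <stdint.h>

#define PCI_STATUS_OFFSET           0x06    /* Status register */
#define PCI_SEC_STATUS_OFFSET       0x1E    /* bridge Secondary Status */

#define PCI_STATUS_SIG_TARGET_ABORT 0x0800u /* bit 11 */
#define PCI_STATUS_REC_TARGET_ABORT 0x1000u /* bit 12 */
#define PCI_STATUS_REC_MASTER_ABORT 0x2000u /* bit 13 */

/* Hypothetical helper type for the example. */
struct abort_flags {
    int signaled_target_abort;
    int received_target_abort;
    int received_master_abort;
};

/* Decode the abort-related bits of a status word. */
struct abort_flags decode_pci_status(uint16_t status)
{
    struct abort_flags f;

    f.signaled_target_abort = (status & PCI_STATUS_SIG_TARGET_ABORT) != 0;
    f.received_target_abort = (status & PCI_STATUS_REC_TARGET_ABORT) != 0;
    f.received_master_abort = (status & PCI_STATUS_REC_MASTER_ABORT) != 0;
    return f;
}
```

Note that the card's own status word is useless once its config space
reads back as all ones (every bit appears set), so the bridge's
registers are the ones worth looking at. A set "received master abort"
bit on the bridge's secondary side would at least be consistent with
the bridge seeing no response from the card.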

John

-- 
John Dennis <[EMAIL PROTECTED]>

_______________________________________________
Devel mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/devel
