On Mon, 2003-02-03 at 18:09, Benjamin Herrenschmidt wrote:
> On Mon, 2003-02-03 at 17:05, Michel Dänzer wrote:
> > On Mon, 2003-02-03 at 17:34, Alan Cox wrote:
> > > On Mon, 2003-02-03 at 15:02, Keith Whitwell wrote:
> > > > >  
> > > > > -#define COMMIT_RING() do {                                       \
> > > > > -     RADEON_WRITE( RADEON_CP_RB_WPTR, dev_priv->ring.tail );             \
> > > > > +#define COMMIT_RING() do {                                           \
> > > > > +     /* read from PCI bus to ensure correct posting */               \
> > > > > +     RADEON_READ( RADEON_CP_RB_WPTR );                               \
> > > > > +     RADEON_WRITE( RADEON_CP_RB_WPTR, dev_priv->ring.tail );         \
> > > > > +     RADEON_READ( RADEON_CP_RB_WPTR );                               \
> > > > >  } while (0)
> > > > 
> > > > Ouch.  Put a conditional around that at least, so that not everybody suffers...
> > > 
> > > PCI posting applies to all platforms. However I'm trying to understand what this
> > > is trying to do. The final read has an effect in that it ensures that the WPTR is
> > > written, not left posted for an undefined time. What does the previous one
> > > achieve?
> > > Is there some kind of synchronization requirement against the GART/main memory ?
> > 
> > That's my understanding, we need to make sure the chip reads from the
> > ring what we wrote to it.
> 
> Well... You are asking for trouble ;)
> 
> The problem is that the behaviour will be pretty much HW implementation
> dependent.
> 
> In the AGP case, the ring is mapped uncacheable. So your card and the
> ring are typically on the same memory type from the CPU, that helps.
> Though I would still make sure the correct bus path is flushed by doing
> that first read from the ring and not from the card.
> 
> In the PCI case, the ring is mapped cacheable in normal memory and you
> rely on the PCI cache coherency (snooping). That means that you have a
> new problem which is to synchronize writes to cacheable memory (the
> ring) with write to non cacheable MMIO space (the card). At least on
> PPC, I don't think anything but a full sync instruction will achieve
> that, so you'd rather add an mb(). And do the read from memory (actually
> cache), not the card.

After various tests, it looks like all of this is indeed necessary even
with AGP. As an example, the Cube used to crash after a couple of x11perf
tests at 1x; now it has passed several complete x11perf runs at 4x with
fast writes.
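
For reference, here is a rough sketch of the ordering discussed above,
with plain variables standing in for the ring and the register. This is
not the actual patch: fake_ring, fake_wptr_reg, ring_emit() and
commit_ring() are made-up names, and __sync_synchronize() (a GCC
builtin) is only a userspace stand-in for the kernel's mb().

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 16u                 /* hypothetical ring size, in dwords */

/* Stand-ins for the real mappings: in the driver the ring lives in
 * AGP/PCI memory and the write pointer is an MMIO register. */
static volatile uint32_t fake_ring[RING_SIZE];
static volatile uint32_t fake_wptr_reg;

static uint32_t ring_tail;            /* software copy of the tail index */

/* Queue one dword in the ring; nothing is published to the "chip" yet. */
static void ring_emit(uint32_t value)
{
    fake_ring[ring_tail] = value;
    ring_tail = (ring_tail + 1) % RING_SIZE;
}

/* Publish the new tail, following the ordering discussed in the thread. */
static uint32_t commit_ring(void)
{
    uint32_t last = (ring_tail + RING_SIZE - 1) % RING_SIZE;

    assert(ring_tail < RING_SIZE);

    /* 1. Full barrier so the ring stores are ordered before the
     *    register write (stand-in for mb()). */
    __sync_synchronize();

    /* 2. Read back from the ring itself, not from the card. */
    (void)fake_ring[last];

    /* 3. Update the write pointer register. */
    fake_wptr_reg = ring_tail;

    /* 4. Read the register back so the write is not left posted. */
    return fake_wptr_reg;
}
```

Whether step 2 alone is enough on a given chipset is exactly the
implementation-dependent part; the sketch only shows the order of the
operations.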

And there's even more: newer compilers seem to optimize away some of the
reads with strict aliasing. I thought I'd steal some code from the
kernel to detect if the compiler supports -fno-strict-aliasing, but it
looks like it just uses that unconditionally. We probably want to do the
same for the DRM at least? AFAIR it's been supported since early 2.95.
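
Independent of the compiler flag, reads that must not be dropped can go
through a volatile-qualified access. A minimal sketch, assuming a
hypothetical helper reg_read32() (this is not how RADEON_READ is
defined, just an illustration of the idea):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a 32-bit value at a byte offset from base.
 * The volatile qualifier forces the compiler to emit the load each time
 * instead of reusing a value it thinks it already knows. */
static uint32_t reg_read32(const volatile void *base, size_t offset)
{
    assert(offset % sizeof(uint32_t) == 0);   /* keep the access aligned */
    return *(const volatile uint32_t *)((const volatile char *)base + offset);
}
```

This only keeps the individual load from being elided; it does not
replace building with -fno-strict-aliasing where code casts between
unrelated pointer types.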


-- 
Earthling Michel Dänzer (MrCooper)/ Debian GNU/Linux (powerpc) developer
XFree86 and DRI project member   /  CS student, Free Software enthusiast


