On Mon, 2003-02-03 at 18:09, Benjamin Herrenschmidt wrote:
> On Mon, 2003-02-03 at 17:05, Michel Dänzer wrote:
> > On Mon, 2003-02-03 at 17:34, Alan Cox wrote:
> > > On Mon, 2003-02-03 at 15:02, Keith Whitwell wrote:
> > > >
> > > > > -#define COMMIT_RING() do { \
> > > > > - RADEON_WRITE( RADEON_CP_RB_WPTR, dev_priv->ring.tail ); \
> > > > > +#define COMMIT_RING() do { \
> > > > > + /* read from PCI bus to ensure correct posting */ \
> > > > > + RADEON_READ( RADEON_CP_RB_WPTR ); \
> > > > > + RADEON_WRITE( RADEON_CP_RB_WPTR, dev_priv->ring.tail ); \
> > > > > + RADEON_READ( RADEON_CP_RB_WPTR ); \
> > > > > } while (0)
> > > >
> > > > Ouch. Put a conditional around that at least, so that not everybody suffers...
> > >
> > > PCI posting applies to all platforms. However I'm trying to understand what this
> > > is trying to do. The final read has an effect in that it ensures that the WPTR is
> > > written, not left posted for an undefined time. What does the previous one achieve?
> > > Is there some kind of synchronization requirement against the GART/main memory?
> >
> > That's my understanding, we need to make sure the chip reads from the
> > ring what we wrote to it.
>
> Well... You are asking for trouble ;)
>
> The problem is that the behaviour will be pretty much HW implementation
> dependent.
>
> In the AGP case, the ring is mapped uncacheable. So your card and the
> ring are typically on the same memory type from the CPU, that helps.
> Though I would still make sure the correct bus path is flushed by doing
> that first read from the ring and not from the card.
>
> In the PCI case, the ring is mapped cacheable in normal memory and you
> rely on the PCI cache coherency (snooping). That means that you have a
> new problem which is to synchronize writes to cacheable memory (the
> ring) with writes to non-cacheable MMIO space (the card). At least on
> PPC, I don't think anything but a full sync instruction will achieve
> that, so you'd rather add an mb(). And do the read from memory (actually
> cache), not the card.
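
To make the ordering Ben describes concrete, here is a rough user-space sketch, not the actual DRM macro. Everything in it is made up for illustration: sim_ring and sim_cp_rb_wptr stand in for the ring memory and the RADEON_CP_RB_WPTR register, and a C11 atomic_thread_fence() stands in for the kernel's mb(). The intended sequence is: full barrier, read back from the ring rather than the card, write the tail, then read the register so the write is not left posted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_ENTRIES 16u   /* hypothetical ring size, must be a power of two */

/* Stand-ins for the hardware (assumptions, not driver code): a ring in
 * ordinary memory and one variable playing the role of the MMIO register. */
static volatile uint32_t sim_ring[RING_ENTRIES];
static volatile uint32_t sim_cp_rb_wptr;

struct sim_ring_state {
    uint32_t tail;         /* next slot the CPU will write */
};

/* Queue one dword in the ring. The simulated card is not told about it
 * until sim_commit_ring() runs. */
static void sim_ring_emit(struct sim_ring_state *rs, uint32_t value)
{
    assert(rs->tail < RING_ENTRIES);
    sim_ring[rs->tail] = value;
    rs->tail = (rs->tail + 1u) & (RING_ENTRIES - 1u);
}

/* Publish the new tail. Returns the value read back from the last ring
 * slot so a caller can check it. */
static uint32_t sim_commit_ring(struct sim_ring_state *rs)
{
    uint32_t last = (rs->tail - 1u) & (RING_ENTRIES - 1u);
    uint32_t readback;

    /* Full barrier, standing in for mb(): order the ring writes
     * against the register write that follows. */
    atomic_thread_fence(memory_order_seq_cst);

    /* Read from the ring (memory, or actually cache), not from the card. */
    readback = sim_ring[last];

    /* Tell the simulated card where the new tail is. */
    sim_cp_rb_wptr = rs->tail;

    /* Read the register back; on real hardware this is the read that
     * keeps the write from sitting posted for an undefined time. */
    (void)sim_cp_rb_wptr;

    return readback;
}
```

In a real driver the register accesses would go through RADEON_READ/RADEON_WRITE and the barrier would be mb(); whether the ring read-back is needed on a given platform is exactly the open question in this thread.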

After various tests, it looks like all of this is indeed necessary even
with AGP. As an example, the Cube used to crash after a couple of x11perf
tests at 1x; now it has passed several complete x11perf runs at 4x with
fast writes.

And there's even more: newer compilers seem to optimize away some of the
reads with strict aliasing. I thought I'd steal some code from the kernel
to detect whether the compiler supports -fno-strict-aliasing, but it looks
like the kernel just uses that flag unconditionally. We probably want to
do the same for the DRM at least? AFAIR it's been supported since early
GCC 2.95.

-- 
Earthling Michel Dänzer (MrCooper)/ Debian GNU/Linux (powerpc) developer
XFree86 and DRI project member / CS student, Free Software enthusiast

_______________________________________________
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel