On Sun, 2005-03-13 at 20:47 -0500, Jon Smirl wrote:
> On Mon, 14 Mar 2005 12:05:59 +1100, Benjamin Herrenschmidt
> <[EMAIL PROTECTED]> wrote:
> >
> > > It should be the responsibility of the memory manager. If anything
> > > wants to access the memory it would call lock() and when it's done
> > > with the memory it calls unlock(). That's exactly how DirectFB's
> > > memory manager works.
> >
> > In an ideal world ... However, since we are planning to move the memory
> > manager to the kernel, that would mean a kernel access (syscall, ioctl,
> > whatever...) twice per access to AGP memory. Not realistic.
>
> I'm only suggesting this for the DRM/fbdev stack. Anything else from
> user space can use a non-cached mapping.
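As an aside, for anyone following along: a minimal userspace model of the
lock()/unlock() protocol described in the quote above might look like the
sketch below. All names here (surface_lock, surface_unlock, flush_cpu_cache,
the counters) are invented for illustration; this is not actual DirectFB or
DRM API, and the "flush" is just a stand-in for a real cache operation.

```c
#include <assert.h>

/* Hypothetical sketch of a DirectFB-style lock()/unlock() protocol
 * around a video-memory surface.  The idea: the CPU may only touch the
 * buffer between lock() and unlock(), and the final unlock() flushes
 * whatever the CPU dirtied so the GPU sees consistent data. */

struct surface {
    int lock_count;      /* nesting depth of CPU accesses */
    int cache_dirty;     /* set while the CPU may hold dirty lines */
};

static int cache_flushes; /* counts flushes, for illustration only */

static void flush_cpu_cache(struct surface *s)
{
    /* stand-in for a real dcbf/clflush-style flush of the buffer */
    s->cache_dirty = 0;
    cache_flushes++;
}

static void surface_lock(struct surface *s)
{
    if (s->lock_count++ == 0)
        s->cache_dirty = 1;  /* CPU may now dirty the cache */
}

static void surface_unlock(struct surface *s)
{
    if (--s->lock_count == 0 && s->cache_dirty)
        flush_cpu_cache(s);  /* make the data visible to the GPU */
}
```

Note that this is exactly the pattern that becomes expensive once the
manager lives in the kernel: each lock()/unlock() pair turns into two
kernel crossings per access.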
Then I don't see the point. Especially since the problem I explained
would still be there as long as there is a non-cached mapping.

> It shouldn't hurt to have a parallel non-cached mapping being used in
> conjunction with this protocol. By definition the non-cached mapping
> never gets into an inconsistent state.

Wrong :) It can badly conflict with the existence of a cached mapping.
Re-read my mail that explains the problem carefully.

> > The case of the CP ring is easy to deal with by the macros we have
> > there already and it would be kernel-kernel. But it would be a hit
> > for a lot of other things I suppose.
>
> The performance trade-off is: how long does the invalidate take? If
> the CPU has 2MB of unflushed write data, the instruction is going to
> take a while to finish. In the non-cached scheme this data is flushed
> in parallel with us playing with the AGP memory. To flush 2MB takes
> something like 2MB / 400MHz * 64 bytes * 2 (DDR) = 20 microseconds,
> but it may be more like 1 microsecond on average.
>
> Thinking about this for a while, you can't compute which is the better
> strategy because everything depends on the workload and how dirty the
> cache is. The best thing to do would be to code it up and try it. But
> I want to get a dual head radeon driver working first.
>
> It may also be true that the CP ring is better left non-cached and
> only access to the graphics buffers be done with the caching scheme.

Using a write-through cache might be an interesting tradeoff.

> BTW, you can implement super-fast texture load/unload using a similar
> scheme. Start with the texture in the user space program. The program
> wants to upload the texture. Flush the CPU cache. Point the GART at
> the physical pages allocated to the user holding the texture. Now walk
> the user's page table and mark those pages copy-on-write. Free the
> pages the GART was originally pointing at. Reverse the scheme to
> get data from the GPU.
> For small textures it is faster to copy them, but if you are moving
> 20MB of data this is much faster.

-- 
Benjamin Herrenschmidt <[EMAIL PROTECTED]>


-------------------------------------------------------
_______________________________________________
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel
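P.S. For anyone who wants to play with the texture-upload trick quoted
above, here is a rough userspace model of the page-remapping part. The
structures and names (struct gart, gart_upload_texture, the cow flag) are
invented for illustration; real code would manipulate GART entries and
page-table bits, not malloc'd arrays, and the cache flush is a no-op
stand-in.

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_SIZE 4096

/* A "physical page" and a toy GART that points at pages. */
struct page {
    unsigned char data[PAGE_SIZE];
    int cow;                 /* page marked copy-on-write for the user */
};

struct gart {
    struct page **entries;   /* pages currently visible to the GPU */
    size_t npages;
};

/* stand-in for a real CPU cache flush of the texture's pages */
static void flush_cpu_cache(struct page **pages, size_t n)
{
    (void)pages; (void)n;
}

/* Zero-copy upload: retarget the GART at the user's own pages instead
 * of copying the texture data through a bounce buffer. */
static void gart_upload_texture(struct gart *g,
                                struct page **user_pages, size_t n)
{
    size_t i;

    flush_cpu_cache(user_pages, n);          /* 1. flush dirty CPU data */
    for (i = 0; i < n && i < g->npages; i++) {
        struct page *old = g->entries[i];
        g->entries[i] = user_pages[i];       /* 2. point GART at user pages */
        user_pages[i]->cow = 1;              /* 3. mark them copy-on-write */
        free(old);                           /* 4. free the pages the GART
                                              *    was pointing at before */
    }
}
```

As the thread says, this only pays off for large textures; for small ones
the remapping overhead exceeds the cost of a plain copy.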