On Wed, Oct 23, 2002 at 01:01:39AM +0100, Alan Cox wrote:
> On Wed, 2002-10-23 at 00:19, José Fonseca wrote:
> [...]
> > I'm not sure what you mean by "cache" above, but the Mach64 has a
> > ring buffer with all the pending DMA buffers, so there will be DMA
> > transfers simultaneously with the copy/verify, but of unrelated DMA
> > buffers.
>
> Is your very large mesh a "single DMA buffer" or multiple buffers?

Surely multiple buffers - the DRM has a pool of 16k buffers. But what
actually happens with the usual applications (i.e., games) is that the
application fills less than 4k. So we could even just use a pool of 4k
buffers, to be sure that each buffer fits in the L1 cache on most
machines (or have the size as a parameter).
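
For the sake of argument, something as simple as this would let people
tune it (the mach64_buf_size name, the default, and the MODULE_PARM
usage are only my illustration, not existing DRM code):

    #include <linux/module.h>

    /* Hypothetical knob: choose the per-client DMA buffer size at load
     * time so it can be matched to the CPU's L1 data cache. */
    static int mach64_buf_size = 4096;
    MODULE_PARM(mach64_buf_size, "i");
    MODULE_PARM_DESC(mach64_buf_size, "Size in bytes of each DMA buffer");
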
4k is enough to hold roughly 146 vertices (including color and texture
coords) on Mach64. Most objects in virtual [gaming] worlds don't have
that many vertices with the _same_ texture, and changing a texture means
flushing the buffer as it is.
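
Just to show where the 146 comes from (the exact vertex layout below is
an assumption on my part, not the documented Mach64 format):

    /* Back-of-the-envelope check: seven 32-bit words per vertex. */
    struct vertex { float x, y, z, w; unsigned int color; float u, v; };
    /* sizeof(struct vertex) == 28, and 4096 / 28 == 146 (integer division) */
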
Again, I stress that this concerns vertex data only - with texture data
the bandwidth is much higher, but fortunately there are no security
concerns in that case.

> > I hope this has answered your questions. I'm still not sure what the
> > best approach here should be in detail after reading this thread.
> > There seems to be a consensus on verifying at the source and not at
> > the destination, but not on whether the verify and the copy should be
> > done at the same time or in distinct steps, which relates to the
> > benefit of prefetching and/or uncached writes (and it isn't even
> > clear whether those are actually a benefit or not).

> Prefetching tends to be a win. What to prefetch is a harder question,
> normally solved by benchmarking. When the card does DMA access to a
> buffer it will suck it from the processor L2 caches. If it only reads
The card only reads.

> you should end up with a local copy in cache. If the card writes to the
> buffers as it processes them, it will actually evict them from the CPU
> cache in most cases. In the former case I would expect to want to
> prefetch the input data (please trust copy_from_user to do this right;
> it doesn't do a good job yet, but it's the business of that code to do
> it). In the latter case I could see the prefetchw of the destination
> DMA buffer being more of a win.

But if I call copy_from_user I won't be making the copy and verify in
one pass, as advised below. Should I adapt copy_from_user to do that,
then?

> The reasons for doing the copy and verify in one pass are twofold:
> - memory access is slow, so even if the data is in L2 we have clock
>   cycles to fill while copying. For example, on x86 a copy-and-checksum
>   is the same speed as a plain copy in most cases.
> - if your long series of commands is multiple DMA buffers, you can fire
>   off DMA buffers as you can rather than copying everything and then
>   scanning. That reduces latency and also means you are less likely to
>   have data fall out of L1 cache and then be pulled back into it.

Ok. So it seems to be the safest bet.
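
To make sure I understand, here is a rough sketch of the kind of
single-pass copy-and-verify loop I have in mind. It is written
user-space style; in the driver the reads from src would go through
copy_from_user/__get_user. The command layout (a header word carrying a
register offset and a dword count, followed by the data) and the allowed
register range are made up for illustration, not the real Mach64 format:

    #include <stddef.h>

    static int copy_and_verify(unsigned int *dst, const unsigned int *src,
                               size_t dwords)
    {
            size_t i = 0, j;

            while (i < dwords) {
                    unsigned int head  = src[i];
                    unsigned int reg   = head & 0xffff;     /* assumed encoding */
                    unsigned int count = (head >> 16) + 1;

                    /* A prefetch of src a few cache lines ahead (e.g. GCC's
                     * __builtin_prefetch) could be tried here. */

                    if (i + 1 + count > dwords)
                            return -1;  /* command runs past the buffer */
                    if (reg < 0x0400 || reg > 0x07ff)
                            return -1;  /* hypothetical forbidden register */

                    dst[i] = head;
                    for (j = 1; j <= count; j++)
                            dst[i + j] = src[i + j];  /* copy while still hot */
                    i += 1 + count;
            }
            return 0;   /* whole buffer verified and already copied */
    }

The point being that each dword is touched only once, and each buffer
can be handed to the ring as soon as its own pass finishes.
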
> Uncached writes on PC hardware are almost always a complete loss. You
> want the write-back caching so you are writing to the PCI bridge or
> SDRAM in the largest chunk sizes possible.

Oh..

> It's also entirely possible on something like a Mach64, where you don't
> have very many giant objects, that it won't make a blind bit of
> difference whether you prefetch, validate as you write, etc.

I'm aware that many of these questions will only be completely answered
with some benchmarks, but since I don't have a clear understanding of
some of the relevant concepts, these preliminary considerations will
hopefully reduce the number of iterations needed until the
best/acceptable level of performance is achieved. Thanks.

José Fonseca
