On Wed, Oct 23, 2002 at 01:01:39AM +0100, Alan Cox wrote:
On Wed, 2002-10-23 at 00:19, José Fonseca wrote:
[...]
I'm not sure what you mean with "cache" above, but the Mach64 has a ring
buffer with all the pending DMA buffers, so there will be DMA transfer
simultaneously with the copy/verify, but with unrelated DMA buffers.
Is your very large mesh a "single DMA buffer" or multiple buffers.

Surely multiple buffers - the DRM has a pool of 16k buffers. But what
actually happens with typical applications (i.e., games) is that the
application fills less than 4k. So we could even just use a pool of 4k
buffers to be sure that the buffer fits in the L1 cache on most
machines (or have it as a parameter).
4k is enough to hold roughly 146 vertices (including color and texture
coords) on Mach64. Most objects in virtual [gaming] worlds don't have
that many vertices with the _same_ texture, and changing a texture
means flushing the buffer as it is.
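For concreteness, the ~146 figure follows from arithmetic along these
lines (the 7-dword per-vertex layout assumed here is only an
illustration; the real Mach64 vertex format may differ):

/* Rough arithmetic behind the "~146 vertices per 4k buffer" figure.
 * The 7-dword layout (x, y, z, w, packed color, u, v) is an assumption
 * for illustration only. */
#define BUF_SIZE		4096			/* bytes per DMA buffer */
#define DWORDS_PER_VERTEX	7			/* pos + color + texcoords */
#define BYTES_PER_VERTEX	(DWORDS_PER_VERTEX * 4)	/* 28 bytes */
#define MAX_VERTS_PER_BUF	(BUF_SIZE / BYTES_PER_VERTEX)	/* 4096 / 28 = 146 */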
Again I stress that this is regarding vertex data only - with texture
data the bandwidth is much higher, but fortunately there are no security
concerns in that case.

I hope this has answered your questions. Even after reading this thread
I'm still not sure what the best approach is in detail. There seems to
be a consensus on verifying at the source and not at the destination,
but not on whether the verify and the copy should be done at the same
time or in distinct steps, which relates to the benefit of prefetching
and/or uncached writes (and it isn't even clear whether those are
actually a benefit or not).
Prefetching tends to be a win. What to prefetch is a harder question
normally solved by benchmarking. When the card does DMA access to a
buffer it will suck it from the processor L2 caches. If it only reads
The card only reads.

you should end up with a local copy in cache. If the card writes to the
buffers as it processes them it will actually evict them from the CPU
cache in most cases. In the former case I would expect to want to
prefetch the input data (please trust copy_from_user to do this right,
it doesn't do a good job yet but it's the business of that code to do
it). In the latter case I could see the prefetchw of the destination DMA
buffer being more of a win.
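For illustration only, a rough sketch of where a destination prefetch
hint could sit in the copy path; the function and parameter names are
made up, the 64-byte stride is an assumed cache-line size, and whether
prefetchw() pays off at all depends on whether the card ever writes the
buffer back, as described above:

#include <linux/errno.h>
#include <linux/prefetch.h>
#include <linux/uaccess.h>

#define PREFETCH_STRIDE_BYTES	64	/* assumed cache line size */

/* Sketch only: copy a user buffer into a DMA buffer in cache-line sized
 * chunks, issuing a write-intent prefetch for the next destination line
 * while the current one is filled by copy_from_user() (which takes care
 * of any source-side prefetching itself). */
static int copy_buf_hinted(void *dst, const void __user *src, size_t len)
{
	size_t off;

	for (off = 0; off < len; off += PREFETCH_STRIDE_BYTES) {
		size_t chunk = len - off;

		if (chunk > PREFETCH_STRIDE_BYTES)
			chunk = PREFETCH_STRIDE_BYTES;

		/* Hint the next destination line; harmless past the end. */
		prefetchw((char *)dst + off + PREFETCH_STRIDE_BYTES);

		if (copy_from_user((char *)dst + off,
				   (const char __user *)src + off, chunk))
			return -EFAULT;
	}
	return 0;
}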

But if I call copy_from_user I won't be doing the copy and the verify
in one pass, as advised below. Should I adapt copy_from_user to do
that, then?

The reasons for doing the copy and verify in one pass are twofold

-   memory access is slow, so even if the data is in L2 we have clock
cycles to fill while copying. For example, on x86 a copy-and-checksum
runs at the same speed as a plain copy in most cases.

-   If your long series of commands spans multiple DMA buffers you can
fire off each DMA buffer as soon as it is ready, rather than copying
everything and then scanning. That reduces latency and also means you
are less likely to have data fall out of the L1 cache and then be
pulled back into it.
Ok. So it seems to be the safest bet.
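For concreteness, a minimal sketch of what a single-pass
copy-and-verify loop could look like; mach64_reg_is_safe() and the
one-command-per-dword layout are assumptions for illustration, not the
real Mach64 command format:

#include <linux/types.h>	/* u32 */
#include <linux/errno.h>
#include <linux/uaccess.h>	/* get_user() */

/* Assumed helper: non-zero if the dword only touches registers the DRM
 * is willing to let the client program.  Purely illustrative. */
extern int mach64_reg_is_safe(u32 cmd);

/* Sketch only: walk the user-supplied command stream one dword at a
 * time, verifying and copying into the kernel-owned DMA buffer in the
 * same pass, so a filled buffer can be fired off immediately. */
static int copy_and_verify(u32 *dma_buf, const u32 __user *from, int dwords)
{
	int i;
	u32 cmd;

	for (i = 0; i < dwords; i++) {
		if (get_user(cmd, from + i))
			return -EFAULT;

		if (!mach64_reg_is_safe(cmd))	/* disallowed register */
			return -EACCES;

		dma_buf[i] = cmd;
	}
	return 0;
}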

Uncached writes on PC hardware are almost always a complete loss. You
want writeback caching so you are writing to the PCI bridge or SDRAM
in the largest chunks possible.
Oh..

It's also entirely possible that on something like a Mach64, where you
don't have very many giant objects, it won't make a blind bit of
difference whether you prefetch, validate as you write, etc.
I'm aware that many of these questions will only be completely answered
with some benchmarks, but since I don't have a clear understanding of
some of the relevant concepts, these preliminary considerations will
hopefully reduce the number of necessary iterations until the
best/acceptable level of performance is achieved. Thanks.

José Fonseca

