On Wed, 2002-10-23 at 00:19, José Fonseca wrote:
> >Is it necessary to copy all the data then DMA it, or can you pipeline it
> >so that the DMA is writing out some of the cache while you copy data in
> >and verify it?
> 
> I'm not sure what you mean by "cache" above, but the Mach64 has a ring
> buffer with all the pending DMA buffers, so there will be DMA transfers
> going on simultaneously with the copy/verify, but of unrelated DMA buffers.

Is your very large mesh a single DMA buffer or multiple buffers?

> I hope this has answered your questions. I'm still not sure, after reading
> this thread, what the best approach here should be in detail. There
> seems to be a consensus regarding verifying on the source and not on the
> destination, but not on whether the verify and copy should be done at the same
> time or in distinct steps, which relates to the benefit of prefetching
> and/or uncached writes (where it isn't even clear whether they are actually
> a benefit or not).

Prefetching tends to be a win. What to prefetch is a harder question,
normally solved by benchmarking. When the card does a DMA access to a
buffer it will suck it from the processor's L2 cache. If it only reads,
you should end up with a local copy in cache. If the card writes to the
buffers as it processes them, it will actually evict them from the CPU
cache in most cases. In the former case I would expect to want to
prefetch the input data (please trust copy_from_user to do this right;
it doesn't do a good job yet, but it's the business of that code to do
it). In the latter case I could see a prefetchw of the destination DMA
buffer being more of a win.
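
Something like the following userspace-style sketch; the 64-byte line size
and the prefetch distance are guesses you would tune by benchmarking, and
__builtin_prefetch is just the gcc builtin standing in for whatever the
driver actually uses. In practice you would keep only one of the two
prefetches, depending on which case you are in:

#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE	64			/* assumed line size */
#define AHEAD		(4 * CACHE_LINE)	/* prefetch distance, tune it */

static void copy_words(uint32_t *dst, const uint32_t *src, size_t words)
{
	size_t i;

	for (i = 0; i < words; i++) {
		if ((i * sizeof(*src)) % CACHE_LINE == 0) {
			/* card only reads the buffer: pull upcoming
			 * source lines towards the CPU cache */
			__builtin_prefetch((const char *)src + i * sizeof(*src) + AHEAD, 0, 1);
			/* card writes the buffer back and keeps evicting
			 * it: prefetch the destination for write instead */
			__builtin_prefetch((char *)dst + i * sizeof(*dst) + AHEAD, 1, 1);
		}
		dst[i] = src[i];
	}
}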

The reasons for doing the copy and verify in one pass are twofold:

-   Memory access is slow, so even if the data is in L2 we have clock
cycles to fill while copying. For example, on x86 a copy-and-checksum is
the same speed as a plain copy in most cases.

-   If your long series of commands spans multiple DMA buffers you can fire
off each buffer as soon as it is ready rather than copying everything and
then scanning it. That reduces latency and also means you are less likely
to have data fall out of the L1 cache and then be pulled back into it (see
the sketch below).
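
As a rough sketch of what I mean (is_allowed_cmd() and fire_dma_buffer()
are made-up names here, standing in for the driver's real command check
and ring-buffer submit):

#include <stddef.h>
#include <stdint.h>
#include <errno.h>

struct cmd_buf {
	const uint32_t *src;	/* source commands (user data or a staging copy) */
	uint32_t *dma;		/* destination DMA buffer */
	size_t words;
};

static int is_allowed_cmd(uint32_t cmd)
{
	return 1;		/* placeholder for the real register/command check */
}

static void fire_dma_buffer(uint32_t *dma, size_t words)
{
	/* placeholder: queue the buffer on the card's ring */
}

static int copy_verify_submit(struct cmd_buf *bufs, int nbufs)
{
	int b;
	size_t i;

	for (b = 0; b < nbufs; b++) {
		for (i = 0; i < bufs[b].words; i++) {
			uint32_t cmd = bufs[b].src[i];

			/* verify while the word is still in a register;
			 * the check fills cycles the copy would otherwise
			 * spend waiting on memory */
			if (!is_allowed_cmd(cmd))
				return -EINVAL;
			bufs[b].dma[i] = cmd;
		}
		/* this buffer is clean: fire it off now, so the card's DMA
		 * overlaps with copying/verifying the next buffer */
		fire_dma_buffer(bufs[b].dma, bufs[b].words);
	}
	return 0;
}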

Uncached writes on PC hardware are almost always a complete loss. You
want writeback caching so that you are writing to the PCI bridge or SDRAM
in the largest chunk sizes possible.
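
In other words, just fill the staging buffer with ordinary cached stores
and let the writeback cache do the batching; a minimal sketch, nothing
driver-specific about it:

#include <stddef.h>
#include <stdint.h>

/*
 * Plain word stores through a normal writeback-cached mapping; the CPU
 * gathers them into full cache-line writes towards the bridge/SDRAM when
 * the lines are written back.  The same loop through an uncached mapping
 * would turn every 32-bit store into its own bus transaction.
 */
static void fill_staging(uint32_t *staging, const uint32_t *cmds, size_t words)
{
	size_t i;

	for (i = 0; i < words; i++)
		staging[i] = cmds[i];
}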

It's also entirely possible that on something like a Mach64, where you don't
have very many giant objects, it won't make a blind bit of difference whether
you prefetch, validate as you write, etc.


