On 2002.05.26 00:49 Linus Torvalds wrote:
>
> On Sat, 25 May 2002, Frank C. Earl wrote:
> >
> > Linus, if you're still listening in, can you spare us a moment to tell us
> > what consequences quickly mapping and unmapping memory regions into
> > userspace has on the system?
>
> It's reasonably fine on UP, and it often _really_ sucks on SMP.
>
> On UP, the real cost is not so much the actual TLB invalidate (which works
> at a page granularity anyway on any recent CPU), but the fact that you
> need to walk the page tables (cache miss heaven), and you will eventually
> need to fault another page back in (page fault, cache miss, whatever).
>
> On SMP, especially if the program is threaded (which games often are:
> even if the actual graphics engine is single-threaded, you end up having
> another thread for sound, one possibly for AI or input etc), the cost
> goes up noticeably thanks to a (synchronous) CPU cross-call for a proper
> TLB invalidate.
>
> > We've got a couple of the DRM modules that do that to
> > ensure the driver is secure. I'm thinking it's a source of some
> > performance degradation in the drivers and it may not be good on the
> > memory subsystem.
>
> My gut feel is that especially under SMP, you're actually better off
> copying stuff, especially if we're talking about buffers that are
> "mostly" less than a few kB.
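Just so I'm sure I understand the copy approach, here is a rough sketch of
what a copy-based submission path could look like on the kernel side. All
the names below (my_dev, my_submit_ioctl, hw_queue_dma) are made up for
illustration; this is not the actual DRM code:

#include <linux/types.h>        /* dma_addr_t, size_t */
#include <linux/errno.h>        /* EINVAL, EFAULT */
#include <asm/uaccess.h>        /* copy_from_user() */

#define MY_BUF_SIZE 8192        /* small, cache-friendly staging buffer */

struct my_dev {
        void *dma_vaddr;        /* kernel virtual address of the DMA buffer */
        dma_addr_t dma_handle;  /* bus address the card reads from */
};

/* Hypothetical hook, implemented elsewhere: hands the (already
 * validated) buffer to the card's DMA engine. */
int hw_queue_dma(struct my_dev *dev, dma_addr_t handle, size_t len);

static int my_submit_ioctl(struct my_dev *dev, const void *user_data,
                           size_t len)
{
        if (len > MY_BUF_SIZE)
                return -EINVAL;

        /*
         * The app has just generated this data, so it is most likely
         * still hot in the CPU cache: the copy is cheap, and no
         * page-table walk or cross-CPU TLB invalidate is needed.
         */
        if (copy_from_user(dev->dma_vaddr, user_data, len))
                return -EFAULT;

        /* The kernel owns the buffer it hands to the card, so the
         * command stream stays secure. */
        return hw_queue_dma(dev, dev->dma_handle, len);
}

The attraction is that the user pages are never mapped or unmapped at all,
so there is no page-table walk and no cross-CPU TLB invalidate to pay for;
the copy mostly hits data that is still warm in the cache.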
The vertex data alone (no textures here) can be several MBs per frame, and
the number of frames per second can be as high as the card can handle, so
the total buffer memory must also be big. I don't know whether having lots
of small buffers would create overhead from the ioctls and buffer
submission (well, mostly the ioctls, since the buffers themselves can be
queued by the kernel). Throwing out some numbers just to get a rough idea:
2 [MB/frame] x 25 [frames/second] / 4 [KB/buffer] = 12800 buffers/second.
I'm not very familiar with these issues, but won't that many ioctls per
second create significant overhead? Or would the benefit of having each
buffer fit in the cache (making the copy cheap) prevail? At the other
extreme we would have, e.g., a 2 MB buffer costing a single ioctl plus the
unmapping and mapping to user space. (I know that most likely we will need
to benchmark this anyway...)

> Basically, if the data can fit in the cache (ie the app has just generated
> them, and the data is already in the CPU cache and not big enough to blow
> that cache to kingdom come), copying is almost guaranteed to be a win,
> even on UP.
>
> (And please do note the cache issues: while a big buffer can often improve
> performance, it can equally easily _decrease_ performance by putting more
> cache pressure on the system. You're often better off re-using a smaller
> 8kB buffer many times - and doing most everything out of the cache - than
> trying to use a 1MB buffer and aiming for "perfect scaling").
>
> DMA'ing directly from user space is most likely advantageous for doing
> things like textures, which are bound to be fairly big anyway. I'd _hope_
> that those don't have security issues (ie they'd be DMA'able as just data,
> no command interface), but I don't have any information about the card
> details.

Yes. There are several ways to accomplish blits with this card, and at
least one of them is secure and efficient.

José Fonseca
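P.S. To make the buffers/second arithmetic a bit more concrete, here is a
back-of-the-envelope userspace sketch of one possible middle ground between
the two extremes: instead of one ioctl per 4 KB buffer (~12800/s with the
numbers above), the app fills a small, reused staging area and submits
several chunks per ioctl. struct my_submit, DRM_IOCTL_MY_SUBMIT and
flush_chunks are made-up names just to show the idea, not a real interface:

#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>        /* _IOW() */

#define CHUNK_SIZE        4096  /* one small vertex chunk */
#define CHUNKS_PER_IOCTL  16    /* 16 x 4 KB = 64 KB per syscall */

/* Made-up submission request; the kernel would copy 'data' itself. */
struct my_submit {
        const void *data;
        size_t      len;
};

/* Hypothetical ioctl number, for illustration only. */
#define DRM_IOCTL_MY_SUBMIT  _IOW('d', 0x40, struct my_submit)

static char staging[CHUNKS_PER_IOCTL * CHUNK_SIZE];

int flush_chunks(int drm_fd, size_t bytes_used)
{
        struct my_submit req;

        req.data = staging;
        req.len  = bytes_used;

        /*
         * One syscall now covers up to 16 chunks, so the ioctl rate
         * drops from ~12800/s to ~800/s, while each individual copy
         * still fits comfortably in the CPU cache.
         */
        return ioctl(drm_fd, DRM_IOCTL_MY_SUBMIT, &req);
}

Whether 64 KB per submission is the sweet spot (or whether the kernel's own
buffer queueing already makes this moot) is exactly the kind of thing the
benchmark would have to tell us.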