Benjamin Herrenschmidt wrote:
Actually, the TTM memory manager already does this, 
but also changes the caching policy of the linear kernel map.
    

The latter is unfortunately not portable, and can have other serious
performance impacts.

Typically, the kernel linear map is mapped using larger page sizes, or
in some cases, even large TLB entries, or separate translation registers
(like BATs). Thus you cannot affect the caching policy of a single 4k
page. Also, on some processors, you can't just break a single large
page down into small pages either. For example, on desktop PowerPC, entire
segments of 256M can have only one page size. Even x86 might have some
interesting issues here...

  
But this should be the same problem encountered by the agpgart driver?
x86 and x86-64 call change_page_attr() to take care of this.
On powerpc it is simply a no-op (<asm/agp.h>).
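
For reference, this is roughly what those <asm/agp.h> helpers boil down to
on 2.6-era x86 (a sketch from memory with made-up function names, not a
verbatim copy of the header):

#include <linux/mm.h>
#include <asm/cacheflush.h>   /* change_page_attr(), global_flush_tlb() */
#include <asm/pgtable.h>      /* PAGE_KERNEL, PAGE_KERNEL_NOCACHE */

/* Flip one linear-map page to uncached before handing it to the GPU. */
static inline void sketch_map_page_into_agp(struct page *page)
{
	change_page_attr(page, 1, PAGE_KERNEL_NOCACHE);
	global_flush_tlb();     /* the costly part, especially on SMP */
}

/* And back to the normal cached policy when the GPU is done with it. */
static inline void sketch_unmap_page_from_agp(struct page *page)
{
	change_page_attr(page, 1, PAGE_KERNEL);
	global_flush_tlb();
}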

  
Unfortunately this leads to rather costly cache and TLB flushes.
Particularly on SMP.
    

Yup.

  
    

What about a futex-like approach:

A shared area mapped by both kernel and user space has locks for the buffers.
When submitting a command involving a buffer, userland tries to lock it.
This is a simple atomic operation in user space. If that fails (the lock
for that buffer is held, possibly by the kernel, or the buffer is
swapped out), then it does an ioctl to the DRM to get access (which
involves sleeping until the buffer can be retrieved).

Once the operation is complete, the app can release the locks on the buffers
it holds. In fact, if there is a mapping between buffers and objects for
cards like nVidia with objects and notifiers, the kernel could
auto-unlock objects when the completion interrupt for them occurs.

Ben.
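
For illustration, here is a minimal user-space sketch of that futex-like
fast-path / slow-path split. The shared lock word, the lock values and the
DRM_IOCTL_WAIT_BUF ioctl are all made up for the example, not an existing
interface:

#include <stdint.h>
#include <sys/ioctl.h>

#define BUF_UNLOCKED 0
#define BUF_LOCKED   1

/* Hypothetical ioctl number, for illustration only. */
#define DRM_IOCTL_WAIT_BUF _IOW('d', 0x40, uint32_t)

/* One lock word per buffer, in an area mapped by both kernel and user. */
struct buf_lock {
	volatile uint32_t lock;
};

static int lock_buffer(int drm_fd, struct buf_lock *bl, uint32_t handle)
{
	/* Fast path: uncontended atomic compare-and-swap in user space. */
	if (__sync_bool_compare_and_swap(&bl->lock, BUF_UNLOCKED, BUF_LOCKED))
		return 0;

	/*
	 * Slow path: the lock is held (possibly by the kernel) or the
	 * buffer is swapped out, so sleep in the DRM until it is back.
	 */
	return ioctl(drm_fd, DRM_IOCTL_WAIT_BUF, &handle);
}

The unlock side would be the mirror image, and as suggested above the kernel
could clear the lock word itself from the completion interrupt.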

Currently we take the following approach when the GPU needs access to a buffer:

0) Take the hardware lock.
1) The buffer is validated, and if not present in the GATT, it's flipped in. At this point, idle buffers may be flipped out.
2) The app submits a batch buffer (or, in the general case, a command sequence). All buffers referenced by this command sequence need to have been validated, and the command sequence should be updated with their new GATT offsets.
3) A "fence" is emitted, and associated with all unfenced buffers.
4) The hardware lock is released.
5) When the fence has expired (The GPU is finished with the command sequence), the buffers associated with it may optionally be thrown out.
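
In pseudo-C, the path looks roughly like this (every identifier below is
illustrative, not the actual DRM/TTM interface):

struct drm_buffer;
struct drm_fence;

extern void hw_lock(void);
extern void hw_unlock(void);
extern int  drm_bo_validate(struct drm_buffer *bo);     /* flip into the GATT if needed */
extern void cmd_patch_offset(void *cmd, struct drm_buffer *bo);
extern void hw_submit(void *cmd);
extern struct drm_fence *drm_fence_emit(void);
extern void drm_fence_attach(struct drm_fence *fence, struct drm_buffer *bo);

static int submit(void *cmd, struct drm_buffer **bo, int nbo)
{
	struct drm_fence *fence;
	int i, ret = 0;

	hw_lock();                              /* 0) take the hardware lock */

	for (i = 0; i < nbo; i++) {
		ret = drm_bo_validate(bo[i]);   /* 1) validate; idle buffers may be flipped out */
		if (ret)
			goto out;
		cmd_patch_offset(cmd, bo[i]);   /* 2) patch in the new GATT offset */
	}

	hw_submit(cmd);                         /* 2) submit the command sequence */

	fence = drm_fence_emit();               /* 3) emit a fence ... */
	for (i = 0; i < nbo; i++)
		drm_fence_attach(fence, bo[i]); /* ... and attach it to the unfenced buffers */
out:
	hw_unlock();                            /* 4) release the hardware lock */
	return ret;                             /* 5) once the fence expires, buffers may be thrown out */
}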

One problem is that buffers that need to be pinned (_always_ available to the GPU) cannot be thrown out and will thus fragment the aperture or VRAM space.

Buffers also carry usage and mapping refcounts. They are not allowed to be validated while mapped, and (except under some circumstances) are not allowed to be mapped while validated. Buffer destruction occurs when the refcount drops to zero.
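
As a sketch (not the real TTM data structures; locking left to the caller),
those rules amount to something like:

#include <errno.h>

struct drm_buffer_object {
	int refcount;    /* buffer is destroyed when this drops to zero */
	int map_count;   /* outstanding CPU mappings                    */
	int validated;   /* currently resident in the GATT / VRAM       */
};

static void bo_destroy(struct drm_buffer_object *bo)
{
	/* free backing pages, GATT space, etc. */
	(void)bo;
}

/* Validation is refused while the buffer is mapped. */
static int bo_validate(struct drm_buffer_object *bo)
{
	if (bo->map_count > 0)
		return -EBUSY;
	bo->validated = 1;
	return 0;
}

/* Mapping is (normally) refused while the buffer is validated. */
static int bo_map(struct drm_buffer_object *bo)
{
	if (bo->validated)
		return -EBUSY;
	bo->map_count++;
	return 0;
}

static void bo_unref(struct drm_buffer_object *bo)
{
	if (--bo->refcount == 0)
		bo_destroy(bo);
}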

/Thomas


    