On Mon, 2008-03-03 at 19:34 +0100, Thomas Hellström wrote:

> 1) Allocating all pages at once:
> Yes, I think this might improve performance in some cases. The reason
> it hasn't been done already is the added complexity needed to keep track
> of the different allocation sizes. One optimization that's already in
> the pipeline is to page in more than a single page (for example 16) when
> we hit a pagefault in nopfn().

The kernel overhead is per-allocation, not per-page, so we clearly want
to make larger allocation requests. Doing the allocation at create time
eliminates the page-fault costs. As we *always* allocate every page,
there's no benefit to delaying the allocation via the page fault
mechanism.
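
To make this concrete, here's a rough sketch of allocating everything at
create time (the struct and function names are made up for illustration;
they aren't the existing drm interfaces):

#include <linux/gfp.h>
#include <linux/mm.h>

/* Stand-in for the buffer object's backing-page bookkeeping. */
struct bo_pages {
        unsigned long num_pages;
        struct page **pages;
};

/* Populate every backing page up front instead of waiting for nopfn(). */
static int bo_populate_all(struct bo_pages *bo)
{
        unsigned long i;

        for (i = 0; i < bo->num_pages; i++) {
                bo->pages[i] = alloc_page(GFP_KERNEL | __GFP_ZERO);
                if (!bo->pages[i])
                        goto out_free;
        }
        return 0;

out_free:
        while (i--)
                __free_page(bo->pages[i]);
        return -ENOMEM;
}

With something like this, the fault handler only ever inserts pages that
already exist, and the allocation overhead is paid once per buffer.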

> 2) Copying buffers in drm. This is to avoid vma creation and pagefaults, 
> right? Yes, that could be an improvement *if* kmap_atomic is used to 
> provide the kernel mapping. Doing vmap on a whole buffer is probably 
> almost as expensive as a user-space mapping, and will waste precious 
> vmalloc space. User-space buffer mappings aren't really that expensive, 
> and a second map on the same buffer is essentially a no-op, unless you
> are using DRM_BO_FLAG_CACHED_MAPPED.

If you run a kernel with the ability to map all of physical memory,
there will always be a kernel mapping for every page of memory. DRM
already relies on this: it allocates pages below the ~900M lowmem limit
on 32-bit kernels and does not use the mapping APIs in a way that would
make memory above that limit work.

Encouraging people to run 64-bit kernels on larger-memory machines
eliminates the kernel mapping cost for memory above that limit, since a
64-bit kernel direct-maps all of physical memory.
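
As a sketch of the in-kernel copy path (the function and the page array
here are hypothetical, not existing drm code), copying into a buffer
object page by page needs neither a vmap() of the whole object nor a
user-space vma:

#include <linux/highmem.h>
#include <linux/kernel.h>
#include <linux/string.h>

/*
 * Copy 'len' bytes of kernel data into the buffer object's backing
 * pages using the per-page kernel mapping.  For lowmem pages (or any
 * page on a 64-bit kernel) kmap_atomic() resolves to the existing
 * direct mapping, so there is no mapping setup cost at all.
 */
static void bo_copy_in(struct page **pages, const void *src, size_t len)
{
        size_t off = 0;

        while (off < len) {
                size_t chunk = min_t(size_t, PAGE_SIZE, len - off);
                void *dst = kmap_atomic(pages[off >> PAGE_SHIFT], KM_USER0);

                memcpy(dst, src + off, chunk);
                kunmap_atomic(dst, KM_USER0);
                off += chunk;
        }
}

Nothing in the loop sleeps, so the atomic mapping is safe; any
copy_from_user() would have to happen before this point.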

> 3) Copying buffers through the GATT. I assume you're referring to 
> binding the buffer to a pre-mapped region of the GATT and then doing the
> copying without setting up a new CPU map? That's certainly possible and 
> a good candidate for performing relocations if you can't do kmap_atomic().

The kernel mapping is free; the goal here is to avoid the complexities
of non-temporal stores and to eliminate the chipset flush kludge. The
open question is whether writes through the GTT in WC mode are slower
than non-temporal writes to regular WB memory followed by a chipset
flush.
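
For reference, the non-temporal half of that comparison looks roughly
like this (a user-space sketch with SSE2 intrinsics; the WC half is just
ordinary stores through a write-combining mapping of the aperture, and
the chipset flush afterwards is whatever kludge the driver already
uses):

#include <emmintrin.h>
#include <stddef.h>

/*
 * Stream 'len' bytes into WB memory without pulling the destination
 * lines into the cache.  Assumes a 16-byte-aligned destination and a
 * length that is a multiple of 16.
 */
static void copy_nontemporal(void *dst, const void *src, size_t len)
{
        __m128i *d = dst;
        const __m128i *s = src;
        size_t n = len / sizeof(__m128i);

        while (n--)
                _mm_stream_si128(d++, _mm_loadu_si128(s++));
        _mm_sfence();   /* order the streaming stores before the flush */
}

Timing this (plus the flush) against plain writes through the WC
mapping should answer the question directly.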

> However, if you were to reuse buffers in user-space and just use plain 
> old !DRM_BO_FLAG_CACHED, none of these would be real issues. Buffers
> will stay bound to the GTT unless they get evicted, and the user-space 
> vmas would stay populated. You'd pay a performance price the first time 
> a buffer is created and when it is destroyed.

Right now, re-using buffer objects is hard on our caches: mapping the
buffer to the GPU requires a flush, which cleans the cache lines, so
when we re-use the buffer the writes re-load every line from memory.

Re-using the same user-space buffer instead hits live cache lines.
Those lines are then copied out to memory that is never pulled into
the cache. The total number of writes to memory is the same, but we
eliminate the cache-line loads that would otherwise occur as the
buffer is filled.

> Also, if one is prepared to go a step even further to use user-space 
> buffer pools for these things you're even better off performance-wise. 
> An old i915tex driver patched for the latest DRM environment would,
> for simple apps like gears, only average slightly above 2 kernel calls
> per batch-buffer, and that's the execbuffer call itself, a fence 
> unreference and an occasional fence sync.

How buffer objects are managed is a separate issue; what I'm interested
in exploring here are the various cache and TLB effects. I want to
focus on keeping the CPU cache full of useful data, not on flushing
cache lines only to reload them a short time later.

-- 
[EMAIL PROTECTED]
