Dave Airlie wrote:
> apologies for top posting, but Thomas's email appears to be breaking 
> alpine (HTML or some such encoding)
>   
> The big area where we win with CACHED_MAPPED is pixmaps for 2D operations. 
>
> a) we can't know in advance if we should allocate pixmaps as cached or 
> uncached.
> b) we can't know if we are going to be doing mostly hw or mostly sw 
> rendering with the pixmap.
>
> In this case we end up hitting the migration a lot. I couldn't come up 
> with a solution that worked that wasn't CACHED_MAPPED, unless we had a 
> coherent GART... granted, I may not have thought about it enough.
>   
Hmm, yes, this is a tricky case. Doesn't Intel's coherent GART, 
DRM_BO_FLAG_CACHED, work here? I suspect it'd be a bit slow, though.
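
For illustration, the allocator-side choice looks roughly like this. A
sketch only: the DRM_BO_FLAG_* names are the drm_bo flags under
discussion (assumed to come from the drm_bo-era drm.h), and
pick_pixmap_flags() is a made-up helper standing in for whatever EXA
would actually call.

#include <stdint.h>

/* Hypothetical helper, for illustration only. */
uint64_t pick_pixmap_flags(int mostly_sw_rendering)
{
        if (mostly_sw_rendering)
                /* CPU-friendly: cacheable mapping kept across GPU
                 * binds; needs a flush, or a coherent GART as with
                 * Intel's DRM_BO_FLAG_CACHED, before the GPU sees
                 * the data. */
                return DRM_BO_FLAG_CACHED_MAPPED;

        /* GPU-friendly: uncached/write-combined mapping; painful for
         * the software-fallback reads EXA sometimes has to do. */
        return 0;
}

/* The catch, per Dave's (a) and (b): at pixmap-create time there is
 * no reliable way to compute mostly_sw_rendering, so either choice
 * loses for some workloads. */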

/Thomas


> There ends your reminder that all the world is not a 3D app :); consider 
> that my main use cases for TTM are EXA and compiz...
>   
> Dave.
>
> On Tue, 4 Mar 2008, Thomas Hellström wrote:
>
>   
>> Keith Packard wrote:
>> On Mon, 2008-03-03 at 19:34 +0100, Thomas Hellström wrote:
>>
>>> 2) Copying buffers in drm. This is to avoid vma creation and pagefaults, 
>>> right? Yes, that could be an improvement *if* kmap_atomic is used to 
>>> provide the kernel mapping. Doing vmap on a whole buffer is probably 
>>> almost as expensive as a user-space mapping, and will waste precious 
>>> vmalloc space. User-space buffer mappings aren't really that expensive, 
>>> and a second map on the same buffer is essentially a no-op, unless you 
>>> are using DRM_BO_FLAG_CACHED_MAPPED.
>>>
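
Something like this is the kmap_atomic path I have in mind; a sketch
only, using the two-argument km_type-style API, with bo_copy_pages()
as a made-up name:

#include <linux/highmem.h>
#include <linux/mm.h>
#include <linux/string.h>

/* Copy into a BO's backing pages one page at a time.  kmap_atomic()
 * gives a short-lived kernel mapping of a single page, so no vma is
 * created and no vmalloc space is consumed. */
static void bo_copy_pages(struct page **pages, const void *src,
                          unsigned long num_pages)
{
        unsigned long i;

        for (i = 0; i < num_pages; ++i) {
                void *dst = kmap_atomic(pages[i], KM_USER0);

                memcpy(dst, (const char *)src + (i << PAGE_SHIFT),
                       PAGE_SIZE);
                kunmap_atomic(dst, KM_USER0);
        }
}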
>> If you run a kernel with the ability to map all of physical memory,
>> there will always be a kernel mapping for every page of memory. DRM
>> already relies on this, allocating pages below the 900M limit on 32-bit
>> kernels and not using the mapping APIs in a way that would make memory
>> above that limit work.
>>
>> Encouraging people to use 64-bit kernels on larger memory machines can
>> eliminate the kernel mapping cost on machines with > 1GB of memory.
>>
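To make the lowmem point concrete, a minimal sketch (made-up function
name; a real caller would eventually __free_page() the page):

#include <linux/gfp.h>
#include <linux/mm.h>

/* On 32-bit x86, GFP_KERNEL allocations come from lowmem, which is
 * permanently mapped, so the kernel virtual address comes for free
 * from page_address().  A highmem page (e.g. GFP_HIGHUSER) has no
 * permanent mapping and would need kmap()/kmap_atomic() instead. */
static void *lowmem_page_vaddr(void)
{
        struct page *page = alloc_page(GFP_KERNEL);

        if (!page)
                return NULL;
        return page_address(page);      /* valid for any lowmem page */
}
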
>>> 3) Copying buffers through the GATT. I assume you're referring to 
>>> binding the buffer to a pre-mapped region of the GATT and then doing 
>>> the copying without setting up a new CPU map? That's certainly possible 
>>> and a good candidate for performing relocations if you can't do 
>>> kmap_atomic().
>>>
>> The kernel mapping is free; the goal here is to avoid the complexities
>> of non-temporal stores, and eliminate the chipset flush kludge. The
>> question here is whether writes through the GTT in WC mode are slower
>> than writes to regular memory in non-temporal WB mode, followed by a
>> chipset flush operation.
>>
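For reference, the non-temporal alternative being weighed here would
look something like this (SSE2 intrinsics; dst/src assumed 16-byte
aligned and len a multiple of 16):

#include <emmintrin.h>
#include <stddef.h>

/* Stream data into a cacheable (WB) buffer without pulling it through
 * the cache; the sfence orders the streaming stores ahead of whatever
 * follows, e.g. the chipset flush. */
static void copy_nontemporal(void *dst, const void *src, size_t len)
{
        __m128i *d = dst;
        const __m128i *s = src;
        size_t i;

        for (i = 0; i < len / 16; ++i)
                _mm_stream_si128(&d[i], _mm_load_si128(&s[i]));
        _mm_sfence();
}
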
>>> However, if you were to reuse buffers in user-space and just use plain 
>>> old !DRM_BO_FLAG_CACHED, none of these would be real issues. Buffers 
>>> will stay bound to the GTT unless they get evicted, and the user-space 
>>> vmas would stay populated. You'd pay a performance price the first time 
>>> a buffer is created and when it is destroyed.
>>>
>> Right now, re-using buffers is hard on our caches -- mapping the buffer
>> to the GPU requires a flush, which cleans the cache lines. When we
>> re-use the buffer, the writes will re-load every line from memory.
>>
>> Re-using the same user-space buffer will hit live cache lines. Those
>> cache lines will then be copied to memory which will never be pulled
>> into the cache. The number of writes to memory is the same, but we
>> eliminate the cache line loads which would otherwise occur as the buffer
>> is filled.
>>
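Seen from user space, that per-map flush amounts to something like the
following (made-up name; 64-byte cache lines assumed):

#include <emmintrin.h>
#include <stddef.h>

/* Clean every cache line covering the buffer so the GPU reads coherent
 * data.  Afterwards the lines are no longer cached, so the next CPU
 * write to the buffer reloads each line from memory; exactly the cost
 * described above. */
static void flush_buffer_cache_lines(const void *buf, size_t len)
{
        const char *p = buf;
        const char *end = p + len;

        for (; p < end; p += 64)
                _mm_clflush(p);
        _mm_mfence();   /* make the flushes globally visible */
}
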
> Yes, but it's important to know that these issues depend on whether you 
> change the kernel mapping to be uncached when binding. You're currently 
> not doing that, so you get caching issues and need the chipset flush. If 
> you were to do that, user-space mappings would be write-combined and you 
> wouldn't have any caching problems either. The big performance cost 
> would be changing the kernel mappings when binding / unbinding, and 
> you'd need to re-use buffers to avoid that problem. Basically, what I'm 
> saying above is that *if* you want to reuse buffers, you can use plain 
> old !DRM_BO_FLAG_CACHED to avoid the caching and flushing issues.
>
> /Thomas
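
The mapping change I'm referring to would be along these lines; a
sketch only, using the x86 set_memory_uc()/set_memory_wb() interfaces
(older kernels spell it change_page_attr() plus global_flush_tlb()),
with a made-up function name:

#include <asm/cacheflush.h>
#include <linux/mm.h>

/* Switch the kernel mapping of a BO's backing pages to uncached at
 * bind time (uncached = 1) and back to write-back at unbind
 * (uncached = 0).  Each call rewrites page-table attributes and
 * flushes TLBs and caches, which is why buffer reuse matters. */
static int bo_set_kernel_caching(struct page **pages,
                                 unsigned long num_pages, int uncached)
{
        unsigned long i;
        int ret = 0;

        for (i = 0; i < num_pages && ret == 0; ++i) {
                unsigned long addr =
                        (unsigned long)page_address(pages[i]);

                ret = uncached ? set_memory_uc(addr, 1)
                               : set_memory_wb(addr, 1);
        }
        return ret;
}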