Eric Anholt wrote:
> On Thu, 2008-02-28 at 10:08 +0100, Thomas Hellström wrote:
>> Eric Anholt wrote:
>>> On Thu, 2008-02-28 at 06:08 +1000, Dave Airlie wrote:
>>>>> I wasn't planning on a Mesa 7.1 (trunk code) release for a while,
>>>>> but I could finish up 7.0.3 at any moment. I have to admit that I
>>>>> haven't actually tested Mesa 7.0.3 with current X code in quite a
>>>>> while, though.
>>>>>
>>>>> Before Mesa 7.1 I'd like to see a new, official DRM release.
>>>>> Otherwise, it's hard to identify a snapshot of DRM that works with
>>>>> Mesa. I know I always have trouble with DRM versioning otherwise.
>>>>>
>>>>> Is there any kind of roadmap for a new DRM release?
>>>>
>>>> When TTM hits the kernel, I'll release a libdrm to work with that
>>>> and solidify the API. However, people keep finding apparently valid
>>>> reasons to pick holes in the TTM API, though I haven't seen the
>>>> discussion brought up in the few weeks since.
>>>
>>> http://cgit.freedesktop.org/~anholt/drm/log/?h=drm-ttm-cleanup-2
>>>
>>> has some, I believe, obvious cleanups to the API, removing many
>>> sharp edges. At that point the BO parts of the API are more or less
>>> tolerable to me. The fencing code I still don't understand and am
>>> very scared by, but most of it has left the user <-> kernel API at
>>> least.
>>
>> Some important comments about the API changes, starting from below.
>>
>> Remove DRM_BO_FLAG_FORCE_MAPPABLE: Yes, that can go away.
>>
>> Remove DRM_BO_HINT_WAIT_LAZY: No. This flag is intended for
>> polling-only hardware, and has no use at all in the Intel driver once
>> the sync flushes are gone. The fact that you ever saw a difference
>> with this flag is that there was a bug in the execbuf code that
>> caused you to hit a polling path in the fence wait mechanism.
>>
>> Ignore DRM_FENCE_FLAG_WAIT_LAZY: No. Same as above.
>
> OK.
> We should clarify this in the ioctl descriptions so that people
> with sane hardware know that the flags are ignored.

Indeed. The lack of documentation is disturbing and should be fixed
asap.

>> Remove unused DRM_FENCE_FLAG_WAIT_IGNORE_SIGNALS: Yes, that's OK.
>>
>> Remove DRM_FENCE_FLAG_NO_USER: No. It's used by the Poulsbo X server
>> EXA implementation and is quite valuable for small composite
>> operations.
>>
>> Remove DRM_BO_FLAG_CACHED_MAPPED and make that a default behaviour:
>> No! We can't do that! DRM_BO_FLAG_CACHED_MAPPED creates invalid
>> physical page aliasing, the details of which are thoroughly explained
>> here:
>
> I may have said it wrong: Make DRM_BO_FLAG_CACHED_MAPPED the default
> behavior if the platform can support it. The point is that it should
> not be a userland interface -- if the kernel can manage it, then just
> do it. Otherwise, don't. I'd rather see us disable the performance
> hack for now than leave a go-faster switch in the interface.
>
> Going back over the commit, I didn't make the better behavior
> conditional on the platform being able to do it. Oops, I need to fix
> that.

Yes, hmm. As I see it, there are three performance problems that
DRM_BO_FLAG_CACHED_MAPPED attempts to address:
1) The buffer creation latency due to global_flush_tlb(). This can be
   worked around with buffer/page caching in a number of ways (below),
   and once the wbinvd() is gone from the main kernel it won't be such
   a huge problem anymore.

   a) A kernel pool of uncached / unmapped (highmem-like) pages. (Not
      likely to occur anytime soon.)
   b) A pre-bound region of VRAM-like AGP memory for batch buffers and
      friends. Easy to set up and avoids the flushing issues
      altogether.
   c) User-space BO caching and reuse.
   d) User-space buffer pools.

   TG is heading down the d) path, since it also fixes the texture
   granularity problem.

2) Relocation application. KeithP's presumed_offset work has to a
   great extent fixed this problem. I think the kmap_atomic_prot_pfn()
   stuff just added will take care of the rest, and I hope the mm
   kernel guys will understand the problem and accept
   kmap_atomic_prot_pfn() in. I'm working on a patch that will do
   post-validation-only relocations this way.

3) Streaming reads from the GPU to the CPU. Use cache-coherent buffers
   if available, otherwise SGDMA. I'm not sure (due to prefetching)
   that DRM_BO_FLAG_CACHED_MAPPED addresses this issue correctly.

So from my perspective I'd like to keep the default behavior,
particularly as we're using d) to address problem 1), and, if I
understand it correctly, Intel is heading down c). In the long run I'd
like to see DRM_BO_FLAG_CACHED_MAPPED disappear, and us fix whatever's
in the way for you to implement c).

If we need to address this before kernel inclusion, is there a way we
can have that as a driver-specific flag? That would mean adding a
driver-specific flag-preprocessing callback.

>> http://marc.info/?l=linux-kernel&m=102376926732464&w=2
>>
>> And this resulted in the change_page_attr() and the dreaded
>> global_flush_tlb() kernel calls.
>> From what I understand it might be OK for streaming writes to the
>> GPU (like batch buffers), but how would you stop a CPU from
>> prefetching invalid data from a buffer while you're writing to it
>> from the GPU? And even from writing it back, overwriting what the
>> GPU just wrote? This would break anything trying to use TTM in a
>> consistent way.
>
> As far as we know, Intel CPUs are not affected by the AMD limitation
> that read-only speculation may result in later writeback, so what we
> do works out. It does look like we're not flushing the CPU cache at
> map time (bo_map_ioctl -> buffer_object_map -> bo_wait,
> bo_evict_cached -> bo_evict -> move_mem), which is wrong.
>
> Note that in the current implementation, when we map the buffer
> again, we unmap it out of the hardware. It would also be nice to not
> unmap it from the hardware and leave the GART mapping as-is, and just
> flush the cache again when validating. The 3D driver basically never
> hits this path at the moment, but the X server certainly would
> (sadly), and we may have the 3D driver doing this if we do userland
> buffer reuse.

Yes, leaving the GART mapping as-is should probably work fine. My
concern is a case similar to where you're doing rendering and then
need to do a software fallback. You'll map the destination buffer but
have no way of knowing whether the CPU has already speculatively
prefetched invalid data into the cached kernel mapping. I guess, in
that case, it'll be propagated into the user-space mapping as well?

/Thomas
-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
--
_______________________________________________
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel