Eric Anholt wrote:
> On Thu, 2008-02-28 at 10:08 +0100, Thomas Hellström wrote:
>> Eric Anholt wrote:
>>> On Thu, 2008-02-28 at 06:08 +1000, Dave Airlie wrote:
>>>>> I wasn't planning on a Mesa 7.1 (trunk code) release for a while,
>>>>> but I could finish up 7.0.3 at any moment. I have to admit that I
>>>>> haven't actually tested Mesa 7.0.3 with current X code in quite a
>>>>> while, though.
>>>>>
>>>>> Before Mesa 7.1 I'd like to see a new, official DRM release.
>>>>> Otherwise, it's hard to identify a snapshot of DRM that works with
>>>>> Mesa. I know I always have trouble with DRM versioning otherwise.
>>>>>
>>>>> Is there any kind of roadmap for a new DRM release?
>>>>
>>>> When TTM hits the kernel, I'll release a libdrm to work with that
>>>> and solidify the API. However, people keep finding apparently valid
>>>> reasons to pick holes in the TTM API, though I haven't seen the
>>>> discussion brought up in the few weeks since.
>>>
>>> http://cgit.freedesktop.org/~anholt/drm/log/?h=drm-ttm-cleanup-2
>>>
>>> has some, I believe, obvious cleanups to the API, removing many
>>> sharp edges. At that point the BO parts of the API are more or less
>>> tolerable to me. The fencing code I still don't understand and am
>>> very scared by, but most of it has left the user <-> kernel API at
>>> least.
>>
>> Some important comments about the API changes, starting from below.
>>
>> Remove DRM_BO_FLAG_FORCE_MAPPABLE: Yes, that can go away.
>>
>> Remove DRM_BO_HINT_WAIT_LAZY: No. This flag is intended for
>> polling-only hardware, and has no use at all in the Intel driver once
>> the sync flushes are gone. The fact that you ever saw a difference
>> with this flag is that there was a bug in the execbuf code that
>> caused you to hit a polling path in the fence wait mechanism.
>>
>> Ignore DRM_FENCE_FLAG_WAIT_LAZY: No. Same as above.
>
> OK.
> We should clarify this in the ioctl descriptions so that people
> with sane hardware know that the flags are ignored.

Indeed. The lack of documentation is disturbing and should be fixed
asap.

>> Remove unused DRM_FENCE_FLAG_WAIT_IGNORE_SIGNALS: Yes, that's OK.
>>
>> Remove DRM_FENCE_FLAG_NO_USER: No. It's used by the Poulsbo X server
>> EXA implementation and is quite valuable for small composite
>> operations.
>>
>> Remove DRM_BO_FLAG_CACHED_MAPPED and make that a default behaviour:
>> No! We can't do that! DRM_BO_FLAG_CACHED_MAPPED creates invalid
>> physical page aliasing, the details of which are thoroughly explained
>> here:
>
> I may have said it wrong: Make DRM_BO_FLAG_CACHED_MAPPED the default
> behavior if the platform can support it. The point is that it should
> not be a userland interface -- if the kernel can manage it, then just
> do it. Otherwise, don't. I'd rather see us disable the performance
> hack for now than leave a go-faster switch in the interface.
>
> Going back over the commit, I didn't make the better behavior
> conditional on the platform being able to do it. Oops, I need to fix
> that.

Yes, hmm. As I see it, there are three performance problems that
DRM_BO_FLAG_CACHED_MAPPED attempts to address:
1) The buffer creation latency due to global_flush_tlb(). This can be
   worked around with buffer/page caching in a number of ways (below),
   and once the wbinvd() is gone from the main kernel it won't be such
   a huge problem anymore.

   a) A kernel pool of uncached / unmapped (highmem-like) pages. (Not
      likely to occur anytime soon.)
   b) A pre-bound region of VRAM-like AGP memory for batch buffers and
      friends. Easy to set up and avoids the flushing issues
      altogether.
   c) User-space BO caching and reuse.
   d) User-space buffer pools.

   TG is heading down the d) path, since it also fixes the texture
   granularity problem.

2) Relocation application. KeithP's presumed_offset work has to a
   great extent fixed this problem. I think the kmap_atomic_prot_pfn()
   stuff just added will take care of the rest, and I hope the mm
   kernel guys will understand the problem and accept
   kmap_atomic_prot_pfn() in. I'm working on a patch that will do
   post-validation-only relocations this way.

3) Streaming reads from the GPU to the CPU. Use cache-coherent buffers
   if available, otherwise SGDMA. I'm not sure (due to prefetching)
   that DRM_BO_FLAG_CACHED_MAPPED addresses this issue correctly.

So from my perspective I'd like to keep the default behavior,
particularly as we're using d) to address problem 1), and, if I
understand it correctly, Intel is heading down c). In the long run I'd
like to see DRM_BO_FLAG_CACHED_MAPPED disappear, and us fix whatever's
in the way for you to implement c).

If we need to address this before kernel inclusion, is there a way we
can have that as a driver-specific flag? That would mean adding a
driver-specific flag-preprocessing callback.

>> http://marc.info/?l=linux-kernel&m=102376926732464&w=2
>>
>> And this resulted in the change_page_attr() and the dreaded
>> global_flush_tlb() kernel calls.
>> From what I understand it might be OK for streaming writes to the
>> GPU (like batch buffers), but how would you stop a CPU from
>> prefetching invalid data from a buffer while you're writing to it
>> from the GPU? And even from writing it back, overwriting what the
>> GPU just wrote? This would break anything trying to use TTM in a
>> consistent way.
>
> As far as we know, Intel CPUs are not affected by the AMD limitation
> that read-only speculation may result in later writeback, so what we
> do works out. It does look like we're not flushing the CPU cache at
> map time (bo_map_ioctl -> buffer_object_map -> bo_wait,
> bo_evict_cached -> bo_evict -> move_mem), which is wrong.
>
> Note that in the current implementation, when we map the buffer
> again, we unmap it out of the hardware. It would also be nice to not
> unmap it from the hardware and leave the GART mapping as-is, and just
> flush the cache again when validating. The 3D driver basically never
> hits this path at the moment, but the X server certainly would
> (sadly), and we may have the 3D driver doing this if we do userland
> buffer reuse.

Yes, leaving the GART mapping as-is should probably work fine. My
concern is a case similar to where you're doing rendering and then
need to do a software fallback. You'll map the destination buffer but
have no way of knowing whether the CPU has already speculatively
prefetched invalid data into the cached kernel mapping. I guess, in
that case, it'll be propagated into the user-space mapping as well?

/Thomas
-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
--
_______________________________________________
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel