On Wed, 14 May 2008 10:21:15 -0700 (PDT)
Keith Whitwell <[EMAIL PROTECTED]> wrote:

> > On Wed, 14 May 2008 16:36:54 +0200
> > Thomas Hellström wrote:
> > 
> > > Jerome Glisse wrote:
> > > I don't agree with you here. EXA is much faster for small composite 
> > > operations and even small fill blits if fallbacks are used. Even to 
> > > write-combined memory, but that of course depends on the hardware. This 
> > > is going to be even more pronounced with acceleration architectures like 
> > > Glucose and similar, that don't have an optimized path for small 
> > > hardware composite operations.
> > > 
> > > My personal feeling is that pwrites are a workaround for a workaround 
> > > for a very bad decision:
> > > 
> > > To avoid user-space allocators on device-mapped memory. This led to a 
> > > hack to avoid caching-policy changes, which led to cache-thrashing 
> > > problems, which put us in the current situation. How far are we going to 
> > > follow this path before people wake up? What's wrong with the 
> > > performance of good old i915tex, which even beats "classic" i915 in many 
> > > cases?
> > > 
> > > Having to go through potentially (and even probably) paged-out memory to 
> > > access buffers that are present in VRAM sounds like a very odd 
> > > approach (to say the least) to me. Even if it's only a single page, 
> > > implementing per-page dirty checks for domain flushing isn't very 
> > > appealing either.
> > 
> > I don't have numbers or benchmarks to show how fast the pread/pwrite path
> > might be in this use, so I am just expressing my feeling, which happens to
> > be that we should avoid VMA TLB flushes as much as we can. I get the
> > feeling that the kernel goes through numerous tricks to avoid TLB flushing
> > for good reason, and I am also pretty sure that with the number of cores
> > growing, anything that needs CPU-wide synchronization is to be avoided.
> > 
> > Hopefully once I get a decent amount of time to benchmark GEM I will
> > check my theory. A simple benchmark can be done on Intel hardware: just
> > return FALSE in the EXA PrepareAccess hook to force the use of
> > DownloadFromScreen, and in DownloadFromScreen use pread; comparing
> > benchmarks of this hacked Intel DDX against a normal one should already
> > give some numbers.
> > 
> > > Why should we have to when we can do it right?
> > 
> > Well, my point was that mapping VRAM is not right; I am not saying that
> > I know the truth. It's just a feeling based on my experiments with TTM,
> > on the BAR restriction issue, and on other considerations of the same kind.
> > 
> > > No. GEM can't cope with it. Let's say you have a 512M system with two 1G 
> > > video cards, 4G swap space, and you want to fill both cards' videoram 
> > > with render-and-forget textures for whatever purpose.
> > > 
> > > What happens? After you've generated the first, say, 300M, the system 
> > > mysteriously starts to page, and when, after a couple of minutes of 
> > > crawling texture-upload speeds, you're done, the system is using and 
> > > has written almost 2G of swap. Now you want to update the textures and 
> > > expect fast texsubimage...
> > > 
> > > So having a backing object that you have to access to get things into 
> > > VRAM is not the way to go.
> > > The correct way to do this is to reserve, but not use, swap space. Then 
> > > you can start using it on suspend, provided that the swapping system is 
> > > still up (which it has to be with the current GEM approach anyway). If 
> > > pwrite is used in this case, it must not dirty any backing-object pages.
> > > 
> > 
> > For a normal desktop I don't expect the VRAM amount to exceed the RAM
> > amount; people with 1G of VRAM are usually hardcore gamers with 4G of
> > RAM :). Also, most objects in the 3D world are stored in memory; if
> > programs are not stupid and trust GL to keep their textures, then you
> > just have the usual RAM copy and possibly a VRAM copy, so I don't see
> > any waste in the normal use case. Of course we can always come up with
> > crazy, weird setups, but I am more interested in dealing well with
> > average Joe than in dealing mostly well with every use case.
> 
> It's always been a big win to go to single-copy texturing.  Textures tend to 
> be large and nobody has so much memory that doubling up on textures has ever 
> been appealing...  And there are obvious use-cases like textured video where 
> only having a single copy is a big performance win.
> 
> It certainly makes things easier for the driver to duplicate textures -- 
> which is why all the old DRI drivers did it -- but it doesn't make it 
> right...  And the old DRI drivers also copped out on things like 
> render-to-texture, etc., so whatever gains you make in simplicity by treating 
> VRAM as a cache, some of those will be lost because you'll have to keep track 
> of which of the two copies of a texture is up-to-date, and you'll still 
> have to preserve (modified) texture contents on eviction, which old DRI never 
> had to do.
> 
> Ultimately it boils down to a choice between making your life easier as a 
> developer of the driver and producing a driver that makes the most of all 
> the system resources.
> 
> Nobody can force you to take one path or the other, but it's certainly my 
> intention when considering drivers for VRAM hardware to support 
> single-copy-number textures, and for that reason, I'd be unhappy to see a 
> system adopted that prevented that.
> 
> Keith
> 

I am also for saving memory, and I think you can do it in GEM. Here is the
call chain I foresee:
-create buffer
-driver-specific ioctl to set a buffer hint: ask for the object to be in VRAM
-pwrite the texture
From there, pwrite goes through the DRM core and into a driver-specific callback:
-the driver sees the VRAM hint and checks for VRAM space
-if there is space in VRAM, take it and write the object there; allocate a
 backing-store object, but its pages are not instantiated, so no RAM is
 actually used, only swap area
-if there is no space in VRAM, you lose: fall back to normal RAM

The drawback is that it's up to the driver to take care of saving the VRAM
copy, but I believe this is driver-specific enough to be fine. So in this
scheme you get one copy of the object, plus swap area (or I am just severely
misunderstanding a few kernel areas, which could happen).

Cheers,
Jerome Glisse <[EMAIL PROTECTED]>

--
_______________________________________________
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel
