On Thu, 2009-01-22 at 20:54 +0100, Thomas Hellström wrote:
> Jerome Glisse wrote:
> > Hi Thomas,
> >
> > I went through the new TTM and I would like to get your feedback on
> > changes I think are worthwhile. The current design forces the driver
> > to use TTM's memory placement and movement logic. I would like to
> > leave that up to the driver and factor out the part which does the
> > low-level work. Basically it ends up splitting (see the sketch after
> > these lists):
> >
> > -mmap buffer to userspace
> > -kmap buffer
> > -allocate page
> > -setup pat or mtrr
> > -invalidate mapping
> > -Other things I might just be forgetting right now :)
> >
> > From:
> > -buffer movement
> > -buffer validation
> > -buffer eviction
> > -...
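> >
> > A rough sketch of the shape I have in mind; all of these names are
> > made up for illustration, not existing TTM symbols:
> >
> > /* Low-level utilities TTM would keep providing (hypothetical). */
> > struct ttm_base_bo;  /* the smaller bo class the utilities act on */
> >
> > struct ttm_lowlevel_ops {
> >     int  (*populate)(struct ttm_base_bo *bo);  /* allocate pages */
> >     int  (*kmap)(struct ttm_base_bo *bo, void **virtual);
> >     int  (*mmap)(struct ttm_base_bo *bo, struct vm_area_struct *vma);
> >     void (*set_caching)(struct ttm_base_bo *bo, int caching);
> > };
> >
> > /* Placement/movement logic the driver would own instead. */
> > struct ttm_placement_ops {
> >     int (*validate)(struct ttm_base_bo *bo, u32 placement);
> >     int (*move)(struct ttm_base_bo *bo, u32 new_placement);
> >     int (*evict)(struct ttm_base_bo *bo);
> > };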
> >
> > The motivation behind this is that the current design will likely
> > often end up stalling the GPU each time we do buffer validation. This
> > doesn't sound like a good plan.
> >
> > Here is what would happen on radeon, if I correctly understand the
> > current TTM movement code (e.g. the eviction path when we move a BO
> > from system to vram, or vram to system). The BO move is scheduled in
> > the command stream and the CPU waits until the GPU is done moving;
> > once that happens there is likely no more scheduled work left in the
> > GPU pipeline.
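> >
> > A minimal sketch of that stall; radeon_move_blit and
> > radeon_fence_wait are hypothetical helpers made up here, not the
> > real entry points:
> >
> > /* Hypothetical sketch of today's synchronous move path. */
> > int radeon_bo_move_sync(struct radeon_bo *rbo, u32 new_placement)
> > {
> >     struct radeon_fence *fence;
> >
> >     /* Schedule the blit in the command stream... */
> >     fence = radeon_move_blit(rbo, new_placement);
> >     if (IS_ERR(fence))
> >         return PTR_ERR(fence);
> >
> >     /*
> >      * ...then block the CPU until the GPU finishes.  By the time
> >      * this wait returns, the GPU pipeline has likely run dry.
> >      */
> >     return radeon_fence_wait(fence);
> > }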
> >
> > For radeon, the idea I had was to use 2 lists:
> > -GART (could be AGP, PCIE, PCI) bind list
> > -GART (same) unbind list
> > The GPU waits on the bind list and the CPU waits on the unbind list.
> > So when we do validation we find where all the BOs of the CS should
> > be placed (this could depend on hw and userspace hints) and we set
> > things up accordingly. Validating a command stream buffer produces
> > the following (some of these steps could be empty if there is enough
> > room):
> > -GART unbind (to make room in GART if necessary)
> > -GART bind (to bind the system memory where we will save BOs evicted
> >  from vram)
> > -GPU cmd to DMA the evicted BOs into GART
> > -GART unbind (to make room in GART if necessary; could be merged
> >  with the first unbind)
> > -GART bind (bind BOs referenced in the CS which are moved into VRAM)
> > -GPU cmd to DMA into VRAM
> > -GART unbind (if necessary; again, could be merged with the previous
> >  unbind)
> > -GART bind (remaining BOs, if any, which stay in GART for the cs)
> > -cs
> >
> > That is the worst-case scenario, where we use the whole GART area.
> > It looks complex, but the idea is that most of the unbinding and
> > binding can be scheduled before the GPU ever has to wait for it.
> > This scheme would also allow adding, in the future, some kind of GPU
> > cs scheduler which tries to minimize BO movement by rescheduling
> > CSes. Anyway, the assumption here is that most of the time we will
> > have this situation:
> > -bind BOs referenced by cs1
> > -GPU dma
> > -signal the CPU to unbind the DMAed BOs (GPU keeps working)
> > -cs1
> > -bind BOs referenced by cs2 (GPU is still working on cs1
> >  but we have enough room)
> > -signal the CPU that cs1 is done and all BOs it referenced
> >  can be unbound
> > -GPU dma
> > -cs2
> > ...
> >
> > The GPU is always busy, and the CPU is woken up through an irq to
> > bind/unbind, to minimize the chance of the GPU ever waiting on the
> > CPU's binding/unbinding activity.
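> >
> > A minimal sketch of the two-list bookkeeping; struct
> > radeon_gart_request and radeon_gart_unbind are assumptions made up
> > for illustration:
> >
> > #include <linux/list.h>
> > #include <linux/spinlock.h>
> >
> > /* One pending GART operation (hypothetical). */
> > struct radeon_gart_request {
> >     struct list_head list;
> >     /* GART page range, backing pages, fence/sequence, ... */
> > };
> >
> > struct radeon_gart_mgr {
> >     spinlock_t lock;
> >     struct list_head bind_list;   /* the GPU waits on these */
> >     struct list_head unbind_list; /* the CPU drains these */
> > };
> >
> > /* Validation queues binds; the CS then waits on them GPU-side. */
> > static void radeon_gart_queue_bind(struct radeon_gart_mgr *mgr,
> >                                    struct radeon_gart_request *req)
> > {
> >     spin_lock(&mgr->lock);
> >     list_add_tail(&req->list, &mgr->bind_list);
> >     spin_unlock(&mgr->lock);
> > }
> >
> > /* Irq bottom half: the GPU signaled that these BOs can go. */
> > static void radeon_gart_drain_unbinds(struct radeon_gart_mgr *mgr)
> > {
> >     struct radeon_gart_request *req, *tmp;
> >
> >     spin_lock(&mgr->lock);
> >     list_for_each_entry_safe(req, tmp, &mgr->unbind_list, list) {
> >         list_del(&req->list);
> >         radeon_gart_unbind(req); /* hypothetical */
> >     }
> >     spin_unlock(&mgr->lock);
> > }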
> >
> > Also, on some newer GPUs with PCIE, if there is no iommu (or it can
> > be ignored), the GPU can do the binding/unbinding on its own and so
> > never waits on the CPU. I need to check what the iommu design
> > requires.
> >
> >
> > So I would like to isolate the TTM functions which provide core
> > utilities (mapping, swapping, allocating pages, setting cache state,
> > ...) from the logic which drives BO movement/validation. Do you
> > think it's a good plan? Do you have any requirements or wishes on
> > the way I could do that?
> >
> > Note that I don't intend to implement the scheme I just described in
> > the first shot, but rather to stick with the built-in BO movement/
> > placement/validation scheme as a first step. Then I can slowly try
> > implementing this scheme, or any better scheme one might come
> > up with :)
> >
> > Cheers,
> > Jerome Glisse
> >
> >   
> Hi, Jerome!
> 
> So, if I understand you correctly, you want to make a smaller bo class
> on which the low-level functions operate (basically just something
> similar to a struct ttm_tt), so that one can derive from it and
> implement a more elaborate migration scheme?
> 
> I think that is a good idea, and in that case one could perhaps keep the 
> current implementation as a special case.
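> 
> The usual embedding idiom would probably do; a rough sketch, where the
> members of the base class are assumptions rather than the actual
> ttm_tt layout:
> 
> /* The smaller base class the low-level functions operate on. */
> struct ttm_base_bo {
>     struct page **pages;      /* backing pages (assumed member) */
>     unsigned long num_pages;
> };
> 
> /* A driver derives by embedding it and adding migration state. */
> struct radeon_bo {
>     struct ttm_base_bo base;
>     struct list_head gart_list;  /* driver migration bookkeeping */
>     u32 placement;
> };
> 
> static inline struct radeon_bo *to_radeon_bo(struct ttm_base_bo *bo)
> {
>     return container_of(bo, struct radeon_bo, base);
> }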
> 
> What I'm a little worried about, though, is that a lot of new ideas
> will come along and postpone an upstream move yet again; but at this
> point I don't see such a split doing that, since it would only touch
> kernel internals.
> 
> But before you start looking at this, remember that the current
> implementation uses unnecessary synchronization when moving. There is,
> for example, no need to synchronize if the GPU is used to DMA from
> system to VRAM. That's basically because I've been lazy, and it's a
> bit hard to debug when buffer data is in flight, and even worse if a
> pipelined buffer move fails :(.
> 
> Anyway, it's entirely possible to remove the synchronization from the
> current buffer move logic and let the driver handle it. The only
> synchronization points would be bo_move_memcpy and bo_move_tt. The
> driver could then, in its own move functions, do what's necessary to
> sync.
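> 
> In that world a driver move hook could look roughly like the sketch
> below, reusing the made-up names from above; radeon_blit and
> radeon_bo_attach_fence are assumptions, not existing functions:
> 
> /* Hypothetical driver move hook that owns its own synchronization. */
> static int radeon_bo_move(struct ttm_base_bo *bo, u32 new_placement)
> {
>     struct radeon_bo *rbo = to_radeon_bo(bo);
>     struct radeon_fence *fence;
> 
>     /* Pipeline the copy; no CPU-side wait here. */
>     fence = radeon_blit(rbo, new_placement);
>     if (IS_ERR(fence))
>         return PTR_ERR(fence);
> 
>     /*
>      * Remember the fence so that later CPU access (kmap, a fault in
>      * a user mapping) waits only if it actually needs the data.
>      */
>     radeon_bo_attach_fence(rbo, fence); /* hypothetical */
>     return 0;
> }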
> 
> /Thomas

I don't want to postpone upstream acceptance of this code either; that
is why I asked first. I will try to come up with a patch, and the
ttm_tt struct is exactly what I had in mind. I haven't looked deeply,
but I think I mostly need to isolate a few things so that most of the
functions dealing with ttm_tt don't require too much from the other
parts of ttm.

Also, I think there are a few things that kernel people might pester us
about (like using int where the data is a boolean); I will also try to
go through the code with a kernel reviewer's mindset ;)
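
A trivial made-up example of the kind of cleanup I mean:

    /* before: an int carrying a yes/no answer */
    int  bound;
    /* after: the type says what it is */
    bool bound;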

I will push to my own repo once I've got something which at least
compiles, so you can review & comment.

Cheers,
Jerome

