Hi Thomas,

I went through the new ttm and I would like to get your feedback on changes I think are worthwhile. The current design forces the driver to use the ttm memory placement and movement logic. I would like to leave that up to the driver and factor out the parts which do the low-level work. Basically it ends up splitting:

- mmap buffer to userspace
- kmap buffer
- allocate pages
- setup PAT or MTRR
- invalidate mapping
- other things I might just forget about right now :)

from:

- buffer movement
- buffer validation
- buffer eviction
- ...
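To give an idea of the shape of the split (none of these names exist in the code, they are only illustrative), ttm would keep the low-level helpers and the driver would plug in its own placement/movement policy, something like:

    #include <linux/types.h>

    struct ttm_buffer_object;
    struct vm_area_struct;

    /* low-level work that would stay in ttm */
    struct ttm_core_ops {
            int  (*populate)(struct ttm_buffer_object *bo);          /* allocate pages */
            int  (*kmap)(struct ttm_buffer_object *bo, void **ptr);  /* kernel mapping */
            int  (*mmap)(struct ttm_buffer_object *bo, struct vm_area_struct *vma);
            int  (*set_caching)(struct ttm_buffer_object *bo, int caching); /* PAT/MTRR setup */
            void (*invalidate_mapping)(struct ttm_buffer_object *bo);
    };

    /* policy that would move out of ttm and into the driver */
    struct driver_placement_ops {
            int (*validate)(struct ttm_buffer_object *bo, u32 placement);
            int (*move)(struct ttm_buffer_object *bo, u32 new_placement);
            int (*evict)(struct ttm_buffer_object *bo);
    };

This is only meant to show where the boundary would sit, not an actual interface proposal.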
The motivation behind this is that the current design will likely often end up stalling the GPU each time we are doing buffer validation, which doesn't sound like a good plan. Here is what would happen on radeon, if I correctly understand the current ttm movement code (for instance the eviction path when we move a bo from system to vram, or vram to system): the bo movement will be scheduled in the command stream and the CPU will wait until the GPU is done moving; once this happens it likely means there is no more scheduled work in the GPU pipeline.

For radeon the idea I had was to use 2 lists:

- GART (could be AGP, PCIE, PCI) bind list
- GART (same) unbind list

The GPU will wait on the bind list and the CPU will wait on the unbind list. So when we do validation we figure out where all the BOs of the CS would be placed (this could depend on hw and userspace hints) and we set things up accordingly. Validation of a command stream buffer produces the following (some of which could be empty if there is enough room):

- GART unbind (to make room in the GART if necessary)
- GART bind (to bind the system memory where we will save BOs evicted from VRAM)
- GPU cmd to dma the evicted BOs into the GART
- GART unbind (to make room in the GART if necessary, could be merged with the first unbind)
- GART bind (BOs referenced in the CS which are moved into VRAM)
- GPU cmd to dma into VRAM
- GART unbind (if necessary, again could be merged with the previous unbind)
- GART bind (remaining BOs, if any, which stay in the GART for the CS)
- CS

That was the worst-case scenario where we use all of the GART area. It looks complex, but the idea is that most of the unbinding and binding could be scheduled before the GPU even has to wait for it. This scheme would also allow adding, in the future, some kind of GPU cs scheduler which could try to minimize BO movement by rescheduling CS.

Anyway the assumption here is that most of the time we will have this situation:

- bind BOs referenced by cs1
- GPU dma
- signal the CPU to unbind the DMAed BOs (GPU keeps working)
- cs1
- bind BOs referenced by cs2 (GPU is still working on cs1 but we have enough room)
- signal the CPU that cs1 is done and all its referenced BOs can be unbound
- GPU dma
- cs2
- ...

The GPU is always busy and the CPU is woken up through an irq to bind/unbind, to minimize the chance of the GPU having to wait on the CPU's binding/unbinding activity (a very rough sketch of what the lists and the irq path could look like is at the end of this mail). Also, on some newer GPUs and PCIE, if there is no iommu (or it can be ignored), the GPU can do the binding/unbinding on its own and so never waits on the CPU. I need to check what the iommu design requires.

So I would like to isolate the ttm functions which provide core utilities (mapping, swapping, allocating pages, setting caching, ...) from the logic which drives BO movement/validation. Do you think it's a good plan? Do you have any requirements or wishes on the way I should do that? Note that I don't intend to implement the scheme I just described in the first shot, but rather to stick with the bounded BO movement/placement/validation scheme as a first step. Then I can slowly try implementing this scheme, or any other better scheme one might come up with :)
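For what it's worth, here is the rough sketch of the two lists and the irq-driven unbind path I mentioned above. Everything here (the radeon_ names, fields, helper) is made up for illustration and only meant to show the shape of the idea, not an implementation:

    #include <linux/list.h>
    #include <linux/spinlock.h>
    #include <linux/types.h>

    struct ttm_buffer_object;

    /* one GART binding, tied to the fence of the CS that needs it */
    struct radeon_gart_request {
            struct list_head         list;
            struct ttm_buffer_object *bo;
            u64                      gart_offset;
            u32                      fence_seq;   /* CS fence this binding belongs to */
    };

    /* GPU waits on the bind list, CPU waits on the unbind list */
    struct radeon_gart_queue {
            spinlock_t       lock;
            struct list_head bind;    /* bindings the CPU sets up ahead of the GPU */
            struct list_head unbind;  /* bindings to tear down once the GPU is done */
    };

    /*
     * Called from the fence irq: the GPU signalled that everything up to
     * fence_seq is done, so the CPU can unbind those entries and make room
     * in the GART for the next CS without the GPU having to wait.
     */
    static void radeon_gart_fence_irq(struct radeon_gart_queue *q, u32 fence_seq)
    {
            struct radeon_gart_request *req, *tmp;

            spin_lock(&q->lock);
            list_for_each_entry_safe(req, tmp, &q->unbind, list) {
                    if (req->fence_seq > fence_seq)
                            continue;   /* the GPU may still be using this binding */
                    list_del(&req->list);
                    /* ... actually clear the GART entries for req->bo here ... */
            }
            spin_unlock(&q->lock);
    }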
Cheers,
Jerome Glisse