Jerome Glisse wrote:
> Hi Thomas,
>
> I went through the new TTM code and I would like to get your feedback
> on changes I think are worthwhile. The current design forces drivers
> to use TTM's memory placement and movement logic. I would like to
> leave that up to the driver and factor out the parts which do the
> low-level work. Basically it ends up splitting:
>
>  - mmap buffer to userspace
>  - kmap buffer
>  - allocate pages
>  - set up PAT or MTRR
>  - invalidate mappings
>  - other things I might be forgetting right now :)
>
> from:
>
>  - buffer movement
>  - buffer validation
>  - buffer eviction
>  - ...
>
> The motivation behind this is that the current design will likely
> often end up stalling the GPU each time we do buffer validation.
> That doesn't sound like a good plan.
>
> Here is what would happen on radeon, if I correctly understand the
> current TTM movement code (e.g. the eviction path when we move a bo
> from system to VRAM, or VRAM to system): the bo move is scheduled in
> the command stream and the CPU waits until the GPU is done moving;
> once this happens it likely means there is no more scheduled work in
> the GPU pipeline.
>
> For radeon the idea I had was to use 2 lists:
>  - a GART (could be AGP, PCIE, PCI) bind list
>  - a GART (same) unbind list
> The GPU waits on the bind list and the CPU waits on the unbind list.
> So when we do validation we work out where all the BOs of the CS
> should be placed (this could depend on hw and userspace hints) and we
> set things up accordingly. Validation of a command stream buffer
> produces the following (some of it could be empty if there is enough
> room):
>  - GART unbind (to make room in the GART if necessary)
>  - GART bind (to bind the system memory where we will save BOs
>    evicted from VRAM)
>  - GPU cmd to DMA the evicted BOs into the GART
>  - GART unbind (to make room in the GART if necessary; could be
>    merged with the first unbind)
>  - GART bind (BOs referenced in the CS which are moved into VRAM)
>  - GPU cmd to DMA into VRAM
>  - GART unbind (if necessary; again, could be merged with a previous
>    unbind)
>  - GART bind (the remaining BOs, if any, which stay in the GART for
>    the CS)
>  - CS
>
> That was the worst-case scenario, where we use the whole GART area.
> It looks complex, but the idea is that most of the unbinding and
> binding can be scheduled before the GPU even has to wait for it. This
> scheme also allows adding, in the future, some kind of GPU CS
> scheduler which could try to minimize BO movement by rescheduling
> CSes. Anyway, the assumption here is that most of the time we will
> have this situation:
>  - bind the BOs referenced by cs1
>  - GPU DMA
>  - signal the CPU to unbind the DMAed BOs (GPU keeps working)
>  - cs1
>  - bind the BOs referenced by cs2 (the GPU is still working on cs1,
>    but we have enough room)
>  - signal to the CPU that cs1 is done and all the BOs it referenced
>    can be unbound
>  - GPU DMA
>  - cs2
>  - ...
>
> The GPU is always busy, and the CPU is woken up through an irq to
> bind/unbind, to minimize the chance of the GPU having to wait on the
> CPU's binding/unbinding activity.
>
> Also, on some newer GPUs and PCIE, if there is no iommu (or it can be
> ignored), the GPU can do the binding/unbinding on its own and so
> never has to wait on the CPU. I need to check what the iommu design
> requires.
>
> So I would like to isolate the TTM functions which provide core
> utilities (mapping, swapping, allocating pages, setting caching, ...)
> from the logic which drives BO movement/validation. Do you think
> that's a good plan? Do you have any requirements or wishes on the way
> I could do that?
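To make the proposed split concrete, the driver-facing core could look
something like the sketch below. All names here are hypothetical (this
is not the existing TTM interface); it only shows which operations
would stay in the core, with placement, validation and eviction left
entirely to the driver, which calls these helpers as needed.

/*
 * Hypothetical low-level core: pages, mappings and caching only.
 * No placement or movement policy lives here.
 */
struct ttm_backend_core {
	struct page **pages;	/* backing pages, NULL until allocated */
	unsigned long num_pages;
};

int  ttm_core_alloc_pages(struct ttm_backend_core *core);
void ttm_core_free_pages(struct ttm_backend_core *core);
int  ttm_core_kmap(struct ttm_backend_core *core, void **virt);
int  ttm_core_mmap(struct ttm_backend_core *core,
		   struct vm_area_struct *vma);
int  ttm_core_set_caching(struct ttm_backend_core *core, bool cached);
void ttm_core_invalidate_mappings(struct ttm_backend_core *core);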
> Note that I don't intend to implement the scheme I just described in
> the first shot, but rather to stick with the bounded BO movement/
> placement/validation scheme as a first step. Then I can slowly try
> implementing this scheme, or any better scheme one might come up
> with :)
>
> Cheers,
> Jerome Glisse
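As an illustration of the two-list GART scheme described in the quoted
mail, the fence-driven unbind side could look roughly like this. All
names are hypothetical and the locking is simplified; the point is only
that the CPU unbinds a CS's BOs after that CS's fence signals, while
the GPU keeps consuming later work.

/* Hypothetical bookkeeping for the bind/unbind lists. */
struct radeon_gart_op {
	struct list_head  list;
	struct radeon_bo *bo;		/* BO bound/unbound by this op */
	uint64_t          gart_offset;
};

struct radeon_gart_queue {
	struct list_head bind_list;	/* CPU binds these before the GPU needs them */
	struct list_head unbind_list;	/* CPU unbinds these once the fence signals */
	spinlock_t       lock;
};

/*
 * Called when the fence of a CS signals (e.g. from a workqueue kicked
 * by the fence irq): the BOs that CS referenced can now be unbound
 * while the GPU keeps working on the next CS.
 */
static void radeon_gart_process_unbinds(struct radeon_gart_queue *q)
{
	struct radeon_gart_op *op, *tmp;

	spin_lock(&q->lock);
	list_for_each_entry_safe(op, tmp, &q->unbind_list, list) {
		radeon_gart_unbind(op->bo, op->gart_offset);	/* hypothetical */
		list_del(&op->list);
		kfree(op);
	}
	spin_unlock(&q->lock);
}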
Hi, Jerome!

So, if I understand you correctly, you want to make a smaller bo class on which the low-level functions operate (basically just something similar to a struct ttm_tt), so that one can derive from it and implement a more elaborate migration scheme? I think that is a good idea, and in that case one could perhaps keep the current implementation as a special case. What I'm a little worried about, though, is that a lot of new ideas will keep coming and postpone an upstream move again, but at this point I don't see such a split doing that, since it would only touch kernel internals.

But before you start looking at this, remember that the current implementation uses unnecessary synchronization when moving. There is, for example, no need to synchronize if the GPU is used to DMA from system to VRAM. That's basically because I've been lazy; it's a bit hard to debug when buffer data is in flight, and even worse if a pipelined buffer move fails :(. Anyway, it's entirely possible to remove the synchronization from the current buffer move logic and let the driver handle it. The only synchronization points would be bo_move_memcpy and bo_move_tt. The driver could then, in its own move functions, do whatever is necessary to sync.

/Thomas
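To illustrate the kind of driver-side move Thomas describes, a
pipelined move might look roughly like the sketch below: schedule the
copy, attach its fence to the BO, and return without waiting, falling
back to the synchronous memcpy path on error. Everything here is
hypothetical; the names and signatures are made up for illustration.

/*
 * Pipelined VRAM<->system move: schedule the blit, attach its fence
 * to the BO, and return without blocking.  Later CPU or GPU access
 * syncs on the fence instead of on the move itself.
 */
static int radeon_bo_move_pipelined(struct radeon_bo *bo,
				    struct radeon_mem_reg *new_mem)
{
	struct radeon_fence *fence;
	int r;

	r = radeon_copy(bo->rdev, bo->mem.start, new_mem->start,
			bo->num_pages, &fence);		/* hypothetical */
	if (r)
		return r;	/* caller falls back to a synchronous memcpy move */

	/* No CPU wait here: the fence orders everything that touches
	 * the BO after this point against the in-flight copy. */
	radeon_bo_attach_fence(bo, fence);		/* hypothetical */
	bo->mem = *new_mem;
	return 0;
}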