Hi Thomas,

I went through the new ttm and I would like to get your feedback on changes I think are worthwhile. The current design forces the driver to use the ttm memory placement and movement logic. I would like to leave that up to the driver and factor out the parts which do the low-level work. Basically it ends up splitting:

- mmap buffer to userspace
- kmap buffer
- allocate pages
- setup PAT or MTRR
- invalidate mapping
- other things I might just forget about right now :)

from:

- buffer movement
- buffer validation
- buffer eviction
- ...
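To give an idea of the shape of the split (none of these names exist in the code, they are only illustrative), ttm would keep the low-level helpers and the driver would plug in its own placement/movement policy, something like:

    #include <linux/types.h>

    struct ttm_buffer_object;
    struct vm_area_struct;

    /* low-level work that would stay in ttm */
    struct ttm_core_ops {
            int  (*populate)(struct ttm_buffer_object *bo);          /* allocate pages */
            int  (*kmap)(struct ttm_buffer_object *bo, void **ptr);  /* kernel mapping */
            int  (*mmap)(struct ttm_buffer_object *bo, struct vm_area_struct *vma);
            int  (*set_caching)(struct ttm_buffer_object *bo, int caching); /* PAT/MTRR setup */
            void (*invalidate_mapping)(struct ttm_buffer_object *bo);
    };

    /* policy that would move out of ttm and into the driver */
    struct driver_placement_ops {
            int (*validate)(struct ttm_buffer_object *bo, u32 placement);
            int (*move)(struct ttm_buffer_object *bo, u32 new_placement);
            int (*evict)(struct ttm_buffer_object *bo);
    };

This is only meant to show where the boundary would sit, not an actual interface proposal.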
The motivation behind this is that the current design will likely often end up stalling the GPU each time we are doing buffer validation, which doesn't sound like a good plan. Here is what would happen on radeon, if I correctly understand the current ttm movement code (for instance the eviction path when we move a bo from system to vram, or vram to system): the bo movement will be scheduled in the command stream and the CPU will wait until the GPU is done moving; once this happens it likely means there is no more scheduled work in the GPU pipeline.

For radeon the idea I had was to use 2 lists:

- GART (could be AGP, PCIE, PCI) bind list
- GART (same) unbind list

The GPU will wait on the bind list and the CPU will wait on the unbind list. So when we do validation we figure out where all the BOs of the CS would be placed (this could depend on hw and userspace hints) and we set things up accordingly. Validation of a command stream buffer produces the following (some of which could be empty if there is enough room):

- GART unbind (to make room in the GART if necessary)
- GART bind (to bind the system memory where we will save BOs evicted from VRAM)
- GPU cmd to dma the evicted BOs into the GART
- GART unbind (to make room in the GART if necessary, could be merged with the first unbind)
- GART bind (BOs referenced in the CS which are moved into VRAM)
- GPU cmd to dma into VRAM
- GART unbind (if necessary, again could be merged with the previous unbind)
- GART bind (remaining BOs, if any, which stay in the GART for the CS)
- CS

That was the worst-case scenario where we use all of the GART area. It looks complex, but the idea is that most of the unbinding and binding could be scheduled before the GPU even has to wait for it. This scheme would also allow adding, in the future, some kind of GPU cs scheduler which could try to minimize BO movement by rescheduling CS.

Anyway the assumption here is that most of the time we will have this situation:

- bind BOs referenced by cs1
- GPU dma
- signal the CPU to unbind the DMAed BOs (GPU keeps working)
- cs1
- bind BOs referenced by cs2 (GPU is still working on cs1 but we have enough room)
- signal the CPU that cs1 is done and all its referenced BOs can be unbound
- GPU dma
- cs2
- ...

The GPU is always busy and the CPU is woken up through an irq to bind/unbind, to minimize the chance of the GPU having to wait on the CPU's binding/unbinding activity (a very rough sketch of what the lists and the irq path could look like is at the end of this mail). Also, on some newer GPUs and PCIE, if there is no iommu (or it can be ignored), the GPU can do the binding/unbinding on its own and so never waits on the CPU. I need to check what the iommu design requires.

So I would like to isolate the ttm functions which provide core utilities (mapping, swapping, allocating pages, setting caching, ...) from the logic which drives BO movement/validation. Do you think it's a good plan? Do you have any requirements or wishes on the way I should do that? Note that I don't intend to implement the scheme I just described in the first shot, but rather to stick with the bounded BO movement/placement/validation scheme as a first step. Then I can slowly try implementing this scheme, or any other better scheme one might come up with :)
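For what it's worth, here is the rough sketch of the two lists and the irq-driven unbind path I mentioned above. Everything here (the radeon_ names, fields, helper) is made up for illustration and only meant to show the shape of the idea, not an implementation:

    #include <linux/list.h>
    #include <linux/spinlock.h>
    #include <linux/types.h>

    struct ttm_buffer_object;

    /* one GART binding, tied to the fence of the CS that needs it */
    struct radeon_gart_request {
            struct list_head         list;
            struct ttm_buffer_object *bo;
            u64                      gart_offset;
            u32                      fence_seq;   /* CS fence this binding belongs to */
    };

    /* GPU waits on the bind list, CPU waits on the unbind list */
    struct radeon_gart_queue {
            spinlock_t       lock;
            struct list_head bind;    /* bindings the CPU sets up ahead of the GPU */
            struct list_head unbind;  /* bindings to tear down once the GPU is done */
    };

    /*
     * Called from the fence irq: the GPU signalled that everything up to
     * fence_seq is done, so the CPU can unbind those entries and make room
     * in the GART for the next CS without the GPU having to wait.
     */
    static void radeon_gart_fence_irq(struct radeon_gart_queue *q, u32 fence_seq)
    {
            struct radeon_gart_request *req, *tmp;

            spin_lock(&q->lock);
            list_for_each_entry_safe(req, tmp, &q->unbind, list) {
                    if (req->fence_seq > fence_seq)
                            continue;   /* the GPU may still be using this binding */
                    list_del(&req->list);
                    /* ... actually clear the GART entries for req->bo here ... */
            }
            spin_unlock(&q->lock);
    }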
Cheers,
Jerome Glisse