On 11/18/2011 03:56 PM, Jerome Glisse wrote: > On Fri, Nov 18, 2011 at 03:30:03PM +0100, Thomas Hellstrom wrote: > >> On 11/18/2011 02:15 PM, Ben Skeggs wrote: >> >>> On Fri, 2011-11-18 at 08:57 +0100, Thomas Hellstrom wrote: >>> >>>> Jerome, >>>> >>>> I don't like this change for the following reasons >>>> >>> -snip- >>> >>> >>>>> One can take advantage of move notify callback but, there are >>>>> corner case where bind/unbind might be call without move notify >>>>> being call (in error path mostly). So to make sure that each >>>>> virtual address space have a consistent view of wether a buffer >>>>> object is backed or not by system page it's better to pass the >>>>> buffer object to bind/unbind. >>>>> >>> As I discussed previously with Jerome on IRC, I believe the move_notify >>> hook is sufficient. I fixed a couple of issues back with it back when I >>> implemented support for this in nouveau. >>> >>> Jerome did point out two issues however, which I believe can be solved >>> easily enough. >>> >>> The first was that the error path after move_notify() has been called >>> doesn't reverse the move_notify() too, leaving incorrect page table >>> entries. This, I think, could easily be solved by swapping bo->mem and >>> mem, and re-calling move_notify() to have the driver reverse whatever it >>> did. >>> >>> The second issue is that apparently move_notify() doesn't get called in >>> the destroy path. Nouveau's GEM layer takes care of this for our >>> userspace bos, and we don't use any kernel bos that are mapped into a >>> channel's address space so we don't hit it. This can probably be solved >>> easily enough too I expect. >>> >>> Ben. >>> >>> >> OK. I understand. Surely if a move_notify is missing somewhere, that >> can easily be fixed. >> >> It might be good if we can outline how the virtual tables are set >> up. In my world, the following would happen: >> >> 1) Pre command submission: >> >> a) reserve bo >> b) call ttm_bo_validate to set placement. move_notify will tear down >> any existing GPU page tables for the bo. >> c) Set up GPU page tables. >> d) Command submission >> e) unreserve_bo(). >> >> >> 2) When eviction happens: >> a) reserve bo >> b) move_notify is called to tear down page tables >> c) change placement >> d) Unreserve bo. >> >> Let's say an error occurs in 1d) Why would you need to undo 1c? >> >> Similarly if an error occurs in 2c) Why would you need to undo 2b)? >> >> > Error is in ttm_bo_handle_move_mem move_notify is call before we do the > move but the move might fail. > But even if the move fails in 1b) isn't it safe to just leave the virtual GPU map unbound, since GPU page tables will *always* be set up when placement is complete in 1c?
> For destroy issue is when destroying a gtt buffer, it will just be > unbind, no call to ttm_bo_handle_move_mem which is the only function > calling back move_notify. (see ttm_bo_release_list). > Yes, here a fix is needed. > I will fix this 2 corner case. I just wanted to make things symetrical. > By hooking up the virtual address space update with bind/unbind > > Note that at one point in the future the dream i have is no more > reserve, iommu page fault which is coming up (PASID process address > space id, with ATS address translation service and PRI page request > interface) would allow such things. Only synchronization will be > needed when moving object from system ram to vram or vice versa. > Indeed, but then TTM will probably be overkill, and something simpler can be implemented. > Cheers, > Jerome > Thanks, Thomas