Op 11-10-12 21:26, Thomas Hellstrom schreef:
> On 10/11/2012 08:42 PM, Maarten Lankhorst wrote:
>
>>
>>> Anyway, if you plan to remove the fence lock and protect it with reserve, 
>>> you must
>>> make sure that a waiting reserve is never done in a destruction path. I 
>>> think this
>>> mostly concerns the nvidia driver.
>> Well I don't think any lock should ever be held during destruction time,
>
> What I mean is, that *something* needs to protect the fence pointer. 
> Currently it's the
> fence lock, and I was assuming you'd protect it with reserve. And neither TTM 
> nor
> Nvidia should, when a resource is about to be freed, be forced to *block* 
> waiting for
> reserve just to access the fence pointer. When and if you have a solution 
> that fulfills
> those requirements, I'm ready to review it.
It's not blocking, cleanup_refs_or_queue will toss it on the deferred list if 
reservation fails,
behavior doesn't change just because I changed the order around.
>>
>>>> - no_wait_reserve is ignored if no_wait_gpu is false
>>>>     ttm_bo_reserve_locked can only return true if no_wait_reserve is true, 
>>>> but
>>>>     subsequently it will do a wait_unreserved if no_wait_gpu is false.
>>>> I'm planning on removing this argument and act like it is always true, 
>>>> since
>>>> nothing on the lru list should fail to reserve currently.
>>> Yes, since all buffers that are reserved are removed from the LRU list, 
>>> there
>>> should never be a waiting reserve on them, so no_wait_reserve can be removed
>>> from ttm_mem_evict_first, ttm_bo_evict and possibly other functions in the 
>>> call chain.
>> I suppose there will stay a small race though,
>
> Hmm, where?
When you enter the ddestroy path, you drop the lock and hope the buffer doesn't 
reserved
away from under you.

>>>> - effectively unlimited callchain between some functions that all go 
>>>> through
>>>>     ttm_mem_evict_first:
>>>>
>>>>                                       /------------------------\
>>>> ttm_mem_evict_first - ttm_bo_evict -                          
>>>> -ttm_bo_mem_space  - ttm_bo_mem_force_space - ttm_mem_evict_first
>>>>                                       \ ttm_bo_handle_move_mem /
>>>> I'm not surprised that there was a deadlock before, it seems to me it would
>>>> be pretty suicidal to ever do a blocking reserve on any of those lists,
>>>> lockdep would be all over you for this.
>>> Well, at first this may look worse than it actually is. The driver's 
>>> eviction memory order determines the recursion depth
>>> and typically it's 0 or 1, since subsequent ttm_mem_evict_first should 
>>> never touch the same LRU lists as the first one.
>>> What would typically happen is that a BO is evicted from VRAM to TT, and if 
>>> there is no space in TT, another BO is evicted
>>> to system memory, and the chain is terminated. However a driver could set 
>>> up any eviction order but that would be
>>> a BUG.
>>>
>>> But in essence, as you say, even with a small recursion depth, a waiting 
>>> reserve could cause a deadlock.
>>> But there should be no waiting reserves in the eviction path currently.
>> Partially true, ttm_bo_cleanup_refs is currently capable of blocking reserve.
>> Fixing this might mean that ttm_mem_evict_first may need to become more 
>> aggressive,
>> since it seems all the callers of this function assume that 
>> ttm_mem_evict_first can only fail
>> if there is really nothing more to free and blocking nested would really 
>> upset lockdep
>> and leave you open to the same deadlocks.
> I can't see how the waiting reserve in ttm_bo_cleanup_refs would cause a 
> deadlock,
> because the buffer about to be reserved is always *last* in a reservation 
> sequence, and the
> reservation is always released (or the buffer destroyed) before trying to 
> reserve another buffer.
> Technically the buffer is not looked up from a LRU list but from the delayed 
> delete list.
> Could you describe such a deadlock case?
The only interesting case for this is ttm_mem_evict_first, and while it may not 
technically
be a deadlock, lockdep will flag you for blocking on this anyway, since the 
only reason it
would not be a deadlock is if you know the exact semantics of why.
>> An unexpected bonus from adding this skipping would be that the atomic 
>> lru_list
>> removal on unreserve guarantee becomes a lot easier to drop while keeping 
>> behavior
>> exactly the same. Only the swapout code would need to be adjusted for to try 
>> the whole
>> list. Maybe ttm_bo_delayed_delete with remove_all = false too would change 
>> behavior
>> slightly too, but this seems to be less likely, as it could only ever happen 
>> on delayed
>> destruction of imported dma-buf's.
>
>
> May I kindly remind you that the atomic lru_list removal stays in TTM
> until we've had a high level design discussion weighing it against an extended
> reservation object API.
I'm aware, but I still wanted to see if it was possible or not in a clean way 
for testing,
so I don't ask for things that would be impossible to do or too involved to do 
in a safe
manner.

Will you make it to UDS-r?

~Maarten
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Reply via email to