On 09/29/2010 05:45 AM, Dave Airlie wrote:
> From: Dave Airlie <airlied at redhat.com>
>
> A race condition was identified that led to a leak of the BOs: when
> a bo was on the delayed destroy list and was selected for eviction,
> the code would enter
>
> thread 1 (evict)                  -- thread 2 (dd)
>                                      bo added to delayed destroy list
> take lru lock
> ttm_mem_evict_first called
> - reserve locked
> - remove from lru
> - drop lru lock
>                                      take lru lock
>                                      try del from lru - get put_count == 0
>                                      try to reserve locked
>                                      - drop lru lock to wait unreserved
> call bo_evict
> unreserve buffer
> - take lru lock
> - add back to lru
> - unreserve
> - drop lru lock
>                                      take lru lock due to unreserved
>                                      unbind ttm
>                                      drop put_count references (which is 0)
>                                      despite being on the lru/swap lru lists
>                                      leak due to never losing reference
>
> The obvious quick fix is to remove the bo from the ddestroy list if we
> are going to evict it; however, if we are going to evict a buffer that
> is going to be destroyed, we should just destroy it instead, since that
> is likely to be a lighter-weight operation than evict + delayed destroy.
>
> This patch checks whether a bo that we are about to evict is actually on
> the destroy list and, if it is, unreserves it (without adding it back to
> the lru) and cleans up its references. It should then get destroyed via
> normal refcounting means.
>
Dave, this will not work, since we can't guarantee that the buffer will be
destroyed immediately. There may be other holders of a list_kref, and if the
destruction doesn't happen immediately, we don't get immediate eviction. If
we take a command submission lock and decide to evict all buffers to clean up
fragmentation, we need to be able to guarantee that all buffers are evicted
synchronously.

Having said that, I think it should be possible to implement this if we, in
addition to the above code, free the mm_node immediately and don't rely on
the buffer being destroyed to do that. But that's an optimization rather than
a bug fix.

The real bug sits in ttm_bo_cleanup_refs(), in the fact that we remove the
buffers from the lru lists before we reserve. Let me try to come up with a
fix for that. I think there needs to be a second wait for bo idle as well,
since in theory, someone might have started an eviction using an accelerated
path.

/Thomas

> proposed fix for:
> https://bugzilla.redhat.com/show_bug.cgi?id=615505
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=591061
>
> Signed-off-by: Dave Airlie <airlied at redhat.com>
> ---
>  drivers/gpu/drm/ttm/ttm_bo.c |   18 +++++++++++++++---
>  1 files changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index cb4cf7e..60689b1 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -689,7 +689,7 @@ static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
>  	struct ttm_mem_type_manager *man = &bdev->man[mem_type];
>  	struct ttm_buffer_object *bo;
>  	int ret, put_count = 0;
> -
> +	bool destroy = false;
>  retry:
>  	spin_lock(&glob->lru_lock);
>  	if (list_empty(&man->lru)) {
> @@ -719,6 +719,13 @@ retry:
>  	}
>
>  	put_count = ttm_bo_del_from_lru(bo);
> +
> +	/* is the buffer currently on the delayed destroy list? */
> +	if (!list_empty(&bo->ddestroy)) {
> +		list_del_init(&bo->ddestroy);
> +		destroy = true;
> +		put_count++;
> +	}
>  	spin_unlock(&glob->lru_lock);
>
>  	BUG_ON(ret != 0);
> @@ -726,8 +733,13 @@ retry:
>  	while (put_count--)
>  		kref_put(&bo->list_kref, ttm_bo_ref_bug);
>
> -	ret = ttm_bo_evict(bo, interruptible, no_wait_reserve, no_wait_gpu);
> -	ttm_bo_unreserve(bo);
> +	if (destroy) {
> +		atomic_set(&bo->reserved, 0);
> +		ret = ttm_bo_cleanup_refs(bo, !no_wait_gpu);
> +	} else {
> +		ret = ttm_bo_evict(bo, interruptible, no_wait_reserve,
> +				   no_wait_gpu);
> +		ttm_bo_unreserve(bo);
> +	}
>
>  	kref_put(&bo->list_kref, ttm_bo_release_list);
>  	return ret;
>