Am 23.02.20 um 13:21 schrieb Pan, Xinhui:

2020年2月23日 20:04,Koenig, Christian <[email protected]> 写道:

Am 23.02.20 um 12:56 schrieb Pan, Xinhui:
If shared fence list is not empty, even we want to test all fences, excl fence 
is ignored.
That is abviously wrong, so fix it.
Yeah that is a known issue and I completely agree with you, but other disagree.

See the shared fences are meant to depend on the exclusive fence. So all shared 
fences must finish only after the exclusive one has finished as well.
fair enough.

The problem now is that for error handling this isn't necessary true. In other 
words when a shared fence completes with an error it is perfectly possible that 
he does this before the exclusive fence is finished.

I'm trying to convince Daniel that this is a problem for years :)

I have met problems, eviction has race with bo relase.  system memory is 
overwried by sDMA. the kernel is 4.19, stable one, LOL.

Ok sounds like we add some shared fence which doesn't depend on the exclusive one to finish. That is of course highly problematic for the current handling.

It might be that this happens with the TTM move fence, but I don't of hand know either how to prevent that.

Question at Daniel and others: Can we finally drop this assumption that all shared fences must finish after the exclusive one?

Thanks for pointing this out Xinhui,
Christian.


amdgpu add excl fence to bo to move system memory which is done by the drm 
scheduler.
after sDMA finish the moving job,  the memory might have already been released 
as dma_resv_test_signaled_rcu did not check excl fence.

Our local customer report this issue. I took 4 days into it. sigh

thanks
xinhui

Regards,
Christian.

Signed-off-by: xinhui pan <[email protected]>
---
  drivers/dma-buf/dma-resv.c | 9 +++++----
  1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 4264e64788c4..44dc64c547c6 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -632,14 +632,14 @@ static inline int dma_resv_test_signaled_single(struct 
dma_fence *passed_fence)
   */
  bool dma_resv_test_signaled_rcu(struct dma_resv *obj, bool test_all)
  {
-       unsigned seq, shared_count;
+       unsigned int seq, shared_count, left;
        int ret;
        rcu_read_lock();
  retry:
        ret = true;
        shared_count = 0;
-       seq = read_seqcount_begin(&obj->seq);
+       left = seq = read_seqcount_begin(&obj->seq);
        if (test_all) {
                unsigned i;
@@ -647,7 +647,7 @@ bool dma_resv_test_signaled_rcu(struct dma_resv *obj, bool 
test_all)
                struct dma_resv_list *fobj = rcu_dereference(obj->fence);
                if (fobj)
-                       shared_count = fobj->shared_count;
+                       left = shared_count = fobj->shared_count;
                for (i = 0; i < shared_count; ++i) {
                        struct dma_fence *fence = 
rcu_dereference(fobj->shared[i]);
@@ -657,13 +657,14 @@ bool dma_resv_test_signaled_rcu(struct dma_resv *obj, 
bool test_all)
                                goto retry;
                        else if (!ret)
                                break;
+                       left--;
                }
                if (read_seqcount_retry(&obj->seq, seq))
                        goto retry;
        }
  -     if (!shared_count) {
+       if (!left) {
                struct dma_fence *fence_excl = rcu_dereference(obj->fence_excl);
                if (fence_excl) {

_______________________________________________
dri-devel mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Reply via email to