vcn: fix race condition issue for vcn start

James Zhu Wed, 04 Mar 2020 07:10:23 -0800


On 2020-03-04 10:03 a.m., Christian König wrote:

Am 04.03.20 um 15:57 schrieb James Zhu:
On 2020-03-04 3:53 a.m., Christian König wrote:
Am 03.03.20 um 23:48 schrieb James Zhu:
On 2020-03-03 2:03 p.m., James Zhu wrote:
On 2020-03-03 1:44 p.m., Christian König wrote:
Am 03.03.20 um 19:16 schrieb James Zhu:
Fix race condition issue when multiple vcn starts are called.

Signed-off-by: James Zhu <james....@amd.com>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 4 ++++
  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h | 1 +
  2 files changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.cb/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index f96464e..aa7663f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -63,6 +63,7 @@ int amdgpu_vcn_sw_init(struct amdgpu_device*adev)
      int i, r;
INIT_DELAYED_WORK(&adev->vcn.idle_work,amdgpu_vcn_idle_work_handler);
+    mutex_init(&adev->vcn.vcn_pg_lock);
        switch (adev->asic_type) {
      case CHIP_RAVEN:
@@ -210,6 +211,7 @@ int amdgpu_vcn_sw_fini(struct amdgpu_device*adev)
      }
        release_firmware(adev->vcn.fw);
+    mutex_destroy(&adev->vcn.vcn_pg_lock);
        return 0;
  }
@@ -321,6 +323,7 @@ void amdgpu_vcn_ring_begin_use(structamdgpu_ring *ring)
      struct amdgpu_device *adev = ring->adev;
bool set_clocks =!cancel_delayed_work_sync(&adev->vcn.idle_work);
  +    mutex_lock(&adev->vcn.vcn_pg_lock);
That still won't work correctly here.
The whole idea of the cancel_delayed_work_sync() andschedule_delayed_work() dance is that you have exactly one userof that. If you have multiple rings that whole thing won't workcorrectly.
To fix this you need to call mutex_lock() beforecancel_delayed_work_sync() and schedule_delayed_work() beforemutex_unlock().
Big lock definitely works. I am trying to use as smaller lock aspossible here. the share resource which needs protect here arepower gate process and dpg mode switch process.
if we move mutex_unlock() before schedule_delayed_work(. I amwondering what are the other necessary resources which need protect.
By the way, cancel_delayed_work_sync() supports multiple threaditself, so I didn't put it into protection area.
Yeah, but that's correct but it still won't working correctly :)
See the problem is that only for the first callercancel_delayed_work_sync() returns true because it canceled thedelayed work.
if the 1st caller gets true. the 2nd caller unfortunately may missthis pending status, so it will ungate the power which is unexpected.
But in power gate/ungate function, a power state is maintained, sothis miss won't be really triggered to ungate the power.
So I think cancel_delayed_work_sync() / schedule_delayed_work() arenot necessary be protected here.
Ok that could work as well.
But in this case I would remove checking the return value ofcancel_delayed_work_sync() and just always ungate the power.
This way we prevent ugly bugs from leaking in when this really racessometimes.


Sure. Thanks!

James

Regards,
Christian.
Best Regards!

James
For all others it returns false and those would then think that theyneed to ungate the power.
The only solution I see is to either put both thecancel_delayed_work_sync() and schedule_delayed_work() under thesame mutex protection or start to use an atomic or other counter tonote concurrent processing.
power gate is shared by all VCN IP instances and different rings ,so it needs be put into protection area.
each ring's job itself is serialized by scheduler. it doesn't needbe put into this protection area.
Yes, those should work as expected.

Regards,
Christian.
Thanks!

James
Regards,
Christian.
      if (set_clocks) {
          amdgpu_gfx_off_ctrl(adev, false);
amdgpu_device_ip_set_powergating_state(adev,AMD_IP_BLOCK_TYPE_VCN,@@ -345,6 +348,7 @@ void amdgpu_vcn_ring_begin_use(structamdgpu_ring *ring)
            adev->vcn.pause_dpg_mode(adev, ring->me, &new_state);
      }
+    mutex_unlock(&adev->vcn.vcn_pg_lock);
  }
    void amdgpu_vcn_ring_end_use(struct amdgpu_ring *ring)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.hb/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
index 6fe0573..2ae110d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
@@ -200,6 +200,7 @@ struct amdgpu_vcn {
struct drm_gpu_scheduler*vcn_dec_sched[AMDGPU_MAX_VCN_INSTANCES];
      uint32_t         num_vcn_enc_sched;
      uint32_t         num_vcn_dec_sched;
+    struct mutex         vcn_pg_lock;
        unsigned    harvest_config;
      int (*pause_dpg_mode)(struct amdgpu_device *adev,
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=02%7C01%7CJames.Zhu%40amd.com%7Ce9f5dcb164bc4efacd7908d7c04d254f%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637189309893957692&sdata=WcwmGejuagp5T9ojpSPLaU%2BmlHuidkh7SP3jDASpUSI%3D&reserved=0

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 1/4] drm/amdgpu/vcn: fix race condition issue for vcn start

Reply via email to