From: Karunika Choo <[email protected]>

[ Upstream commit e322a4844811b54477b7072eb40dc9e402a1725d ]

In certain scenarios, it is possible for multiple cache flushes to be
requested before the previous one completes. This patch introduces the
cache_flush_lock mutex to serialize these operations and ensure that
any requested cache flushes are completed instead of dropped.

Reviewed-by: Liviu Dudau <[email protected]>
Co-developed-by: Dennis Tsiang <[email protected]>
Signed-off-by: Dennis Tsiang <[email protected]>
Signed-off-by: Karunika Choo <[email protected]>
Reviewed-by: Steven Price <[email protected]>
Signed-off-by: Steven Price <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
---

LLM Generated explanations, may be completely bogus:

YES

Why this is a real bug that users can hit
- Dropped flush requests: In the current code, if a cache flush is
  already in progress, a second caller does not issue another flush and
  simply waits for the first one to complete. See the conditional in
  `drivers/gpu/drm/panthor/panthor_gpu.c:358` and command write at
  `drivers/gpu/drm/panthor/panthor_gpu.c:360`. When
  `GPU_IRQ_CLEAN_CACHES_COMPLETED` is set, the second call skips issuing
  `GPU_FLUSH_CACHES(...)` and only waits for the existing flush to
  finish. If new CPU writes requiring a flush occur between the first
  flush’s command and the second caller’s request, those writes are not
  covered by the first flush. The second caller returns success without
  a flush that includes its writes. This is a correctness/coherency bug.
- WARN indicates it was unintended: The code explicitly warns if a flush
  is requested while one is pending (`drm_WARN_ON(...)` at
  `drivers/gpu/drm/panthor/panthor_gpu.c:358`), which already signals
  that concurrent callers were expected to be serialized at a higher
  level. The fact this commit adds serialization in the driver indicates
  concurrency can and does happen in practice.

What the patch changes
- Adds a dedicated mutex to serialize flush callers:
  - New field `struct mutex cache_flush_lock` in `struct panthor_gpu`
    (struct currently starts at
    `drivers/gpu/drm/panthor/panthor_gpu.c:26`).
  - Initializes it in `panthor_gpu_init()` alongside existing locks/wq
    (near `drivers/gpu/drm/panthor/panthor_gpu.c:166` where
    `spin_lock_init()` and `init_waitqueue_head()` are done).
  - Wraps `panthor_gpu_flush_caches()` entry with
    `guard(mutex)(&ptdev->gpu->cache_flush_lock);`, ensuring only one
    caller issues a flush command and waits at a time (function starts
    at `drivers/gpu/drm/panthor/panthor_gpu.c:350`).
- Effectively guarantees that each flush request results in a hardware
  flush. Without the mutex, concurrent callers can “piggyback” on a
  previous flush and return without their own flush, losing the ordering
  guarantee they expect.

Scope and risk assessment
- Small and contained: One file touched
  (`drivers/gpu/drm/panthor/panthor_gpu.c`), adding a `struct mutex`
  field, its init, and a single guard in one function. No ABI, uAPI, or
  architectural changes.
- Minimal regression risk: The function already sleeps
  (`wait_event_timeout(...)` at
  `drivers/gpu/drm/panthor/panthor_gpu.c:365`), so adding a mutex
  doesn’t alter the sleepability requirements. The only in-tree caller
  is from the scheduler path
  (`drivers/gpu/drm/panthor/panthor_sched.c:2742`) under `sched->lock`,
  not IRQ/atomic context.
- Locking safety: The IRQ handler uses only the spinlock `reqs_lock`
  (see `drivers/gpu/drm/panthor/panthor_gpu.c:156`-
  `drivers/gpu/drm/panthor/panthor_gpu.c:167`) and doesn’t touch the new
  mutex, so there’s no new lock inversion with the interrupt path. The
  flush function’s existing spinlock section remains unchanged and still
  protects `pending_reqs`.
- Guard macro availability: This stable tree already uses `guard(mutex)`
  widely (for example in `virt/lib/irqbypass.c:102` et al.), so the new
  `guard(mutex)` in this driver is compatible. If needed for strict
  include hygiene, `#include <linux/cleanup.h>` can be added, but
  similar files compile without explicitly adding it.

User impact and stable policy fit
- Fixes a real concurrency/coherency bug affecting correctness: A later
  flush request can be silently dropped, potentially leading to stale
  data observed by the GPU and spurious faults or subtle rendering/data
  corruption. This clearly affects users under certain timing
  conditions.
- No new features or behavior changes beyond making the existing API
  reliable under concurrency.
- Minimal risk, localized change in a driver subsystem.
- Although the commit message doesn’t carry a “Fixes:” or “Cc:
  stable@...” tag, it is a straightforward bug fix that meets stable
  backport criteria.

Cross-checks in the tree
- Current implementation demonstrating the bug:
  - Conditional suppression of a second flush:
    `drivers/gpu/drm/panthor/panthor_gpu.c:358`
  - Single flush command write:
    `drivers/gpu/drm/panthor/panthor_gpu.c:360`
  - Wait and timeout handling (unchanged by the patch):
    `drivers/gpu/drm/panthor/panthor_gpu.c:365`-
    `drivers/gpu/drm/panthor/panthor_gpu.c:375`
- Only in-tree caller identified:
  `drivers/gpu/drm/panthor/panthor_sched.c:2742`, but concurrency can
  still arise across scheduler/reset/suspend sequences or multiple
  threads.

Conclusion
- This is an important correctness fix with low risk and a small diff,
  preventing races in which a requested cache flush is silently dropped.
  It should be backported to stable.

 drivers/gpu/drm/panthor/panthor_gpu.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c
index cb7a335e07d7c..030409371037b 100644
--- a/drivers/gpu/drm/panthor/panthor_gpu.c
+++ b/drivers/gpu/drm/panthor/panthor_gpu.c
@@ -35,6 +35,9 @@ struct panthor_gpu {
 
        /** @reqs_acked: GPU request wait queue. */
        wait_queue_head_t reqs_acked;
+
+       /** @cache_flush_lock: Lock to serialize cache flushes */
+       struct mutex cache_flush_lock;
 };
 
 /**
@@ -204,6 +207,7 @@ int panthor_gpu_init(struct panthor_device *ptdev)
 
        spin_lock_init(&gpu->reqs_lock);
        init_waitqueue_head(&gpu->reqs_acked);
+       mutex_init(&gpu->cache_flush_lock);
        ptdev->gpu = gpu;
        panthor_gpu_init_info(ptdev);
 
@@ -353,6 +357,9 @@ int panthor_gpu_flush_caches(struct panthor_device *ptdev,
        bool timedout = false;
        unsigned long flags;
 
+       /* Serialize cache flush operations. */
+       guard(mutex)(&ptdev->gpu->cache_flush_lock);
+
        spin_lock_irqsave(&ptdev->gpu->reqs_lock, flags);
        if (!drm_WARN_ON(&ptdev->base,
                         ptdev->gpu->pending_reqs & GPU_IRQ_CLEAN_CACHES_COMPLETED)) {
-- 
2.51.0
