On Mon, Apr 15, 2024 at 01:44:21PM +0530, Balasubramani Vivekanandan wrote:
> From: Nirmoy Das <[email protected]>
> 
> Display surfaces can be tagged as transient by mapping them using one of
> the various L3:XD PAT index modes on Xe2. The expectation is that the KMD
> needs to request a transient data flush at the start of the flip sequence
> to ensure all transient data in the L3 cache is flushed to memory. Add a
> routine for this which we can then call from the display code.
> 
> CC: Matt Roper <[email protected]>
> Signed-off-by: Nirmoy Das <[email protected]>
> Co-developed-by: Matthew Auld <[email protected]>
> Signed-off-by: Matthew Auld <[email protected]>
> Signed-off-by: Balasubramani Vivekanandan <[email protected]>
> ---
>  drivers/gpu/drm/xe/regs/xe_gt_regs.h |  3 ++
>  drivers/gpu/drm/xe/xe_device.c       | 49 ++++++++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_device.h       |  2 ++
>  3 files changed, 54 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> index 8fe811ea404a..65719a712807 100644
> --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> @@ -318,6 +318,9 @@
>  
>  #define XE2LPM_L3SQCREG5                     XE_REG_MCR(0xb658)
>  
> +#define XE2_TDF_CTRL                         XE_REG(0xb418)
> +#define   TRANSIENT_FLUSH_REQUEST            REG_BIT(0)
> +
>  #define XEHP_MERT_MOD_CTRL                   XE_REG_MCR(0xcf28)
>  #define RENDER_MOD_CTRL                              XE_REG_MCR(0xcf2c)
>  #define COMP_MOD_CTRL                                XE_REG_MCR(0xcf30)
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index d85a2ba0a057..22e6422c7b8e 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -717,6 +717,55 @@ void xe_device_wmb(struct xe_device *xe)
>               xe_mmio_write32(gt, SOFTWARE_FLAGS_SPR33, 0);
>  }
>  
> +/**
> + * xe_device_td_flush() - Flush transient L3 cache entries
> + * @xe: The device
> + *
> + * The display engine has direct access to memory and is never coherent with
> + * L3/L4 caches (or CPU caches); however, the KMD is responsible for explicitly
> + * flushing transient L3 GPU cache entries prior to the flip sequence to ensure
> + * scanout can happen from such a surface without seeing corruption.
> + *
> + * Display surfaces can be tagged as transient by mapping them using one of the
> + * various L3:XD PAT index modes on Xe2.
> + *
> + * Note: On non-discrete xe2 platforms, like LNL, the entire L3 cache is flushed
> + * at the end of each submission via PIPE_CONTROL for compute/render, since SA
> + * Media is not coherent with L3 and we want to support render-vs-media use
> + * cases. For other engines like copy/blt the HW internally forces uncached
> + * behaviour, which is why we can skip the TDF on such platforms.
> + */
> +void xe_device_td_flush(struct xe_device *xe)
> +{
> +     struct xe_gt *gt;
> +     u8 id;
> +
> +     if (!IS_DGFX(xe) || GRAPHICS_VER(xe) < 20)
> +             return;
> +
> +     for_each_gt(gt, xe, id) {
> +             if (xe_gt_is_media_type(gt))
> +                     continue;
> +
> +             if (xe_force_wake_get(gt_to_fw(gt), XE_FW_GT))
> +                     return;
> +
> +             xe_mmio_write32(gt, XE2_TDF_CTRL, TRANSIENT_FLUSH_REQUEST);
> +             /*
> +              * FIXME: We can likely do better here with our choice of
> +              * timeout. Currently we just assume the worst case, i.e. 64us,
> +              * which is believed to be sufficient to cover the worst case
> +              * scenario on current platforms if all cache entries are
> +              * transient and need to be flushed.
> +              */
> +             if (xe_mmio_wait32(gt, XE2_TDF_CTRL, TRANSIENT_FLUSH_REQUEST, 0,
> +                                150, NULL, false))

Comment (64us) doesn't seem to match code (150us).

Aside from that,

        Reviewed-by: Matt Roper <[email protected]>


Matt

> +                     xe_gt_err_once(gt, "TD flush timeout\n");
> +
> +             xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
> +     }
> +}
> +
>  u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size)
>  {
>       return xe_device_has_flat_ccs(xe) ?
> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> index d413bc2c6be5..d3430f4b820a 100644
> --- a/drivers/gpu/drm/xe/xe_device.h
> +++ b/drivers/gpu/drm/xe/xe_device.h
> @@ -176,4 +176,6 @@ void xe_device_snapshot_print(struct xe_device *xe, struct drm_printer *p);
>  u64 xe_device_canonicalize_addr(struct xe_device *xe, u64 address);
>  u64 xe_device_uncanonicalize_addr(struct xe_device *xe, u64 address);
>  
> +void xe_device_td_flush(struct xe_device *xe);
> +
>  #endif
> -- 
> 2.25.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation
