> On May 17, 2024, at 17:06, Andi Shyti <[email protected]> wrote:
> 
> The whole point of the previous fixes has been to change the CCS
> hardware configuration to generate only one stream available to
> the compute users. We did this by changing the info.engine_mask
> that is set during device probe, reset during the detection of
> the fused engines, and finally reset again when choosing the CCS
> mode.
> 
> We can't use the engine_mask variable anymore, as with the
> current configuration, it imposes only one CCS no matter what the
> hardware configuration is.
> 
> Before changing the engine_mask for the third time, save it and
> use it for calculating the CCS mode.
> 
> After the previous changes, the user reported a performance drop
> to around 1/4. We have tested that the compute operations, with
> the current patch, have improved by the same factor.
> 
> Fixes: 6db31251bb26 ("drm/i915/gt: Enable only one CCS for compute workload")
> Cc: Chris Wilson <[email protected]>
> Cc: Gnattu OC <[email protected]>
> Cc: Joonas Lahtinen <[email protected]>
> Cc: Matt Roper <[email protected]>
> Tested-by: Jian Ye <[email protected]>
> ---
> Hi,
> 
> This ensures that all four CCS engines work properly. However,
> during the tests, Jian detected that the performance during
> memory copy assigned to the CCS engines is negatively impacted.
> 
> I believe this might be expected, considering that based on the
> engines' availability, the media user might decide to reduce the
> copy in multitasking.
> 
> With the upcoming work that will give the user the chance to
> configure the CCS mode, this might improve.
> 
> Gnattu, could you kindly test this patch and check whether the
> performance improves on your side as well?
> 
> Thanks,
> Andi
> 
> drivers/gpu/drm/i915/gt/intel_engine_cs.c   | 6 ++++++
> drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 2 +-
> drivers/gpu/drm/i915/gt/intel_gt_types.h    | 8 ++++++++
> 3 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index 5c8e9ee3b008..3b740ca25000 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -885,6 +885,12 @@ static intel_engine_mask_t init_engine_mask(struct intel_gt *gt)
>       if (IS_DG2(gt->i915)) {
>               u8 first_ccs = __ffs(CCS_MASK(gt));
> 
> +             /*
>               * Store the mask of active cslices before
>               * changing the CCS engine configuration
> +              */
> +             gt->ccs.cslices = CCS_MASK(gt);
> +
>               /* Mask off all the CCS engine */
>               info->engine_mask &= ~GENMASK(CCS3, CCS0);
>               /* Put back in the first CCS engine */
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
> index 99b71bb7da0a..3c62a44e9106 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
> @@ -19,7 +19,7 @@ unsigned int intel_gt_apply_ccs_mode(struct intel_gt *gt)
> 
>       /* Build the value for the fixed CCS load balancing */
>       for (cslice = 0; cslice < I915_MAX_CCS; cslice++) {
> -             if (CCS_MASK(gt) & BIT(cslice))
> +             if (gt->ccs.cslices & BIT(cslice))
>                       /*
>                        * If available, assign the cslice
>                        * to the first available engine...
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> index def7dd0eb6f1..cfdd2ad5e954 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> @@ -207,6 +207,14 @@ struct intel_gt {
>                                           [MAX_ENGINE_INSTANCE + 1];
>       enum intel_submission_method submission_method;
> 
> +     struct {
> +             /*
> +              * Mask of the non fused CCS slices
> +              * to be used for the load balancing
> +              */
> +             intel_engine_mask_t cslices;
> +     } ccs;
> +
>       /*
>        * Default address space (either GGTT or ppGTT depending on arch).
>        *
> -- 
> 2.43.0

Hi Andi,

I can confirm that this patch restores most of the performance we had before 
the CCS change. 

I do notice a reduction in memcpy performance, but it is good enough for our
use case, since our video processing pipeline is zero-copy once the video is
loaded into VRAM.
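
For anyone following the thread, here is a rough userspace sketch of why the
saved mask matters. It is not the driver code: build_ccs_mode(), the 3-bit
field width, and the 0x7 "disabled" value are assumptions made for
illustration only. It mirrors the loop in intel_gt_apply_ccs_mode() by
giving each available cslice to the first CCS engine and marking the rest
as unavailable.

```c
#include <stdint.h>

/* Hypothetical constants for illustration, not taken from the driver. */
#define MAX_CCS        4u   /* number of cslices considered */
#define FIELD_WIDTH    3u   /* assumed bits per cslice field */
#define FIELD_DISABLED 0x7u /* assumed "no engine dispatches here" value */

/*
 * Build a load-balancing value from a mask of available cslices.
 * Every available cslice is routed to first_ccs; every other cslice
 * gets the disabled marker.
 */
static uint32_t build_ccs_mode(uint32_t cslices, unsigned int first_ccs)
{
	uint32_t mode = 0;
	unsigned int cslice;

	for (cslice = 0; cslice < MAX_CCS; cslice++) {
		uint32_t field = (cslices & (1u << cslice)) ?
				 first_ccs : FIELD_DISABLED;

		mode |= field << (cslice * FIELD_WIDTH);
	}

	return mode;
}
```

With the mask saved before the engine_mask is reduced (0xF on a part with
four cslices), all four fields point at engine 0. With the reduced mask
(0x1), three of the four fields carry the disabled marker, which is
consistent with the drop to roughly 1/4 described in the commit message.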

Tested-by: Gnattu OC <[email protected]>
