> On May 17, 2024, at 17:06, Andi Shyti <[email protected]> wrote: > > The whole point of the previous fixes has been to change the CCS > hardware configuration to generate only one stream available to > the compute users. We did this by changing the info.engine_mask > that is set during device probe, reset during the detection of > the fused engines, and finally reset again when choosing the CCS > mode. > > We can't use the engine_mask variable anymore, as with the > current configuration, it imposes only one CCS no matter what the > hardware configuration is. > > Before changing the engine_mask for the third time, save it and > use it for calculating the CCS mode. > > After the previous changes, the user reported a performance drop > to around 1/4. We have tested that the compute operations, with > the current patch, have improved by the same factor. > > Fixes: 6db31251bb26 ("drm/i915/gt: Enable only one CCS for compute workload") > Cc: Chris Wilson <[email protected]> > Cc: Gnattu OC <[email protected]> > Cc: Joonas Lahtinen <[email protected]> > Cc: Matt Roper <[email protected]> > Tested-by: Jian Ye <[email protected]> > --- > Hi, > > This ensures that all four CCS engines work properly. However, > during the tests, Jian detected that the performance during > memory copy assigned to the CCS engines is negatively impacted. > > I believe this might be expected, considering that based on the > engines' availability, the media user might decide to reduce the > copy in multitasking. > > With the upcoming work that will give the user the chance to > configure the CCS mode, this might improve. > > Gnattu, can I use your kindness to ask for a test on this patch > and check whether the performance improve on your side as well? > > Thanks, > Andi > > drivers/gpu/drm/i915/gt/intel_engine_cs.c | 6 ++++++ > drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 2 +- > drivers/gpu/drm/i915/gt/intel_gt_types.h | 8 ++++++++ > 3 files changed, 15 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c > b/drivers/gpu/drm/i915/gt/intel_engine_cs.c > index 5c8e9ee3b008..3b740ca25000 100644 > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c > @@ -885,6 +885,12 @@ static intel_engine_mask_t init_engine_mask(struct > intel_gt *gt) > if (IS_DG2(gt->i915)) { > u8 first_ccs = __ffs(CCS_MASK(gt)); > > + /* > + * Store the number of active cslices before > + * changing the CCS engine configuration > + */ > + gt->ccs.cslices = CCS_MASK(gt); > + > /* Mask off all the CCS engine */ > info->engine_mask &= ~GENMASK(CCS3, CCS0); > /* Put back in the first CCS engine */ > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c > b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c > index 99b71bb7da0a..3c62a44e9106 100644 > --- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c > +++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c > @@ -19,7 +19,7 @@ unsigned int intel_gt_apply_ccs_mode(struct intel_gt *gt) > > /* Build the value for the fixed CCS load balancing */ > for (cslice = 0; cslice < I915_MAX_CCS; cslice++) { > - if (CCS_MASK(gt) & BIT(cslice)) > + if (gt->ccs.cslices & BIT(cslice)) > /* > * If available, assign the cslice > * to the first available engine... > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h > b/drivers/gpu/drm/i915/gt/intel_gt_types.h > index def7dd0eb6f1..cfdd2ad5e954 100644 > --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h > +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h > @@ -207,6 +207,14 @@ struct intel_gt { > [MAX_ENGINE_INSTANCE + 1]; > enum intel_submission_method submission_method; > > + struct { > + /* > + * Mask of the non fused CCS slices > + * to be used for the load balancing > + */ > + intel_engine_mask_t cslices; > + } ccs; > + > /* > * Default address space (either GGTT or ppGTT depending on arch). > * > -- > 2.43.0
Hi Andi, I can confirm that this patch restores most of the performance we had before the CCS change. I do notice a reduction in memcpy performance, but it is good enough for our use case since our video processing pipeline is zero-copy once the video is loaded to the VRAM. Tested-by: Gnattu OC <[email protected] <mailto:[email protected]>>
