Zhigang Gong <[email protected]> writes:

> According to bspec, ROW_CHICKEN3's bit 6 which is to disable
> move of cacheable global atomics to L3 is needed for GT3 D
> stepping.
>
> I enabled it and tested it with HSW GT2 D stepping and GT3 E stepping.
> The atomics works fine in beignet. And it could get more than 10x performance
> improvement with some workload, for an example, the "splat" kernel in 
> darktable,
> without this patch, it consumes 50 seconds in one large raw picture 
> processing.
> But with this patch, the same process only takes less than 1 second.
>

I tried this already (on HSW GT2 D as well) and I don't think it's
enough to get L3 atomics working reliably.  Even though they did seem to
work OK at first glance I observed some corruption issues (e.g. atomic
writes not landing in system memory) when doing atomic writes to
contiguous (as in within the same cache-line) locations in memory.  The
"unused" ARB_shader_image_load_store test [1] I sent to the Piglit
mailing list some time ago exposes this IIRC, and probably a couple of
other tests too.

Also this change is going to cause an instant lock-up anytime Mesa uses
atomics because Mesa doesn't change the default L3 way allocation for
the DC, which turns out to be 0 on HSW.

[1] http://lists.freedesktop.org/archives/piglit/2014-December/013571.html

> Signed-off-by: Zhigang Gong <[email protected]>
> ---
>  drivers/gpu/drm/i915/intel_pm.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 7d99a9c..8a27802 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -5938,10 +5938,12 @@ static void haswell_init_clock_gating(struct 
> drm_device *dev)
>  
>       ilk_init_lp_watermarks(dev);
>  
> -     /* L3 caching of data atomics doesn't work -- disable it. */
> -     I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE);
> -     I915_WRITE(HSW_ROW_CHICKEN3,
> -                
> _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE));
> +     if (IS_HSW_GT3(dev) && dev->pdev->revision <= 6) {
> +             /* L3 caching of data atomics doesn't work -- disable it. */
> +             I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE);
> +             I915_WRITE(HSW_ROW_CHICKEN3,
> +                        
> _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE));
> +     }
>  
>       /* This is required by WaCatErrorRejectionIssue:hsw */
>       I915_WRITE(GEN7_SQ_CHICKEN_MBCUNIT_CONFIG,
> -- 
> 1.8.3.2

Attachment: pgpZ1IsEKWPxH.pgp
Description: PGP signature

_______________________________________________
Intel-gfx mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to