[AMD Public Use]

Then we need to figure out what the root cause is.  If we can't figure it out 
soon, we should revert the change for navi1x and keep debugging until we find 
the root cause and can safely re-enable it.

Alex
________________________________
From: Chen, Guchun <[email protected]>
Sent: Sunday, November 29, 2020 2:22 AM
To: Bas Nieuwenhuizen <[email protected]>; Kuehling, Felix 
<[email protected]>
Cc: Gui, Jack <[email protected]>; Zhou1, Tao <[email protected]>; amd-gfx 
mailing list <[email protected]>; Huang, Ray <[email protected]>; 
Deucher, Alexander <[email protected]>; Zhang, Hawking 
<[email protected]>
Subject: RE: [PATCH v3] drm/amd/amdgpu: set the default value of noretry to 1 
for some dGPUs

Hi Bas Nieuwenhuizen,

I don't think a direct revert is the right approach, though it would fix your 
problem.  noretry=0 will cause other test failures on several ASICs.

Regards,
Guchun

-----Original Message-----
From: amd-gfx <[email protected]> On Behalf Of Bas 
Nieuwenhuizen
Sent: Sunday, November 29, 2020 8:38 AM
To: Kuehling, Felix <[email protected]>
Cc: Gui, Jack <[email protected]>; Chen, Guchun <[email protected]>; Zhou1, 
Tao <[email protected]>; amd-gfx mailing list <[email protected]>; 
Huang, Ray <[email protected]>; Deucher, Alexander <[email protected]>; 
Zhang, Hawking <[email protected]>
Subject: Re: [PATCH v3] drm/amd/amdgpu: set the default value of noretry to 1 
for some dGPUs

Can we revert this patch to fix
https://gitlab.freedesktop.org/drm/amd/-/issues/1374 ?

On Thu, Oct 15, 2020 at 4:30 PM Felix Kuehling <[email protected]> wrote:
>
> On 2020-10-14 at 11:35 p.m., Chengming Gui wrote:
> > noretry = 0 causes the kfd page fault tests to fail on some dGPUs, so
> > set noretry to 1 for these ASICs:
> > vega20/navi10/navi14/ARCTURUS
> >
> > v2: merge raven and default case due to the same setting
> > v3: remove ARCTURUS
> >
> > Signed-off-by: Chengming Gui <[email protected]>
> > Change-Id: I3be70f463a49b0cd5c56456431d6c2cb98b13872
>
> Acked-by: Felix Kuehling <[email protected]>
>
>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 23 +++++++++++++++--------
> >  1 file changed, 15 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> > index 36604d751d62..f26eb4e54b12 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> > @@ -425,20 +425,27 @@ void amdgpu_gmc_noretry_set(struct amdgpu_device *adev)
> >       struct amdgpu_gmc *gmc = &adev->gmc;
> >
> >       switch (adev->asic_type) {
> > -     case CHIP_RAVEN:
> > -             /* Raven currently has issues with noretry
> > -              * regardless of what we decide for other
> > -              * asics, we should leave raven with
> > -              * noretry = 0 until we root cause the
> > -              * issues.
> > +     case CHIP_VEGA20:
> > +     case CHIP_NAVI10:
> > +     case CHIP_NAVI14:
> > +             /*
> > +              * noretry = 0 will cause kfd page fault tests fail
> > +              * for some ASICs, so set default to 1 for these ASICs.
> >                */
> >               if (amdgpu_noretry == -1)
> > -                     gmc->noretry = 0;
> > +                     gmc->noretry = 1;
> >               else
> >                       gmc->noretry = amdgpu_noretry;
> >               break;
> > +     case CHIP_RAVEN:
> >       default:
> > -             /* default this to 0 for now, but we may want
> > +             /* Raven currently has issues with noretry
> > +              * regardless of what we decide for other
> > +              * asics, we should leave raven with
> > +              * noretry = 0 until we root cause the
> > +              * issues.
> > +              *
> > +              * default this to 0 for now, but we may want
> >                * to change this in the future for certain
> >                * GPUs as it can increase performance in
> >                * certain cases.
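
For readers following the hunk above, the default selection it produces can
be sketched as a small standalone C fragment. This is only an illustration
under stated assumptions, not the kernel code: noretry_default() and the
local enum asic_type are hypothetical stand-ins for amdgpu_gmc_noretry_set()
and the driver's ASIC enum, and the param argument stands in for the
amdgpu_noretry module parameter, where -1 means the user did not set it.

```c
/* Hypothetical stand-ins for the driver's ASIC identifiers. */
enum asic_type {
	CHIP_RAVEN,
	CHIP_VEGA20,
	CHIP_NAVI10,
	CHIP_NAVI14,
	CHIP_ARCTURUS,
};

/*
 * Sketch of the patched switch. param models the amdgpu_noretry module
 * parameter: -1 means "not set", anything else is an explicit user choice.
 */
static int noretry_default(enum asic_type asic, int param)
{
	if (param != -1)
		return param;	/* an explicit setting always wins */

	switch (asic) {
	case CHIP_VEGA20:
	case CHIP_NAVI10:
	case CHIP_NAVI14:
		return 1;	/* v3 of the patch defaults these to 1 */
	case CHIP_RAVEN:
	default:
		return 0;	/* Raven and all other ASICs stay at 0 */
	}
}
```

With this model, noretry_default(CHIP_NAVI10, -1) yields 1, while passing an
explicit 0 still yields 0 on the same ASIC, so the patch changes only the
default and an explicit setting still takes precedence. ARCTURUS falls
through to the default branch because v3 removed it from the list.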
_______________________________________________
amd-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
