[Public] -----Original Message----- From: Kuehling, Felix <[email protected]> Sent: Saturday, October 22, 2022 4:53 AM To: Liang, Prike <[email protected]>; [email protected] Cc: Deucher, Alexander <[email protected]>; Zhang, Yifan <[email protected]>; Huang, Ray <[email protected]>; Liu, Aaron <[email protected]> Subject: Re: [PATCH 1/2] drm/amdkfd: introduce dummy cache info for property asic
On 2022-10-21 09:05, Liang, Prike wrote: > [Public] > > -----Original Message----- > From: Kuehling, Felix <[email protected]> > Sent: Friday, October 21, 2022 1:11 PM > To: Liang, Prike <[email protected]>; [email protected] > Cc: Deucher, Alexander <[email protected]>; Zhang, Yifan > <[email protected]>; Huang, Ray <[email protected]>; Liu, Aaron > <[email protected]> > Subject: Re: [PATCH 1/2] drm/amdkfd: introduce dummy cache info for > property asic > > Am 2022-10-20 um 21:50 schrieb Liang, Prike: >> [Public] >> >> -----Original Message----- >> From: Kuehling, Felix <[email protected]> >> Sent: Friday, October 21, 2022 12:03 AM >> To: Liang, Prike <[email protected]>; [email protected] >> Cc: Deucher, Alexander <[email protected]>; Zhang, Yifan >> <[email protected]>; Huang, Ray <[email protected]>; Liu, Aaron >> <[email protected]> >> Subject: Re: [PATCH 1/2] drm/amdkfd: introduce dummy cache info for >> property asic >> >> >> Am 2022-10-20 um 05:15 schrieb Prike Liang: >>> This dummy cache info will enable kfd base function support. >>> >>> Signed-off-by: Prike Liang <[email protected]> >>> --- >>> drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 55 +++++++++++++++++++++++++-- >>> 1 file changed, 52 insertions(+), 3 deletions(-) >>> [snip] >>> @@ -1528,7 +1574,10 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev >>> *kdev, >>> >>> kfd_fill_gpu_cache_info_from_gfx_config(kdev, pcache_info); >>> break; >>> default: >>> - return -EINVAL; >>> + pcache_info = dummy_cache_info; >>> + num_of_cache_types = ARRAY_SIZE(dummy_cache_info); >>> + pr_warn("dummy cache info is used temporarily and >>> real cache info need update later.\n"); >>> + break; >> Could we make this return an error if the amdgpu.exp_hw_support module >> parameter is not set? >> >> Regards, >> Felix >> >> [Prike] It's fine to protect this dummy info by checking the parameter >> amdgpu_exp_hw_support, but it may not friendly to end user by adding the >> parameter and some guys will still report KFD not enabled for this parameter >> setting problem. The original idea is the end user will not aware the dummy >> cache info and only alert the warning message to developer. > I thought the intention was to simplify bring-up but make sure that valid > cache info is available by the time a chip goes into production. > Therefore, normal end users should never need to set the > amdgpu_exp_hw_support option. I think you're saying that we would go to > production with dummy info. That seems like a bad idea to me. > > Regards, > Felix > > [Prike] Sorry for the confusion. In fact, this dummy cache info only used > before production and meanwhile needn't require any parameter setting for CQE > do KFD test. Anyway if you still have concern on this solution I will add the > condition of checking amdgpu_exp_hw_support. The idea to control this with a module parameter was to cause a more obvious failure if we don't have correct cache info before going to production. Just a warning in the log file is too easy to miss or ignore. Of course, if QA gets into the habit of testing with amdgpu_exp_hw_support, then this may not solve the problem. You need to have at least one test pass without amdgpu_exp_hw_support to catch missing cache info. Regards, Felix Get your point. As to the KFD support on a NPI will be tracked by a ticket which make sure the real cache info update later after this dummy cache info assigned in the early BU phase. Thanks, Prike
