On Fri, Aug 12, 2022 at 05:10:51PM -0400, Felix Kuehling wrote:
> On 2022-08-12 02:20, Dan Carpenter wrote:
> > This code has two bugs.  If kfd_topology_device_by_proximity_domain()
> > failed on the first iteration through the loop then "cpu_link" is
> > uninitialized and should not be dereferenced.
> > 
> > The second bug is that we cannot dereference a list iterator when it
> > points to the list head.  In other words, if we exit the
> > list_for_each_entry() loop exits without hitting a break then "cpu_link"
> > is not a valid pointer and should not be dereferenced.
> > 
> > Fix both of these problems by setting "cpu_link" to NULL when it is invalid
> > and non-NULL when it is valid.  That makes it easier to test for
> > valid vs invalid.
> > 
> > Fixes: 0f28cca87e9a ("drm/amdkfd: Extend KFD device topology to surface 
> > peer-to-peer links")
> > Signed-off-by: Dan Carpenter <dan.carpen...@oracle.com>
> > ---
> > I reported these in June but never heard back.
> 
> I thought Ramesh implemented a fix for this:
> https://lore.kernel.org/all/20220706183302.1719795-1-ramesh.errab...@amd.com/
> 
> You commented on a version of his patch:
> https://lore.kernel.org/all/20220629161241.GM11460@kadam/

Oh, Sorry!  I appologize for forgetting that.

> 
> Did this get lost somehow? Anyway, your patch looks good to me and I'm going
> to apply it to amd-staging-drm-next now.
> 
> Reviewed-by: Felix Kuehling <felix.kuehl...@amd.com>

Looking at Ramesh's patch now, he added a continue if
kfd_topology_device_by_proximity_domain() failed.  That is a behavior
change, but it might also be a bug fix.

My patch does not change the behavior except for eliminating the crash
so I stand by my patch, but adding the continue might be a good thing to
do as a separate step.  I don't know the code well enough to say one way
or the other.

regards,
dan carpenter

Reply via email to