Hi Lijo,
Just ran some tests, and yes, I think we should still reflect everything. Even
with the extended peer link command disabled, if we do not reflect, the table
will only be filled beneath the diagonal (see attached). So I think the patch
is good as is for that concern.
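To make the below-the-diagonal point concrete, here is a toy sketch (my own
illustration, not the driver code) of a hive link table where only the entries
beneath the diagonal are known, and reflection mirrors them across:

```c
#include <assert.h>

#define HIVE_NODES 4

/* Toy model, not the driver code: links[i][j] holds the number of xGMI
 * links node i reports toward node j. Without reflection only the entries
 * beneath the diagonal (j < i) end up populated; reflecting copies each
 * known value to its mirror position above the diagonal. */
static void reflect_links(int links[HIVE_NODES][HIVE_NODES])
{
	for (int i = 0; i < HIVE_NODES; i++)
		for (int j = 0; j < i; j++)
			if (links[i][j] && !links[j][i])
				links[j][i] = links[i][j];
}
```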
Let me know if there's anything else you want me to look into.
Thanks,
Will
-----Original Message-----
From: Aitken, Will
Sent: Wednesday, October 29, 2025 9:16 AM
To: Lazar, Lijo <[email protected]>; [email protected]
Cc: Skvortsov, Victor <[email protected]>
Subject: RE: [PATCH 1/3] drm/amdgpu: Refactor sriov xgmi topology filling to
common code
Hi Lijo,
Thanks for reviewing. Good question. I don't think it's necessary, since it
wasn't being done before (the equivalent xgmi_fill function was only called
when sharing was enabled). It probably doesn't hurt, but it may be redundant:
the port_num copying, at least, would only reflect zeros for the other nodes,
since the VF does not receive extended data. I'm not sure about
is_sharing_enabled. I will run some experiments and add checks to reflect only
when necessary.
Thanks,
Will
-----Original Message-----
From: Lazar, Lijo <[email protected]>
Sent: Tuesday, October 28, 2025 9:16 AM
To: Aitken, Will <[email protected]>; [email protected]
Cc: Skvortsov, Victor <[email protected]>; Aitken, Will
<[email protected]>
Subject: Re: [PATCH 1/3] drm/amdgpu: Refactor sriov xgmi topology filling to
common code
On 10/8/2025 7:02 PM, [email protected] wrote:
From: Will Aitken <[email protected]>
amdgpu_xgmi_fill_topology_info and psp_xgmi_reflect_topology_info
perform the same logic of copying the topology info of one node to every
other node in the hive. Instead of keeping two functions that do the
same thing, move the fill function's logic into the reflect function,
and also reflect port number info so the reflected entries are complete.
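The src/dst swap during reflection can be illustrated with a minimal sketch
(a hypothetical struct, simplified from the driver's port_num entries, layout
illustrative only): the peer sees the same physical link from the opposite
end, so the port roles exchange.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the driver's per-link port
 * record; field names mirror the patch, the layout is illustrative. */
struct xgmi_port_pair {
	int src_xgmi_port_num;
	int dst_xgmi_port_num;
};

/* Reflect one link into the peer's table: src and dst swap because the
 * peer views the same link from the opposite frame of reference. */
static struct xgmi_port_pair reflect_port(struct xgmi_port_pair p)
{
	struct xgmi_port_pair mirror = {
		.src_xgmi_port_num = p.dst_xgmi_port_num,
		.dst_xgmi_port_num = p.src_xgmi_port_num,
	};
	return mirror;
}
```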
Signed-off-by: Will Aitken <[email protected]>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 19 ++++++++++++-----
drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 27 ------------------------
2 files changed, 14 insertions(+), 32 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 6208a49c9f23..82500ade240d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -1539,6 +1539,7 @@ static void psp_xgmi_reflect_topology_info(struct psp_context *psp,
uint64_t src_node_id = psp->adev->gmc.xgmi.node_id;
uint64_t dst_node_id = node_info.node_id;
uint8_t dst_num_hops = node_info.num_hops;
+ uint8_t dst_is_sharing_enabled = node_info.is_sharing_enabled;
uint8_t dst_num_links = node_info.num_links;
 	hive = amdgpu_get_xgmi_hive(psp->adev);
@@ -1558,13 +1559,20 @@ static void psp_xgmi_reflect_topology_info(struct psp_context *psp,
 				continue;
 
 			mirror_top_info->nodes[j].num_hops = dst_num_hops;
-			/*
-			 * prevent 0 num_links value re-reflection since reflection
+			mirror_top_info->nodes[j].is_sharing_enabled = dst_is_sharing_enabled;
+			/* prevent 0 num_links value re-reflection since reflection
 			 * criteria is based on num_hops (direct or indirect).
-			 *
 			 */
-			if (dst_num_links)
+			if (dst_num_links) {
Patches look fine. One clarification - do you need to fill this information if
sharing is disabled?
Thanks,
Lijo
 				mirror_top_info->nodes[j].num_links = dst_num_links;
+				/* swap src and dst due to frame of reference */
+				for (int k = 0; k < dst_num_links; k++) {
+					mirror_top_info->nodes[j].port_num[k].src_xgmi_port_num =
+						node_info.port_num[k].dst_xgmi_port_num;
+					mirror_top_info->nodes[j].port_num[k].dst_xgmi_port_num =
+						node_info.port_num[k].src_xgmi_port_num;
+				}
+			}
 
 			break;
 		}
@@ -1639,7 +1647,8 @@ int psp_xgmi_get_topology_info(struct psp_context *psp,
 		amdgpu_ip_version(psp->adev, MP0_HWIP, 0) ==
 			IP_VERSION(13, 0, 6) ||
 		amdgpu_ip_version(psp->adev, MP0_HWIP, 0) ==
-			IP_VERSION(13, 0, 14);
+			IP_VERSION(13, 0, 14) ||
+		amdgpu_sriov_vf(psp->adev);
 	bool ta_port_num_support = amdgpu_sriov_vf(psp->adev) ? 0 :
 		psp->xgmi_context.xgmi_ta_caps & EXTEND_PEER_LINK_INFO_CMD_FLAG;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 1ede308a7c67..2e70b84a8c3b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -958,28 +958,6 @@ static int amdgpu_xgmi_initialize_hive_get_data_partition(struct amdgpu_hive_inf
 	return 0;
 }
-static void amdgpu_xgmi_fill_topology_info(struct amdgpu_device *adev,
-		struct amdgpu_device *peer_adev)
-{
-	struct psp_xgmi_topology_info *top_info = &adev->psp.xgmi_context.top_info;
-	struct psp_xgmi_topology_info *peer_info = &peer_adev->psp.xgmi_context.top_info;
-
-	for (int i = 0; i < peer_info->num_nodes; i++) {
-		if (peer_info->nodes[i].node_id == adev->gmc.xgmi.node_id) {
-			for (int j = 0; j < top_info->num_nodes; j++) {
-				if (top_info->nodes[j].node_id == peer_adev->gmc.xgmi.node_id) {
-					peer_info->nodes[i].num_hops = top_info->nodes[j].num_hops;
-					peer_info->nodes[i].is_sharing_enabled =
-						top_info->nodes[j].is_sharing_enabled;
-					peer_info->nodes[i].num_links =
-						top_info->nodes[j].num_links;
-					return;
-				}
-			}
-		}
-	}
-}
-
 int amdgpu_xgmi_add_device(struct amdgpu_device *adev)
 {
 	struct psp_xgmi_topology_info *top_info;
@@ -1065,11 +1043,6 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev)
 				/* To do: continue with some node failed or disable the whole hive*/
 				goto exit_unlock;
 			}
-
-			/* fill the topology info for peers instead of getting from PSP */
-			list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head) {
-				amdgpu_xgmi_fill_topology_info(adev, tmp_adev);
-			}
 		} else {
 			/* get latest topology info for each device from psp */
 			list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head)
 			{