
Best Regards,
Thomas
-----Original Message-----
From: amd-gfx <[email protected]> On Behalf Of Chenglei Xie
Sent: Friday, May 8, 2026 4:29 AM
To: [email protected]
Cc: Chan, Hing Pong <[email protected]>; Luo, Zhigang <[email protected]>; 
Deucher, Alexander <[email protected]>; Xie, Chenglei 
<[email protected]>
Subject: [PATCH] drm/amdgpu: fix OOB risk parsing virt RAS batch trace replies 
on the VF

The VF copied ras_cmd_batch_trace_record_rsp from shared memory without
fully constraining real_batch_num, the cache window, or the per-batch
offset/trace_num. A hostile or corrupted buffer could make
batch_id - start_batch_id index past batchs[], make
start_batch_id + real_batch_num wrap in uint64_t and confuse the refetch
logic, or make offset + trace_num walk past records[].

Validate the response the same way the PF path does: a single helper
checks the cache window using subtraction and a real_batch_num cap,
trace_num and offset + trace_num are bounded, the window is re-checked
after the RPC, and the cache is memset when invalid. Return -EIO for a
bad layout and -ENODATA for a batch_id slot mismatch.

Signed-off-by: Chenglei Xie <[email protected]>
Change-Id: I6455e9f14914d1b07945b7a57fcb3695435ded64
---
 .../drm/amd/ras/ras_mgr/amdgpu_virt_ras_cmd.c | 38 ++++++++++++++-----
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/ras/ras_mgr/amdgpu_virt_ras_cmd.c b/drivers/gpu/drm/amd/ras/ras_mgr/amdgpu_virt_ras_cmd.c
index 838eb91aef391..fcb421d39f87d 100644
--- a/drivers/gpu/drm/amd/ras/ras_mgr/amdgpu_virt_ras_cmd.c
+++ b/drivers/gpu/drm/amd/ras/ras_mgr/amdgpu_virt_ras_cmd.c
@@ -192,6 +192,16 @@ static int amdgpu_virt_ras_get_cper_snapshot(struct ras_core_context *ras_core,
        return RAS_CMD__SUCCESS;
 }

+/* rsp contents are copied from shared memory; validate before indexing. */
+static bool amdgpu_virt_ras_batch_trace_rsp_covers(struct ras_cmd_batch_trace_record_rsp *rsp,
+                                      uint64_t batch_id)
+{
+       return rsp->real_batch_num &&
+              rsp->real_batch_num <= RAS_CMD_MAX_BATCH_NUM &&
+              batch_id >= rsp->start_batch_id &&
+              (batch_id - rsp->start_batch_id) < rsp->real_batch_num;
+}
+

[Thomas] This function checks whether the batch_id has already been
cached; suggest renaming it to amdgpu_virt_ras_check_batch_cached.

 static int amdgpu_virt_ras_get_batch_records(struct ras_core_context *ras_core, uint64_t batch_id,
                        struct ras_log_info **trace_arr, uint32_t arr_num,
                        struct ras_cmd_batch_trace_record_rsp *rsp_cache)
@@ -204,27 +214,37 @@ static int amdgpu_virt_ras_get_batch_records(struct ras_core_context *ras_core,
        struct batch_ras_trace_info *batch;
        int ret = 0;
        uint32_t i;
+       uint32_t idx;

-       if (!rsp->real_batch_num || (batch_id < rsp->start_batch_id) ||
-               (batch_id >=  (rsp->start_batch_id + rsp->real_batch_num))) {
-
+       if (!amdgpu_virt_ras_batch_trace_rsp_covers(rsp, batch_id)) {
                memset(rsp, 0, sizeof(*rsp));
                ret = amdgpu_virt_ras_send_remote_cmd(ras_core, RAS_CMD__GET_BATCH_TRACE_RECORD,
                        &req, sizeof(req), rsp, sizeof(*rsp));
                if (ret)
                        return -EPIPE;
+
+               if (!amdgpu_virt_ras_batch_trace_rsp_covers(rsp, batch_id)) {
+                       memset(rsp, 0, sizeof(*rsp));
+                       return -EIO;
+               }
        }

-       batch = &rsp->batchs[batch_id - rsp->start_batch_id];
-       if (batch_id != batch->batch_id)
+       idx = (uint32_t)(batch_id - rsp->start_batch_id);
+       batch = &rsp->batchs[idx];
+       if (batch_id != batch->batch_id) {
+               memset(rsp, 0, sizeof(*rsp));
                return -ENODATA;
+       }

-       for (i = 0; i < batch->trace_num; i++) {
-               if (i >= arr_num)
-                       break;
-               trace_arr[i] = &rsp->records[batch->offset + i];
+       if (batch->trace_num > MAX_RECORD_PER_BATCH ||
+           (uint32_t)batch->offset + batch->trace_num > RAS_CMD_MAX_TRACE_NUM) {
+               memset(rsp, 0, sizeof(*rsp));
+               return -EIO;
        }

[Thomas] The batch data validation here could be consolidated with the earlier 
"if (batch_id != batch->batch_id)" check above, to keep the batch-related 
checks together and improve readability.

+       for (i = 0; i < batch->trace_num && i < arr_num; i++)
+               trace_arr[i] = &rsp->records[batch->offset + i];
+
        return i;
 }

--
2.34.1
