Best Regards,
Thomas

-----Original Message-----
From: amd-gfx <[email protected]> On Behalf Of Chenglei Xie
Sent: Friday, May 8, 2026 4:29 AM
To: [email protected]
Cc: Chan, Hing Pong <[email protected]>; Luo, Zhigang <[email protected]>; Deucher, Alexander <[email protected]>; Xie, Chenglei <[email protected]>
Subject: [PATCH] drm/amdgpu: fix OOB risk parsing virt RAS batch trace replies on the VF
The VF copied ras_cmd_batch_trace_record_rsp from shared memory without fully constraining real_batch_num, the cache window, or the per-batch offset/trace_num. A hostile or corrupted buffer could make batch_id - start_batch_id index past batchs[], make start_batch_id + real_batch_num wrap in uint64_t and confuse the refetch logic, or make offset + trace_num walk past records[].

Validate the response the way the PF path does: use a single helper for the cache window based on subtraction and a real_batch_num cap, bound trace_num and offset + trace_num, re-check after the RPC, and memset the cache when it is invalid. Return -EIO for a bad layout and -ENODATA for a batch_id slot mismatch.

Signed-off-by: Chenglei Xie <[email protected]>
Change-Id: I6455e9f14914d1b07945b7a57fcb3695435ded64
---
 .../drm/amd/ras/ras_mgr/amdgpu_virt_ras_cmd.c | 38 ++++++++++++++-----
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/ras/ras_mgr/amdgpu_virt_ras_cmd.c b/drivers/gpu/drm/amd/ras/ras_mgr/amdgpu_virt_ras_cmd.c
index 838eb91aef391..fcb421d39f87d 100644
--- a/drivers/gpu/drm/amd/ras/ras_mgr/amdgpu_virt_ras_cmd.c
+++ b/drivers/gpu/drm/amd/ras/ras_mgr/amdgpu_virt_ras_cmd.c
@@ -192,6 +192,16 @@ static int amdgpu_virt_ras_get_cper_snapshot(struct ras_core_context *ras_core,
 	return RAS_CMD__SUCCESS;
 }
 
+/* rsp contents are copied from shared memory; validate before indexing. */
+static bool amdgpu_virt_ras_batch_trace_rsp_covers(struct ras_cmd_batch_trace_record_rsp *rsp,
+						   uint64_t batch_id)
+{
+	return rsp->real_batch_num &&
+	       rsp->real_batch_num <= RAS_CMD_MAX_BATCH_NUM &&
+	       batch_id >= rsp->start_batch_id &&
+	       (batch_id - rsp->start_batch_id) < rsp->real_batch_num;
+}
+

[Thomas] This function is checking whether the batch_id has already been cached; suggest renaming it to amdgpu_virt_ras_check_batch_cached.
 static int amdgpu_virt_ras_get_batch_records(struct ras_core_context *ras_core,
 		uint64_t batch_id, struct ras_log_info **trace_arr,
 		uint32_t arr_num, struct ras_cmd_batch_trace_record_rsp *rsp_cache)
@@ -204,27 +214,37 @@ static int amdgpu_virt_ras_get_batch_records(struct ras_core_context *ras_core,
 	struct batch_ras_trace_info *batch;
 	int ret = 0;
 	uint32_t i;
+	uint32_t idx;
 
-	if (!rsp->real_batch_num || (batch_id < rsp->start_batch_id) ||
-	    (batch_id >= (rsp->start_batch_id + rsp->real_batch_num))) {
-
+	if (!amdgpu_virt_ras_batch_trace_rsp_covers(rsp, batch_id)) {
 		memset(rsp, 0, sizeof(*rsp));
 		ret = amdgpu_virt_ras_send_remote_cmd(ras_core,
 				RAS_CMD__GET_BATCH_TRACE_RECORD,
 				&req, sizeof(req), rsp, sizeof(*rsp));
 		if (ret)
 			return -EPIPE;
+
+		if (!amdgpu_virt_ras_batch_trace_rsp_covers(rsp, batch_id)) {
+			memset(rsp, 0, sizeof(*rsp));
+			return -EIO;
+		}
 	}
 
-	batch = &rsp->batchs[batch_id - rsp->start_batch_id];
-	if (batch_id != batch->batch_id)
+	idx = (uint32_t)(batch_id - rsp->start_batch_id);
+	batch = &rsp->batchs[idx];
+	if (batch_id != batch->batch_id) {
+		memset(rsp, 0, sizeof(*rsp));
 		return -ENODATA;
+	}
 
+	if (batch->trace_num > MAX_RECORD_PER_BATCH ||
+	    (uint32_t)batch->offset + batch->trace_num > RAS_CMD_MAX_TRACE_NUM) {
+		memset(rsp, 0, sizeof(*rsp));
+		return -EIO;
+	}

[Thomas] The batch data validation here could be consolidated with the earlier "if (batch_id != batch->batch_id)" check above, to keep the batch-related checks together and improve readability.

-	for (i = 0; i < batch->trace_num; i++) {
-		if (i >= arr_num)
-			break;
-		trace_arr[i] = &rsp->records[batch->offset + i];
-	}
+	for (i = 0; i < batch->trace_num && i < arr_num; i++)
+		trace_arr[i] = &rsp->records[batch->offset + i];
+
 	return i;
 }
--
2.34.1
