Hi all,
While auditing list_last_entry callsites, I noticed two places in
drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c where the developer
wrote a NULL check for an empty list but used the unsafe API. The
check is dead code. I would appreciate it if you could take a
look and let me know whether these are worth fixing.
The two sites are amdgpu_ring_mux_ib_mark_offset() and
amdgpu_ring_mux_end_ib() (linux-7.1-rc1, around lines 497 and
530):
chunk = list_last_entry(&e->list, struct amdgpu_mux_chunk, entry);
if (!chunk) {
DRM_ERROR("cannot find chunk!\n");
return;
}
list_last_entry() returns container_of(&e->list, struct
amdgpu_mux_chunk, entry) when e->list is empty, never NULL. The
"cannot find chunk!" error path is dead code.
With an empty e->list, the fall through pointer aliases &e->list
inside struct amdgpu_mux_entry. The writes that follow then
corrupt fields of the mux_entry at the corresponding offsets.
mark_offset writes cntl_offset, de_offset and ce_offset. end_ib
writes end and sync_seq.
e->list is empty if a software ring submits an IB mark or IB end
before any chunk is queued for that ring. This can happen on a
fresh start_ib path, or after end_ib drops the last chunk.
A candidate fix is a one liner per site. Switch to
list_last_entry_or_null so the existing error path runs.
Similar dead empty checks after list_first_entry / list_last_entry
have been cleaned up in the same shape, for example commit
fbb8bc408027 (net: qed: Remove redundant NULL checks after
list_first_entry), commit c708d3fad421 (crypto: atmel: use
list_first_entry_or_null to simplify find_dev) and commit
10379171f346 (ksmbd: use list_first_entry_or_null for
opinfo_get_list). The qed commit message describes the exact
shape we observe here. These two sites appear to be missed by
those cleanups.
If this is intentional or already known, please disregard.
Otherwise I am happy to send a [PATCH] or to leave the fix to you.
Thanks,
Maoyi Xie
https://maoyixie.com/