From the device's perspective, during a single read of the descriptor list-and that list spans across cache lines. the retrieved data will show `desc[head].flags` as valid, and `desc[i].flags` as valid as well; however, the `desc[i].addr` and `len` fields may be invalid.
I suspect this occurs because a single DMA operation is internally split into multiple parallel read transactions, aligned to cache line boundaries. I apologize that I currently lack the necessary environment to verify whether this modification definitively resolves the issue or merely reduces the probability of its occurrence; therefore, this patch can be discarded. yangjiale At 2026-06-02 14:04:13, "Eugenio Perez Martin" <[email protected]> wrote: >On Tue, Jun 2, 2026 at 6:34 AM yangjiale <[email protected]> wrote: >> >> When a descriptor list spans across cache lines, >> updating the flag first can lead to a scenario where the device side >> perceives the flag as valid, yet the corresponding address and length >> fields remain unupdated—resulting in invalid values. >> Therefore, the flag field must be updated last. >> >> Signed-off-by: yangjiale <[email protected]> >> --- >> drivers/virtio/virtio_ring.c | 8 ++++---- >> 1 file changed, 4 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c >> index fbca7ce1c6bf..036b4f90d30f 100644 >> --- a/drivers/virtio/virtio_ring.c >> +++ b/drivers/virtio/virtio_ring.c >> @@ -1688,6 +1688,10 @@ static inline int virtqueue_add_packed(struct >> vring_virtqueue *vq, >> &addr, &len, premapped, attr)) >> goto unmap_release; >> >> + desc[i].addr = cpu_to_le64(addr); >> + desc[i].len = cpu_to_le32(len); >> + desc[i].id = cpu_to_le16(id); >> + >> flags = cpu_to_le16(vq->packed.avail_used_flags | >> (++c == total_sg ? 0 : >> VRING_DESC_F_NEXT) | >> (n < out_sgs ? 0 : VRING_DESC_F_WRITE)); >> @@ -1696,10 +1700,6 @@ static inline int virtqueue_add_packed(struct >> vring_virtqueue *vq, >> else >> desc[i].flags = flags; >> >> - desc[i].addr = cpu_to_le64(addr); >> - desc[i].len = cpu_to_le32(len); >> - desc[i].id = cpu_to_le16(id); >> - >> if (unlikely(vq->use_map_api)) { >> vq->packed.desc_extra[curr].addr = premapped >> ? >> DMA_MAPPING_ERROR : addr; > >These flags are updated before the flags of the head descriptor at the >end of the function, at "vq->packed.vring.desc[head].flags = >head_flags", so the device should not see these. Because of that, the >relative order between the rest of the fields of the same descriptor >or other descriptors' fields, except for the head descriptor's flags, >should not matter. There is a write memory barrier just before >updating the head's flags. > >Also, I don't get why the cache line matters here. Can you expand? Am >I missing something?

