On Fri, Jan 21, 2022 at 03:28:33PM +0100, Alexander Graf wrote:
>
> On 19.01.22 19:45, Kevin O'Connor wrote:
> > Commit 01f2736cc905d ("nvme: Pass large I/O requests as PRP lists")
> > introduced multi-page requests using the NVMe PRP mechanism. To store the
> > list and "first page to write to" hints, it added fields to the NVMe
> > namespace struct.
> >
> > Unfortunately, that struct resides in fseg which is read-only at runtime.
> > While KVM ignores the read-only part and allows writes, real hardware and
> > TCG adhere to the semantics and ignore writes to the fseg region. The net
> > effect of that is that reads and writes were always happening on address 0,
> > unless they went through the bounce buffer logic.
> >
> > This patch builds the PRP maintenance data in the existing "dma bounce
> > buffer" and only builds it when needed.
> >
> > Fixes: 01f2736cc905d ("nvme: Pass large I/O requests as PRP lists")
> > Reported-by: Matt DeVillier <matt.devill...@gmail.com>
> > Signed-off-by: Alexander Graf <g...@amazon.com>
> > Signed-off-by: Kevin O'Connor <ke...@koconnor.net>
> > ---
> >  src/hw/nvme-int.h |  6 ------
> >  src/hw/nvme.c     | 51 +++++++++++++++++------------------------------
> >  2 files changed, 18 insertions(+), 39 deletions(-)
> >
> > diff --git a/src/hw/nvme-int.h b/src/hw/nvme-int.h
> > index 9564c17..f0d717d 100644
> > --- a/src/hw/nvme-int.h
> > +++ b/src/hw/nvme-int.h
> > @@ -10,8 +10,6 @@
> >  #include "types.h" // u32
> >  #include "pcidevice.h" // struct pci_device
> >
> > -#define NVME_MAX_PRPL_ENTRIES 15 /* Allows requests up to 64kb */
> > -
> >  /* Data structures */
> >
> >  /* The register file of a NVMe host controller. This struct follows the naming
> > @@ -122,10 +120,6 @@ struct nvme_namespace {
> >
> >      /* Page aligned buffer of size NVME_PAGE_SIZE. */
> >      char *dma_buffer;
> > -
> > -    /* Page List */
> > -    u32 prpl_len;
> > -    u64 prpl[NVME_MAX_PRPL_ENTRIES];
> >  };
> >
> >  /* Data structures for NVMe admin identify commands */
> > diff --git a/src/hw/nvme.c b/src/hw/nvme.c
> > index 3a73784..39b9138 100644
> > --- a/src/hw/nvme.c
> > +++ b/src/hw/nvme.c
> > @@ -470,35 +470,19 @@ nvme_bounce_xfer(struct nvme_namespace *ns, u64 lba, void *buf, u16 count,
> >      return res;
> >  }
> >
> > -static void nvme_reset_prpl(struct nvme_namespace *ns)
> > -{
> > -    ns->prpl_len = 0;
> > -}
> > -
> > -static int nvme_add_prpl(struct nvme_namespace *ns, u64 base)
> > -{
> > -    if (ns->prpl_len >= NVME_MAX_PRPL_ENTRIES)
> > -        return -1;
> > -
> > -    ns->prpl[ns->prpl_len++] = base;
> > -
> > -    return 0;
> > -}
> > +#define NVME_MAX_PRPL_ENTRIES 15 /* Allows requests up to 64kb */
> >
> >  // Transfer data using page list (if applicable)
> >  static int
> >  nvme_prpl_xfer(struct nvme_namespace *ns, u64 lba, void *buf, u16 count,
> >                 int write)
> >  {
> > -    int first_page = 1;
> >      u32 base = (long)buf;
> >      s32 size;
> >
> >      if (count > ns->max_req_size)
> >          count = ns->max_req_size;
> >
> > -    nvme_reset_prpl(ns);
> > -
> >      size = count * ns->block_size;
> >      /* Special case for transfers that fit into PRP1, but are unaligned */
> >      if (((size + (base & ~NVME_PAGE_MASK)) <= NVME_PAGE_SIZE))
> > @@ -512,28 +496,29 @@ nvme_prpl_xfer(struct nvme_namespace *ns, u64 lba, void *buf, u16 count,
> >      if (size & (ns->block_size - 1ULL))
> >          return nvme_bounce_xfer(ns, lba, buf, count, write);
> >
> > -    for (; size > 0; base += NVME_PAGE_SIZE, size -= NVME_PAGE_SIZE) {
> > -        if (first_page) {
> > -            /* First page is special */
> > -            first_page = 0;
> > -            continue;
> > -        }
> > -        if (nvme_add_prpl(ns, base))
> > -            return nvme_bounce_xfer(ns, lba, buf, count, write);
> > -    }
> > -
> > -    void *prp2;
> >      if ((ns->block_size * count) > (NVME_PAGE_SIZE * 2)) {
> > -        /* We need to describe more than 2 pages, rely on PRP List */
> > -        prp2 = ns->prpl;
> > +        /* We need to describe more than 2 pages, build PRP List */
> > +        u32 prpl_len = 0;
> > +        u64 *prpl = (void*)ns->dma_buffer;
>
> At 15*8=120 bytes of data, do we even need to put the prpl into dma_buffer
> or can we just use the stack? Will the array be guaranteed 64-bit aligned or
> would we need to add an attribute to it?
>
>     u64 prpl[NVME_MAX_PRPL_ENTRIES];
>
> Either way, I don't have strong feelings one way or another. It just seems
> more natural to keep the bounce buffer purely as a bounce buffer :).
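[Editor's note: a minimal standalone sketch of the stack-based alternative Alexander raises, with an explicit alignment attribute since SeaBIOS does not guarantee stack alignment. The function name is illustrative, not SeaBIOS code.]

```c
#include <stdint.h>

#define NVME_MAX_PRPL_ENTRIES 15 /* Allows requests up to 64kb */

/* Hypothetical stack-based PRP list: force 64-bit alignment manually,
 * since the ABI/stack alignment is not guaranteed here. Note the array
 * must stay live until the NVMe command completes, since the controller
 * DMA-reads it - this is the early-return hazard discussed below. */
static int stack_prpl_is_aligned(void)
{
    uint64_t prpl[NVME_MAX_PRPL_ENTRIES] __attribute__((aligned(8)));
    return ((uintptr_t)&prpl[0] & 7) == 0;
}
```

The attribute guarantees the alignment; without it, an 8-byte-aligned array is only as likely as the compiler's stack layout happens to make it.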
In general it's possible to DMA from the stack (eg,
src/hw/ohci.c:ohci_send_pipe() ).  However, SeaBIOS doesn't guarantee
stack alignment, so it would need to be done manually.  Also, I'm not
sure how tight the nvme request completion code is - if it returns
early for some reason it could cause havoc (device DMA into random
memory).

Another option might be to make a single global prpl array, but that
would consume the memory even if nvme isn't in use.

FWIW though, I don't see a harm in the single
( u64 *prpl = (void*)ns->dma_buffer ) line.

Thanks,
-Kevin

>
> Alex
>
> > +        int first_page = 1;
> > +        for (; size > 0; base += NVME_PAGE_SIZE, size -= NVME_PAGE_SIZE) {
> > +            if (first_page) {
> > +                /* First page is special */
> > +                first_page = 0;
> > +                continue;
> > +            }
> > +            if (prpl_len >= NVME_MAX_PRPL_ENTRIES)
> > +                return nvme_bounce_xfer(ns, lba, buf, count, write);
> > +            prpl[prpl_len++] = base;
> > +        }
> > +        return nvme_io_xfer(ns, lba, buf, prpl, count, write);
> >      } else if ((ns->block_size * count) > NVME_PAGE_SIZE) {
> >          /* Directly embed the 2nd page if we only need 2 pages */
> > -        prp2 = (void *)(long)ns->prpl[0];
> > +        return nvme_io_xfer(ns, lba, buf, buf + NVME_PAGE_SIZE, count, write);
> >      } else {
> >          /* One page is enough, don't expose anything else */
> > -        prp2 = NULL;
> > +        return nvme_io_xfer(ns, lba, buf, NULL, count, write);
> >      }
> > -    return nvme_io_xfer(ns, lba, buf, prp2, count, write);
> >  }
> >
> >  static int
> > --
> > 2.31.1
>
> Amazon Development Center Germany GmbH
> Krausenstr. 38
> 10117 Berlin
> Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
> Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
> Sitz: Berlin
> Ust-ID: DE 289 237 879

_______________________________________________
SeaBIOS mailing list -- seabios@seabios.org
To unsubscribe send an email to seabios-le...@seabios.org
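[Editor's note: for readers following the control flow, the patched nvme_prpl_xfer() reduces the PRP2 choice to three cases by transfer size. A minimal sketch of that decision, with illustrative names (not SeaBIOS code):]

```c
#include <stdint.h>

#define NVME_PAGE_SIZE 4096u

enum prp_mode { PRP_ONE_PAGE, PRP_EMBED_SECOND, PRP_LIST };

/* Mirrors the three-way branch in the patched nvme_prpl_xfer():
 * - one page fits entirely in PRP1 (PRP2 = NULL),
 * - two pages embed the second page's address directly in PRP2,
 * - anything larger needs a PRP list (built in dma_buffer by the patch). */
static enum prp_mode prp_mode_for(uint32_t block_size, uint16_t count)
{
    uint32_t bytes = block_size * count;
    if (bytes > 2 * NVME_PAGE_SIZE)
        return PRP_LIST;
    if (bytes > NVME_PAGE_SIZE)
        return PRP_EMBED_SECOND;
    return PRP_ONE_PAGE;
}
```

This also shows why the pre-patch code was broken only for the PRP_LIST and PRP_EMBED_SECOND cases: those were the paths that wrote to the ns->prpl array in read-only fseg.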