On Fri, Jan 21, 2022 at 03:28:33PM +0100, Alexander Graf wrote:
> 
> On 19.01.22 19:45, Kevin O'Connor wrote:
> > Commit 01f2736cc905d ("nvme: Pass large I/O requests as PRP lists")
> > introduced multi-page requests using the NVMe PRP mechanism. To store the
> > list and "first page to write to" hints, it added fields to the NVMe
> > namespace struct.
> > 
> > Unfortunately, that struct resides in fseg which is read-only at runtime.
> > While KVM ignores the read-only part and allows writes, real hardware and
> > TCG adhere to the semantics and ignore writes to the fseg region. The net
> > effect of that is that reads and writes were always happening on address 0,
> > unless they went through the bounce buffer logic.
> > 
> > This patch builds the PRP maintenance data in the existing "dma bounce
> > buffer" and only builds it when needed.
> > 
> > Fixes: 01f2736cc905d ("nvme: Pass large I/O requests as PRP lists")
> > Reported-by: Matt DeVillier <matt.devill...@gmail.com>
> > Signed-off-by: Alexander Graf <g...@amazon.com>
> > Signed-off-by: Kevin O'Connor <ke...@koconnor.net>
> > ---
> >   src/hw/nvme-int.h |  6 ------
> >   src/hw/nvme.c     | 51 +++++++++++++++++------------------------------
> >   2 files changed, 18 insertions(+), 39 deletions(-)
> > 
> > diff --git a/src/hw/nvme-int.h b/src/hw/nvme-int.h
> > index 9564c17..f0d717d 100644
> > --- a/src/hw/nvme-int.h
> > +++ b/src/hw/nvme-int.h
> > @@ -10,8 +10,6 @@
> >   #include "types.h" // u32
> >   #include "pcidevice.h" // struct pci_device
> > 
> > -#define NVME_MAX_PRPL_ENTRIES 15 /* Allows requests up to 64kb */
> > -
> >   /* Data structures */
> > 
> >   /* The register file of a NVMe host controller. This struct follows the naming
> > @@ -122,10 +120,6 @@ struct nvme_namespace {
> > 
> >       /* Page aligned buffer of size NVME_PAGE_SIZE. */
> >       char *dma_buffer;
> > -
> > -    /* Page List */
> > -    u32 prpl_len;
> > -    u64 prpl[NVME_MAX_PRPL_ENTRIES];
> >   };
> > 
> >   /* Data structures for NVMe admin identify commands */
> > diff --git a/src/hw/nvme.c b/src/hw/nvme.c
> > index 3a73784..39b9138 100644
> > --- a/src/hw/nvme.c
> > +++ b/src/hw/nvme.c
> > @@ -470,35 +470,19 @@ nvme_bounce_xfer(struct nvme_namespace *ns, u64 lba, void *buf, u16 count,
> >       return res;
> >   }
> > 
> > -static void nvme_reset_prpl(struct nvme_namespace *ns)
> > -{
> > -    ns->prpl_len = 0;
> > -}
> > -
> > -static int nvme_add_prpl(struct nvme_namespace *ns, u64 base)
> > -{
> > -    if (ns->prpl_len >= NVME_MAX_PRPL_ENTRIES)
> > -        return -1;
> > -
> > -    ns->prpl[ns->prpl_len++] = base;
> > -
> > -    return 0;
> > -}
> > +#define NVME_MAX_PRPL_ENTRIES 15 /* Allows requests up to 64kb */
> > 
> >   // Transfer data using page list (if applicable)
> >   static int
> >   nvme_prpl_xfer(struct nvme_namespace *ns, u64 lba, void *buf, u16 count,
> >                  int write)
> >   {
> > -    int first_page = 1;
> >       u32 base = (long)buf;
> >       s32 size;
> > 
> >       if (count > ns->max_req_size)
> >           count = ns->max_req_size;
> > 
> > -    nvme_reset_prpl(ns);
> > -
> >       size = count * ns->block_size;
> >       /* Special case for transfers that fit into PRP1, but are unaligned */
> >       if (((size + (base & ~NVME_PAGE_MASK)) <= NVME_PAGE_SIZE))
> > @@ -512,28 +496,29 @@ nvme_prpl_xfer(struct nvme_namespace *ns, u64 lba, void *buf, u16 count,
> >       if (size & (ns->block_size - 1ULL))
> >           return nvme_bounce_xfer(ns, lba, buf, count, write);
> > 
> > -    for (; size > 0; base += NVME_PAGE_SIZE, size -= NVME_PAGE_SIZE) {
> > -        if (first_page) {
> > -            /* First page is special */
> > -            first_page = 0;
> > -            continue;
> > -        }
> > -        if (nvme_add_prpl(ns, base))
> > -            return nvme_bounce_xfer(ns, lba, buf, count, write);
> > -    }
> > -
> > -    void *prp2;
> >       if ((ns->block_size * count) > (NVME_PAGE_SIZE * 2)) {
> > -        /* We need to describe more than 2 pages, rely on PRP List */
> > -        prp2 = ns->prpl;
> > +        /* We need to describe more than 2 pages, build PRP List */
> > +        u32 prpl_len = 0;
> > +        u64 *prpl = (void*)ns->dma_buffer;
> 
> 
> At 15*8=120 bytes of data, do we even need to put the prpl into dma_buffer
> or can we just use the stack? Will the array be guaranteed 64-bit aligned, or
> would we need to add an attribute to it?
> 
>     u64 prpl[NVME_MAX_PRPL_ENTRIES];
> 
> Either way, I don't have strong feelings one way or another. It just seems
> more natural to keep the bounce buffer purely as bounce buffer :).

In general it's possible to DMA from the stack (e.g.,
src/hw/ohci.c:ohci_send_pipe()).  However, SeaBIOS doesn't guarantee
stack alignment, so the 64-bit alignment would need to be arranged
manually.  Also, I'm not sure how tight the nvme request completion
code is - if it returns early for some reason while the device is
still processing the request, it could cause havoc (device DMA into
stack memory that has since been reused).

Another option might be to make a single global prpl array, but that
would consume the memory even if nvme isn't in use.

FWIW though, I don't see any harm in the single ( u64 *prpl =
(void*)ns->dma_buffer ) line.

Thanks,
-Kevin


> 
> 
> Alex
> 
> 
> > +        int first_page = 1;
> > +        for (; size > 0; base += NVME_PAGE_SIZE, size -= NVME_PAGE_SIZE) {
> > +            if (first_page) {
> > +                /* First page is special */
> > +                first_page = 0;
> > +                continue;
> > +            }
> > +            if (prpl_len >= NVME_MAX_PRPL_ENTRIES)
> > +                return nvme_bounce_xfer(ns, lba, buf, count, write);
> > +            prpl[prpl_len++] = base;
> > +        }
> > +        return nvme_io_xfer(ns, lba, buf, prpl, count, write);
> >       } else if ((ns->block_size * count) > NVME_PAGE_SIZE) {
> >           /* Directly embed the 2nd page if we only need 2 pages */
> > -        prp2 = (void *)(long)ns->prpl[0];
> > +        return nvme_io_xfer(ns, lba, buf, buf + NVME_PAGE_SIZE, count, write);
> >       } else {
> >           /* One page is enough, don't expose anything else */
> > -        prp2 = NULL;
> > +        return nvme_io_xfer(ns, lba, buf, NULL, count, write);
> >       }
> > -    return nvme_io_xfer(ns, lba, buf, prp2, count, write);
> >   }
> > 
> >   static int
> > --
> > 2.31.1
> > 
> 
> 
> 
> 
_______________________________________________
SeaBIOS mailing list -- seabios@seabios.org
To unsubscribe send an email to seabios-le...@seabios.org
