On 11 November 2016 at 10:10, Brian Brooks <brian.bro...@linaro.org> wrote:

> On 11/10 18:52:49, Christophe Milard wrote:
> > Hi,
> >
> > My hope was that packet segments would all be smaller than one page
> > (either normal pages or huge pages)
>
> When is this the case? With a 4096 byte page, a couple of 1518 byte Ethernet
> packets can fit. A 9038 byte Jumbo won't fit.
>

[FF] When you allocate a queue of 256 packets for Intel, Virtio or Mellanox
cards, you need a small area of 256 descriptors that fits in a page. The
drivers then allocate 256 buffers in contiguous memory, which amounts to 512K
of buffers (they may also allocate this zone per packet). For high-performance
cards such as Chelsio and Netcope, however, this is a strict requirement,
because the memory zone is managed by the HW: packet allocation is not
controlled by software. You hand the zone to the hardware, which places
packets in it however it wants; the HW then tells the SW where the packets
are by updating the ring. VFIO does not change the requirement for a large
contiguous area.
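
A rough sketch of that layout (structure names and sizes are mine, not any
particular driver's): a descriptor ring that fits in one 4K page, plus a
256 x 2K buffer zone kept physically contiguous by backing it with a single
huge page.

/* Illustrative only: rx_desc, RING_SIZE and BUF_SIZE are assumptions. */
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

#define RING_SIZE 256
#define BUF_SIZE  2048                  /* one 1518-byte frame + headroom */

struct rx_desc {                        /* 16-byte descriptor (hypothetical) */
        uint64_t buf_phys;              /* physical address given to the NIC */
        uint32_t len;
        uint32_t flags;
};

int main(void)
{
        /* 256 * 16 B = 4 KB: the whole descriptor ring fits in one page. */
        printf("ring: %zu bytes\n", sizeof(struct rx_desc) * RING_SIZE);

        /* 256 * 2 KB = 512 KB of buffers. Backing them with one 2 MB huge
         * page keeps the whole zone physically contiguous, as the
         * HW-managed cards require. */
        size_t zone_len = (size_t)RING_SIZE * BUF_SIZE;
        void *zone = mmap(NULL, zone_len, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (zone == MAP_FAILED) {
                perror("mmap(MAP_HUGETLB)");   /* no huge pages configured */
                return 1;
        }
        printf("buffer zone: %zu KB at %p\n", zone_len / 1024, zone);
        munmap(zone, zone_len);
        return 0;
}
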
PCI Express is limited to roughly 36M DMA transactions per second, which is
lower than the ~60Mpps required for 40Gbps with minimum-size packets, and
much lower than the ~148Mpps required for 100Gbps. The only way to achieve
line rate is to fit more than one packet per DMA transaction, which is what
Chelsio, Netcope and others do. That requires HW-controlled memory
allocation, and therefore large memory blocks supplied to the HW.
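
For reference, those packet rates follow from minimum-size (64-byte) frames
plus the 20 bytes of preamble and inter-frame gap on the wire; a quick
back-of-the-envelope check:

/* Back-of-the-envelope check of the rates quoted above. */
#include <math.h>
#include <stdio.h>

int main(void)
{
        const double wire_bits = (64 + 20) * 8; /* 672 bits per minimum frame */
        const double dma_per_s = 36e6;          /* PCIe transaction budget    */
        const double rates[] = { 40e9, 100e9 };

        for (int i = 0; i < 2; i++) {
                double pps = rates[i] / wire_bits;  /* ~59.5 / ~148.8 Mpps */
                printf("%3.0f Gbps -> %.1f Mpps, >= %.0f packets per DMA needed\n",
                       rates[i] / 1e9, pps / 1e6, ceil(pps / dma_per_s));
        }
        return 0;
}
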
As we move forward, I expect all cards to adopt a similar scheme and move
away from the "Intel" model of IO.

Now if we look at performance: the cost of doing virt_to_phys() for each
packet, even in the kernel, rules out scattered allocations. You amortize the
cost by getting the physical address of the 256-buffer zone once, and using
offsets from it to get the physical address of an individual packet. If you
try to do a per-packet translation in userland, then just use the Linux
networking stack, it will be faster ;-)
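
A minimal sketch of that amortization (zone_virt and zone_phys are
hypothetical fields; how zone_phys is obtained once is a separate problem,
e.g. pagemap or a kernel helper):

#include <stdint.h>

struct pkt_zone {
        void    *zone_virt;   /* start of the buffer zone (VA)     */
        uint64_t zone_phys;   /* its physical base, looked up once */
};

/* Valid only while the whole zone is physically contiguous. */
static inline uint64_t pkt_phys(const struct pkt_zone *z, const void *pkt_va)
{
        return z->zone_phys + ((uintptr_t)pkt_va - (uintptr_t)z->zone_virt);
}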


> Or is it to ease the memory manager by having a logical array of objects
> laid out in virtual memory space and depending on the number of objects
> and the size of each object, a few are bound to span across 2 pages which
> might not be adjacent in physical memory?
>
> Or is it when the view of a packet is a bunch of packet segments which
> may be of varying sizes and possibly scattered across memory, and the
> packet needs to go out on the wire?
>
> Are 2M, 16M, 1G page sizes used?
>
> > to guarantee physical memory
> > contiguity, which is needed by some drivers (read: non-vfio drivers for
> > PCI).
>
[FF] The Linux kernel uses a special allocator for that; huge pages are not
the unit. As said above, some HW requires large contiguous blocks, and VFIO
or the IOMMU does not remove that requirement.
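
The mail does not name the allocator; one in-kernel way to get such a
physically contiguous, DMA-able block (not necessarily the one meant here) is
dma_alloc_coherent(), backed by CMA or the buddy allocator. A kernel-side
sketch under that assumption:

#include <linux/dma-mapping.h>

/* Ask the DMA API for a physically contiguous 512 KB buffer zone; works
 * whether or not huge pages are available. alloc_buffer_zone() is a
 * hypothetical helper name. */
static void *alloc_buffer_zone(struct device *dev, dma_addr_t *bus_addr)
{
        return dma_alloc_coherent(dev, 256 * 2048, bus_addr, GFP_KERNEL);
}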


> If the IOMMU gives an IO device the same virtual addressing as the CPU by
> sharing page tables, would an IO device or IOMMU ever have limitations on
> the number of pages supported, or other performance limitations during the
> VA->PA translation?
>
[FF] no information on that


> Does the IOMMU remap interrupts from the IO device when the VM migrates
> cores? What happens when there is no IRQ remapping: does a core get the IRQ
> and have to send an inter-processor interrupt to the core where the VM is
> now running?
>
[FF] I hope not.


> Are non-vfio drivers for PCI that need contiguous physical memory the
> design target?
>

[FF] not related to VFIO but related to HW requirements.


> > Francois Ozog's experience (with dpdk) shows that this hope will fail
> > in some cases: not all platforms support the required huge page size.
> > And it would be nice to be able to run even in the absence of huge
> > pages.
> >
> > I am therefore planning to expand drvshm to include a flag requesting
> > contiguous physical memory. But sadly, from user space, this is
> > nothing we can guarantee... So when this flag is set, the allocator
> > will allocate until the physical memory "happens to be contiguous".
> > This is a bit like the DPDK approach (try & error), which I dislike,
> > but there aren't many alternatives from user space. This would be
> > triggered only in case huge page allocation fails, or if the
> > requested size exceeds the HP size.
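
[FF] For what it's worth, a rough sketch of what such an "allocate, then
check" test could look like from user space, by reading /proc/self/pagemap
(the PFNs are only visible with sufficient privileges on recent kernels, and
a 4K page size is assumed):

#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define PAGE_SHIFT 12                        /* assumes 4K pages       */
#define PFN_MASK   ((1ULL << 55) - 1)        /* bits 0-54 hold the PFN */

/* Returns 1 if [va, va+len) sits on consecutive physical frames,
 * 0 if not, -1 on error. */
static int is_phys_contig(void *va, size_t len)
{
        uint64_t first = (uintptr_t)va >> PAGE_SHIFT;
        uint64_t npages = (len + 4095) >> PAGE_SHIFT;
        uint64_t prev_pfn = 0;
        int contig = 1;

        int fd = open("/proc/self/pagemap", O_RDONLY);
        if (fd < 0)
                return -1;

        for (uint64_t i = 0; i < npages; i++) {
                uint64_t entry;
                if (pread(fd, &entry, sizeof(entry),
                          (off_t)((first + i) * sizeof(entry))) != sizeof(entry)) {
                        contig = -1;
                        break;
                }
                if (!(entry & (1ULL << 63))) {   /* page not present */
                        contig = 0;
                        break;
                }
                uint64_t pfn = entry & PFN_MASK;
                if (i > 0 && pfn != prev_pfn + 1) {
                        contig = 0;
                        break;
                }
                prev_pfn = pfn;
        }
        close(fd);
        return contig;
}
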
>
> Are device drivers for the target devices (SoCs, cards) easier to
> program when there's an IOMMU? If so, is this contiguous physical
> memory requirement necessary?
>
[FF] again, it depends on what entity is managing packet placement.


> > The last alternative would be to have a kernel module to do this kind of
> > allocation, but I guess we don't really want to depend on that...
> >
> > Any comment?
>



-- 
François-Frédéric Ozog | *Director Linaro Networking Group*
T: +33.67221.6485
francois.o...@linaro.org | Skype: ffozog
