On 21/04/17 18:35, Jonas Pfefferle1 wrote:
> ----------------------------------------
> Jonas Pfefferle
> Cloud Storage & Analytics
> IBM Zurich Research Laboratory
> Saeumerstrasse 4
> CH-8803 Rueschlikon, Switzerland
> +41 44 724 8539
> 
> Alexey Kardashevskiy <a...@ozlabs.ru> wrote on 21/04/2017 05:42:35:
> 
>> From: Alexey Kardashevskiy <a...@ozlabs.ru>
>> To: gowrishankar muthukrishnan <gowrishanka...@linux.vnet.ibm.com>
>> Cc: Jonas Pfefferle1 <j...@zurich.ibm.com>, Gowrishankar
>> Muthukrishnan <gowrishanka...@in.ibm.com>, Adrian Schuepbach
>> <d...@zurich.ibm.com>, "dev@dpdk.org" <dev@dpdk.org>
>> Date: 21/04/2017 05:42
>> Subject: Re: [dpdk-dev] [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use
>> correct bus addresses for DMA map
>>
>> On 21/04/17 05:16, gowrishankar muthukrishnan wrote:
>> > On Thursday 20 April 2017 07:52 PM, Alexey Kardashevskiy wrote:
>> >> On 20/04/17 23:25, Alexey Kardashevskiy wrote:
>> >>> On 20/04/17 19:04, Jonas Pfefferle1 wrote:
>> >>>> Alexey Kardashevskiy <a...@ozlabs.ru> wrote on 20/04/2017 09:24:02:
>> >>>>
>> >>>>> From: Alexey Kardashevskiy <a...@ozlabs.ru>
>> >>>>> To: dev@dpdk.org
>> >>>>> Cc: Alexey Kardashevskiy <a...@ozlabs.ru>, j...@zurich.ibm.com,
>> >>>>> Gowrishankar Muthukrishnan <gowrishanka...@in.ibm.com>
>> >>>>> Date: 20/04/2017 09:24
>> >>>>> Subject: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus
>> >>>>> addresses for DMA map
>> >>>>>
>> >>>>> VFIO_IOMMU_SPAPR_TCE_CREATE ioctl() returns the actual bus address of the
>> >>>>> just-created DMA window. It happens to start from zero because the default
>> >>>>> window is removed (leaving no windows) and the new window starts from zero.
>> >>>>> However, this is not guaranteed and the new window may start from another
>> >>>>> address, so this adds an error check.
>> >>>>>
>> >>>>> Another issue is that the IOVA passed to VFIO_IOMMU_MAP_DMA should be a
>> >>>>> PCI bus address, while in this case a physical address of a user page is
>> >>>>> used. This changes IOVA to start from zero, in the hope that the rest of
>> >>>>> DPDK expects this.
>> >>>> This is not the case. DPDK expects a 1:1 mapping PA==IOVA. It will use
>> >>>> the phys_addr of the memory segment it got from /proc/self/pagemap, cf.
>> >>>> librte_eal/linuxapp/eal/eal_memory.c. We could try setting it here to the
>> >>>> actual iova, which basically makes the whole virtual-to-physical mapping
>> >>>> with pagemap unnecessary; I believe that should be the case for VFIO
>> >>>> anyway. Pagemap should only be needed when using pci_uio.
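
(Side note for anyone following along: the pagemap lookup Jonas mentions is
roughly the following - a minimal, untested sketch of mine, not the actual
eal_memory.c code. Each 8-byte pagemap entry holds the PFN in bits 0-54 and a
"present" flag in bit 63, and reading the file requires root.)

#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

/* Sketch only: translate a virtual address to a physical address via
 * /proc/self/pagemap; returns 0 on any failure. */
static uint64_t
virt_to_phys(const void *vaddr)
{
	uint64_t entry, pgsz = getpagesize();
	off_t off = ((uintptr_t)vaddr / pgsz) * sizeof(entry);
	int fd = open("/proc/self/pagemap", O_RDONLY);

	if (fd < 0)
		return 0;
	if (pread(fd, &entry, sizeof(entry), off) != sizeof(entry)) {
		close(fd);
		return 0;
	}
	close(fd);
	if (!(entry & (1ULL << 63)))	/* page not present */
		return 0;
	/* bits 0-54 are the page frame number */
	return (entry & ((1ULL << 55) - 1)) * pgsz + (uintptr_t)vaddr % pgsz;
}
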
>> >>>
>> >>> Ah, ok, makes sense now. But it sure needs a big fat comment there, as it
>> >>> is not obvious why the host RAM address is used when the DMA window start
>> >>> is not guaranteed.
>> >> Well, either way there is some bug - ms[i].phys_addr and ms[i].addr_64 both
>> >> have the exact same value; in my setup it is 3fffb33c0000, which is a
>> >> userspace address - at least ms[i].phys_addr must be a physical address.
>> >
>> > This patch breaks i40e_dev_init() in my server.
>> >
>> > EAL: PCI device 0004:01:00.0 on NUMA socket 1
>> > EAL:   probe driver: 8086:1583 net_i40e
>> > EAL:   using IOMMU type 7 (sPAPR)
>> > eth_i40e_dev_init(): Failed to init adminq: -32
>> > EAL: Releasing pci mapped resource for 0004:01:00.0
>> > EAL: Calling pci_unmap_resource for 0004:01:00.0 at 0x3fff82aa0000
>> > EAL: Requested device 0004:01:00.0 cannot be used
>> > EAL: PCI device 0004:01:00.1 on NUMA socket 1
>> > EAL:   probe driver: 8086:1583 net_i40e
>> > EAL:   using IOMMU type 7 (sPAPR)
>> > eth_i40e_dev_init(): Failed to init adminq: -32
>> > EAL: Releasing pci mapped resource for 0004:01:00.1
>> > EAL: Calling pci_unmap_resource for 0004:01:00.1 at 0x3fff82aa0000
>> > EAL: Requested device 0004:01:00.1 cannot be used
>> > EAL: No probed ethernet devices
>> >
>> > I have two memsegs, each of 1G size. Their mapped PA and VA are also
>> > different.
>> >
>> > (gdb) p /x ms[0]
>> > $3 = {phys_addr = 0x1e0b000000, {addr = 0x3effaf000000, addr_64 =
>> > 0x3effaf000000},
>> >   len = 0x40000000, hugepage_sz = 0x1000000, socket_id = 0x1, nchannel =
>> > 0x0, nrank = 0x0}
>> > (gdb) p /x ms[1]
>> > $4 = {phys_addr = 0xf6d000000, {addr = 0x3efbaf000000, addr_64 =
>> > 0x3efbaf000000},
>> >   len = 0x40000000, hugepage_sz = 0x1000000, socket_id = 0x0, nchannel =
>> > 0x0, nrank = 0x0}
>> >
>> > Could you please recheck this? Maybe reset dma_map.iova for this offset only
>> > if the new DMA window does not start from bus address 0?
>>
>> As we figured out, it is the --no-huge effect.
>>
>> Another thing - as I read the code, the window size comes from
>> rte_eal_get_physmem_size(). On my 512GB machine, DPDK allocates only a 16GB
>> window, so it is far from the 1:1 mapping which DPDK is believed to expect.
>> Looking now for a better version of rte_eal_get_physmem_size()...
> 
> You can try specifying the size with -m or --socket-mem.


Oh, right. Thanks.
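
Something like --socket-mem 1024,1024 on the testpmd command line (assuming a
two-socket box and 1GB per socket) should be enough to tell whether the window
size is the only problem here.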


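Longer term, since we map with IOVA == PA, the window probably has to cover
the highest physical address used by any memseg rather than just the total
rte_eal_get_physmem_size() returns. Roughly something like this in
vfio_spapr_dma_map() - an untested sketch, with "create" being the existing
vfio_iommu_spapr_tce_create struct there and rte_align64pow2() from
rte_common.h, if I am not mistaken:

	uint64_t max_pa = 0;
	int i;

	/* ms is the rte_eal_get_physmem_layout() array already used here */
	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
		if (ms[i].addr == NULL)
			break;
		if (ms[i].phys_addr + ms[i].len > max_pa)
			max_pa = ms[i].phys_addr + ms[i].len;
	}
	/* sPAPR DMA windows must be a power of two in size */
	create.window_size = rte_align64pow2(max_pa);
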
>>
>>
>> And another problem - after a few unsuccessful starts of app/testpmd, all
>> huge pages are gone:
>>
>> aik@stratton2:~$ cat /proc/meminfo
>> MemTotal:       535527296 kB
>> MemFree:        516662272 kB
>> MemAvailable:   515501696 kB
>> ...
>> HugePages_Total:    1024
>> HugePages_Free:        0
>> HugePages_Rsvd:        0
>> HugePages_Surp:        0
>> Hugepagesize:      16384 kB
>>
>>
>> How is that possible? What is pinning these pages so that a testpmd process
>> exit does not clean them up?
> 
> I've also seen this. I think that happens if it does not shut down cleanly.
> I regularly clean /dev/hugepages ...


Oh, I am learning new things about hugepages as we speak :) I think the
mappings not being anonymous has this effect. Anyway, this is a bug - pages
stay allocated after every run of testpmd, even if it does not crash but just
does exit() :-/

I still cannot get it working with the Intel 40G ethernet; this is how far
I get now:

USER1: create a new mbuf pool <mbuf_pool_socket_1>: n=1419456, size=2176,
socket=1
EAL: Error - exiting with code: 1
  Cause: Creation of mbuf pool for socket 1 failed: Cannot allocate memory
aik@stratton2:~$
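
For what it is worth, 1419456 mbufs * 2176 bytes is already about 3GB for that
one pool on socket 1, so presumably whatever hugepage memory is left for
socket 1 after the leak above is simply not enough.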


I have put more details in another email.


> 
>>
>>
>>
>>
>> >
>> >
>> > Thanks,
>> > Gowrishankar
>> >
>> >>
>> >>>
>> >>>>> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
>> >>>>> ---
>> >>>>>   lib/librte_eal/linuxapp/eal/eal_vfio.c | 12 ++++++++++--
>> >>>>>   1 file changed, 10 insertions(+), 2 deletions(-)
>> >>>>>
>> >>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/
>> >>>>> librte_eal/linuxapp/eal/eal_vfio.c
>> >>>>> index 46f951f4d..8b8e75c4f 100644
>> >>>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>> >>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>> >>>>> @@ -658,7 +658,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>> >>>>>   {
>> >>>>>      const struct rte_memseg *ms = rte_eal_get_physmem_layout();
>> >>>>>      int i, ret;
>> >>>>> -
>> >>>>> +   phys_addr_t io_offset;
>> >>>>>      struct vfio_iommu_spapr_register_memory reg = {
>> >>>>>         .argsz = sizeof(reg),
>> >>>>>         .flags = 0
>> >>>>> @@ -702,6 +702,13 @@ vfio_spapr_dma_map(int vfio_container_fd)
>> >>>>>         return -1;
>> >>>>>      }
>> >>>>>   +   io_offset = create.start_addr;
>> >>>>> +   if (io_offset) {
>> >>>>> +      RTE_LOG(ERR, EAL, "  DMA offsets other than zero is not
>> >>>>> supported, "
>> >>>>> +            "new window is created at %lx\n", io_offset);
>> >>>>> +      return -1;
>> >>>>> +   }
>> >>>>> +
>> >>>>>      /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
>> >>>>>      for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>> >>>>>         struct vfio_iommu_type1_dma_map dma_map;
>> >>>>> @@ -723,7 +730,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>> >>>>>         dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
>> >>>>>         dma_map.vaddr = ms[i].addr_64;
>> >>>>>         dma_map.size = ms[i].len;
>> >>>>> -      dma_map.iova = ms[i].phys_addr;
>> >>>>> +      dma_map.iova = io_offset;
>> >>>>>         dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
>> >>>>>                VFIO_DMA_MAP_FLAG_WRITE;
>> >>>>>   @@ -735,6 +742,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>> >>>>>            return -1;
>> >>>>>         }
>> >>>>>   +      io_offset += dma_map.size;
>> >>>>>      }
>> >>>>>        return 0;
>> >>>>> --
>> >>>>> 2.11.0
>> >>>>>
>> >>>
>> >>
>> >
>> >
>>
>>
>> --
>> Alexey
>>
> 


-- 
Alexey
