On 08-Mar-18 8:11 PM, Burakov, Anatoly wrote:
On 08-Mar-18 2:36 PM, Burakov, Anatoly wrote:
On 08-Mar-18 1:36 PM, Pavan Nikhilesh wrote:
Hi Anatoly,

We are currently facing issues with running testpmd on thunderx platform.
The issue seems to be with vfio

EAL: Detected 24 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: No free hugepages reported in hugepages-2048kB
EAL: Multi-process socket /var/run/.rte_unix
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL:   VFIO support not initialized

<snip>

EAL:   probe driver: 177d:a053 octeontx_fpavf
EAL: PCI device 0001:01:00.1 on NUMA socket 0
EAL:   probe driver: 177d:a034 net_thunderx
EAL:   using IOMMU type 1 (Type 1)
EAL:   cannot set up DMA remapping, error 22 (Invalid argument)
EAL:   0001:01:00.1 DMA remapping failed, error 22 (Invalid argument)
EAL: Requested device 0001:01:00.1 cannot be used
EAL: PCI device 0001:01:00.2 on NUMA socket 0
<snip>
testpmd: No probed ethernet devices
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=251456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
EAL:   VFIO support not initialized
EAL:   VFIO support not initialized
EAL:   VFIO support not initialized
Done


This is because rte_service_init() calls rte_calloc() before
rte_bus_probe() and vfio_dma_mem_map fails because iommu type is not set.

Call stack:
gdb) bt
#0  vfio_dma_mem_map (vaddr=281439006359552, iova=11274289152, len=536870912, do_map=1) at /root/clean/dpdk/lib/librte_eal/linuxapp/eal/eal_vfio.c:967 #1  0x00000000004fd974 in rte_vfio_dma_map (vaddr=281439006359552, iova=11274289152, len=536870912) at /root/clean/dpdk/lib/librte_eal/linuxapp/eal/eal_vfio.c:988 #2  0x00000000004fbe78 in vfio_mem_event_callback (type=RTE_MEM_EVENT_ALLOC, addr=0xfff7a0000000, len=536870912) at /root/clean/dpdk/lib/librte_eal/linuxapp/eal/eal_vfio.c:240 #3  0x00000000005070ac in eal_memalloc_notify (event=RTE_MEM_EVENT_ALLOC, start=0xfff7a0000000, len=536870912) at /root/clean/dpdk/lib/librte_eal/common/eal_common_memalloc.c:177 #4  0x0000000000515c98 in try_expand_heap_primary (heap=0xffffb7fb167c, pg_sz=536870912, elt_size=8192, socket=0, flags=0, align=128, bound=0, contig=false) at /root/clean/dpdk/lib/librte_eal/common/malloc_heap.c:247 #5  0x0000000000515e94 in try_expand_heap (heap=0xffffb7fb167c, pg_sz=536870912, elt_size=8192, socket=0, flags=0, align=128, bound=0, contig=false) at /root/clean/dpdk/lib/librte_eal/common/malloc_heap.c:327 #6  0x00000000005163a0 in alloc_more_mem_on_socket (heap=0xffffb7fb167c, size=8192, socket=0, flags=0, align=128, bound=0, contig=false) at /root/clean/dpdk/lib/librte_eal/common/malloc_heap.c:455 #7  0x0000000000516514 in heap_alloc_on_socket (type=0x85bf90 "rte_services", size=8192, socket=0, flags=0, align=128, bound=0, contig=false) at /root/clean/dpdk/lib/librte_eal/common/malloc_heap.c:491 #8  0x0000000000516664 in malloc_heap_alloc (type=0x85bf90 "rte_services", size=8192, socket_arg=-1, flags=0, align=128, bound=0, contig=false) at /root/clean/dpdk/lib/librte_eal/common/malloc_heap.c:527 #9  0x0000000000513b54 in rte_malloc_socket (type=0x85bf90 "rte_services", size=8192, align=128, socket_arg=-1) at /root/clean/dpdk/lib/librte_eal/common/rte_malloc.c:54 #10 0x0000000000513bc8 in rte_zmalloc_socket (type=0x85bf90 "rte_services", size=8192, align=128, socket=-1) at /root/clean/dpdk/lib/librte_eal/common/rte_malloc.c:72 #11 0x0000000000513c00 in rte_zmalloc (type=0x85bf90 "rte_services", size=8192, align=128) at /root/clean/dpdk/lib/librte_eal/common/rte_malloc.c:81 #12 0x0000000000513c90 in rte_calloc (type=0x85bf90 "rte_services", num=64, size=128, align=128) at /root/clean/dpdk/lib/librte_eal/common/rte_malloc.c:99 #13 0x0000000000518cec in rte_service_init () at /root/clean/dpdk/lib/librte_eal/common/rte_service.c:81 #14 0x00000000004f55f4 in rte_eal_init (argc=3, argv=0xfffffffff488) at /root/clean/dpdk/lib/librte_eal/linuxapp/eal/eal.c:959 #15 0x000000000045af5c in main (argc=3, argv=0xfffffffff488) at /root/clean/dpdk/app/test-pmd/testpmd.c:2483


Also, I have tried running with --legacy-mem but I'm stuck in
`pci_find_max_end_va` loop  because `rte_fbarray_find_next_used` always return
0. >
HugePages_Total:      15
HugePages_Free:       11
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:     524288 kB

Call Stack:
(gdb) bt
#0  find_next (arr=0xffffb7fb009c, start=0, used=true) at /root/clean/dpdk/lib/librte_eal/common/eal_common_fbarray.c:248 #1  0x00000000005132a8 in rte_fbarray_find_next_used (arr=0xffffb7fb009c, start=0) at /root/clean/dpdk/lib/librte_eal/common/eal_common_fbarray.c:700 #2  0x000000000052d030 in pci_find_max_end_va () at /root/clean/dpdk/drivers/bus/pci/linux/pci.c:138 #3  0x0000000000530ab8 in pci_vfio_map_resource_primary (dev=0xeae700) at /root/clean/dpdk/drivers/bus/pci/linux/pci_vfio.c:499 #4  0x0000000000530ffc in pci_vfio_map_resource (dev=0xeae700) at /root/clean/dpdk/drivers/bus/pci/linux/pci_vfio.c:601 #5  0x000000000052ce90 in rte_pci_map_device (dev=0xeae700) at /root/clean/dpdk/drivers/bus/pci/linux/pci.c:75 #6  0x0000000000531a20 in rte_pci_probe_one_driver (dr=0x997e20 <rte_nicvf_pmd>, dev=0xeae700) at /root/clean/dpdk/drivers/bus/pci/pci_common.c:164 #7  0x0000000000531c68 in pci_probe_all_drivers (dev=0xeae700) at /root/clean/dpdk/drivers/bus/pci/pci_common.c:249 #8  0x0000000000531f68 in rte_pci_probe () at /root/clean/dpdk/drivers/bus/pci/pci_common.c:359 #9  0x000000000050a140 in rte_bus_probe () at /root/clean/dpdk/lib/librte_eal/common/eal_common_bus.c:98 #10 0x00000000004f55f4 in rte_eal_init (argc=1, argv=0xfffffffff498) at /root/clean/dpdk/lib/librte_eal/linuxapp/eal/eal.c:967 #11 0x000000000045af5c in main (argc=1, argv=0xfffffffff498) at /root/clean/dpdk/app/test-pmd/testpmd.c:2483

Am I missing something here?

I'll look into those, thanks!

Btw, i've now set up a github repo with the patchset applied:

https://github.com/anatolyburakov/dpdk

I will be pushing quick fixes there before spinning new revisions, so we can discover and fix bugs more rapidly. I'll fix compile issues reported earlier, then i'll take a look at your issues. The latter one seems like a typo, the former is probably a matter of moving things around a bit.

(also, pull requests welcome if you find it easier to fix things yourself and submit patches against my tree!)

Thanks for testing.


I've looked into the failures.

The VFIO one is not actually a failure. It only prints out errors because rte_malloc is called before VFIO is initialized. However, once VFIO *is* initialized, all of that memory would be added to VFIO, so these error messages are harmless. Regardless, i've added a check to see if init is finished before printing out those errors, so they won't be printed out any more.

Second one is a typo on my part that got lost in one of the rebases.

I've pushed fixes for both into the github repo.


Although i do wonder where do the DMA remapping errors come from. The error message says "invalid argument", so that doesn't come from rte_service or anything to do with rte_malloc - this is us not providing valid arguments to VFIO. I'm not seeing these errors on my system. I'll check on others to be sure.

--
Thanks,
Anatoly

Reply via email to