Re: [lng-odp] Bug 3657

gyanesh patra Thu, 12 Apr 2018 05:50:10 -0700

The current odp-dpdk code is not working. It gives the same error.
odp-dpdk was working before (August-September 2017), but the recent
updates changed the behaviour.


P Gyanesh Kumar Patra

On Thu, Apr 12, 2018 at 3:28 AM, Elo, Matias (Nokia - FI/Espoo) <
matias....@nokia.com> wrote:

> Hi,
>
> Have you tested the latest odp-dpdk code? It uses a different shm
> implementation, so at least we could rule that one out.
>
> -Matias
>
>
> > On 10 Apr 2018, at 21:37, gyanesh patra <pgyanesh.pa...@gmail.com> wrote:
> >
> > Hi Matias,
> >
> > The Mellanox interfaces are mapped to Numa Node 1. (device id: 81:00.x)
> > We have free hugepages on both Node0 and Node1 as identified below.
> >
> >   root# cat /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/free_hugepages
> >    77
> >   root# cat /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/free_hugepages
> >    83
> >
> > The ODP application is using CPU/lcore associated with numa Node1 too.
> > I have tried with the dpdk-17.11.1 version too without success.
> > The issue may be somewhere else.
> >
> > Regarding the usage of 2M pages (1024 x 2M pages):
> >  - I unmounted the 1G hugepages and then set up 1024 x 2M pages using the
> >    dpdk-setup.sh script.
> >  - This setup failed with the same error as before.
> >
> > Let me know if there is any other option we can try.
> >
> > Thanks,
> > P Gyanesh Kumar Patra
> >
> > On Thu, Mar 29, 2018 at 4:46 AM, Elo, Matias (Nokia - FI/Espoo) <matias....@nokia.com> wrote:
> > A second thing to try. Since you seem to have a NUMA system, the ODP
> > application should be run on the same NUMA socket as the NIC (e.g. using
> > taskset if necessary). In case of different sockets, both sockets should
> > have huge pages mapped.
> >
> > -Matias
> >
> > > On 29 Mar 2018, at 10:00, Elo, Matias (Nokia - FI/Espoo) <matias....@nokia.com> wrote:
> > >
> > > Hi Gyanesh,
> > >
> > > It seems you are using 1G huge pages. Have you tried using 2M pages
> > > (1024 x 2M pages should be enough)? As Bill noted, this seems like a
> > > memory related issue.
> > >
> > > -Matias
> > >
> > >
> > >> On 28 Mar 2018, at 18:15, gyanesh patra <pgyanesh.pa...@gmail.com> wrote:
> > >>
> > >> Yes, it is.
> > >> The error is the same. I did reply that the only differences I see
> > >> are the Ubuntu version and a different minor version of the Mellanox
> > >> driver.
> > >>
> > >> On Wed, Mar 28, 2018, 07:29 Bill Fischofer <bill.fischo...@linaro.org> wrote:
> > >> Thanks for the update. Sounds like you're already using DPDK 17.11?
> > >> What about Mellanox driver level? Is the failure the same as you
> > >> originally reported?
> > >>
> > >> From the reported error:
> > >>
> > >> pktio/dpdk.c:1538:dpdk_start():Queue setup failed: err=-12, port=0
> > >> odp_l2fwd.c:1671:main():Error: unable to start 0
> > >>
> > >> This is a DPDK PMD driver error reported by rte_eth_rx_queue_setup().
> > >> In the Mellanox PMD (drivers/net/mlx5/mlx5_rxq.c) this is the
> > >> mlx5_rx_queue_setup() routine. The relevant code seems to be this:
> > >>
> > >> if (rxq != NULL) {
> > >>        DEBUG("%p: reusing already allocated queue index %u (%p)",
> > >>                      (void *)dev, idx, (void *)rxq);
> > >>        if (priv->started) {
> > >>                priv_unlock(priv);
> > >>                return -EEXIST;
> > >>        }
> > >>        (*priv->rxqs)[idx] = NULL;
> > >>        rxq_cleanup(rxq_ctrl);
> > >>        /* Resize if rxq size is changed. */
> > >>        if (rxq_ctrl->rxq.elts_n != log2above(desc)) {
> > >>                rxq_ctrl = rte_realloc(rxq_ctrl,
> > >>                                       sizeof(*rxq_ctrl) +
> > >>                                       (desc + desc_pad) *
> > >>                                       sizeof(struct rte_mbuf *),
> > >>                                       RTE_CACHE_LINE_SIZE);
> > >>                if (!rxq_ctrl) {
> > >>                        ERROR("%p: unable to reallocate queue index %u",
> > >>                              (void *)dev, idx);
> > >>                        priv_unlock(priv);
> > >>                        return -ENOMEM;
> > >>                }
> > >>        }
> > >> } else {
> > >>        rxq_ctrl = rte_calloc_socket("RXQ", 1, sizeof(*rxq_ctrl) +
> > >>                                     (desc + desc_pad) *
> > >>                                     sizeof(struct rte_mbuf *),
> > >>                                     0, socket);
> > >>        if (rxq_ctrl == NULL) {
> > >>                ERROR("%p: unable to allocate queue index %u",
> > >>                      (void *)dev, idx);
> > >>                priv_unlock(priv);
> > >>                return -ENOMEM;
> > >>        }
> > >> }
> > >>
> > >> The reported -12 error code is -ENOMEM so I'd say the issue is some
> > >> sort of memory allocation failure.
> > >>
> > >>
> > >> On Wed, Mar 28, 2018 at 8:43 AM, gyanesh patra <pgyanesh.pa...@gmail.com> wrote:
> > >>> Hi Bill,
> > >>> I tried with Matias' suggestions but without success.
> > >>>
> > >>> P Gyanesh Kumar Patra
> > >>>
> > >>> On Mon, Mar 26, 2018 at 4:16 PM, Bill Fischofer <bill.fischo...@linaro.org> wrote:
> > >>>>
> > >>>> Hi Gyanesh,
> > >>>>
> > >>>> Have you had a chance to look at
> > >>>> https://bugs.linaro.org/show_bug.cgi?id=3657 and see if Matias'
> > >>>> suggestions are helpful to you?
> > >>>>
> > >>>> Thanks,
> > >>>>
> > >>>> Regards,
> > >>>> Bill
> > >>>
> > >>>
> > >
> >
> >
>
>

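A side note on the -12 value in the quoted log line ("Queue setup failed:
err=-12"): the quoted mlx5 code returns negative errno values, so the number
can be decoded with the standard C library. The sketch below is only an
illustration. neg_errno_name() and format_neg_errno() are hypothetical
helpers, not part of ODP or DPDK, and they cover just the two codes that
appear in the quoted mlx5 snippet. It also assumes a Linux system, where
ENOMEM has the value 12.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not an ODP or DPDK function): map a negative
 * errno-style return value to its symbolic name. Only the codes used
 * in the quoted mlx5 snippet are handled. */
static const char *neg_errno_name(int ret)
{
	switch (-ret) {
	case ENOMEM:
		return "ENOMEM";
	case EEXIST:
		return "EEXIST";
	default:
		return "unknown";
	}
}

/* Hypothetical helper: write "err=<n> (<NAME>: <strerror text>)" into buf.
 * Returns the snprintf() result, i.e. the length that was needed. */
static int format_neg_errno(int ret, char *buf, size_t len)
{
	return snprintf(buf, len, "err=%d (%s: %s)", ret,
			neg_errno_name(ret), strerror(-ret));
}
```

Passing -12 to format_neg_errno() on Linux produces a string that names
ENOMEM, which matches the memory allocation failure discussed in the thread.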