RE: [EXT] [PATCH v8] app/dma-perf: introduce dma-perf application

Anoob Joseph Sun, 25 Jun 2023 22:41:53 -0700

Hi Cheng,

Please see inline.


Thanks,
Anoob

> -----Original Message-----
> From: Jiang, Cheng1 <cheng1.ji...@intel.com>
> Sent: Saturday, June 24, 2023 5:23 PM
> To: Anoob Joseph <ano...@marvell.com>; tho...@monjalon.net;
> Richardson, Bruce <bruce.richard...@intel.com>;
> m...@smartsharesystems.com; Xia, Chenbo <chenbo....@intel.com>; Amit
> Prakash Shukla <amitpraka...@marvell.com>; huangdeng...@huawei.com;
> Laatz, Kevin <kevin.la...@intel.com>; fengcheng...@huawei.com; Jerin
> Jacob Kollanukkaran <jer...@marvell.com>
> Cc: dev@dpdk.org; Hu, Jiayu <jiayu...@intel.com>; Ding, Xuan
> <xuan.d...@intel.com>; Ma, WenwuX <wenwux...@intel.com>; Wang,
> YuanX <yuanx.w...@intel.com>; He, Xingguang <xingguang...@intel.com>;
> Ling, WeiX <weix.l...@intel.com>
> Subject: RE: [EXT] [PATCH v8] app/dma-perf: introduce dma-perf application
> 
> Hi Anoob,
> 
> Replies are inline.
> 
> Thanks,
> Cheng
> 
> > -----Original Message-----
> > From: Anoob Joseph <ano...@marvell.com>
> > Sent: Friday, June 23, 2023 2:53 PM
> > To: Jiang, Cheng1 <cheng1.ji...@intel.com>; tho...@monjalon.net;
> > Richardson, Bruce <bruce.richard...@intel.com>;
> > m...@smartsharesystems.com; Xia, Chenbo <chenbo....@intel.com>; Amit
> > Prakash Shukla <amitpraka...@marvell.com>;
> huangdeng...@huawei.com;
> > Laatz, Kevin <kevin.la...@intel.com>; fengcheng...@huawei.com; Jerin
> > Jacob Kollanukkaran <jer...@marvell.com>
> > Cc: dev@dpdk.org; Hu, Jiayu <jiayu...@intel.com>; Ding, Xuan
> > <xuan.d...@intel.com>; Ma, WenwuX <wenwux...@intel.com>; Wang,
> YuanX
> > <yuanx.w...@intel.com>; He, Xingguang <xingguang...@intel.com>;
> Ling,
> > WeiX <weix.l...@intel.com>
> > Subject: RE: [EXT] [PATCH v8] app/dma-perf: introduce dma-perf
> > application
> >
> > Hi Cheng,
> >
> > Thanks for the new version. Please see inline.
> >
> > Thanks,
> > Anoob
> >
> > > -----Original Message-----
> > > From: Cheng Jiang <cheng1.ji...@intel.com>
> > > Sent: Tuesday, June 20, 2023 12:24 PM
> > > To: tho...@monjalon.net; bruce.richard...@intel.com;
> > > m...@smartsharesystems.com; chenbo....@intel.com; Amit Prakash
> Shukla
> > > <amitpraka...@marvell.com>; Anoob Joseph <ano...@marvell.com>;
> > > huangdeng...@huawei.com; kevin.la...@intel.com;
> > > fengcheng...@huawei.com; Jerin Jacob Kollanukkaran
> > > <jer...@marvell.com>
> > > Cc: dev@dpdk.org; jiayu...@intel.com; xuan.d...@intel.com;
> > > wenwux...@intel.com; yuanx.w...@intel.com;
> xingguang...@intel.com;
> > > weix.l...@intel.com; Cheng Jiang <cheng1.ji...@intel.com>
> > > Subject: [EXT] [PATCH v8] app/dma-perf: introduce dma-perf
> > > application
> > >
> > > External Email
> > >
> > > --------------------------------------------------------------------
> > > -- There are many high-performance DMA devices supported in DPDK
> > > now,
> > and
> > > these DMA devices can also be integrated into other modules of DPDK
> > > as accelerators, such as Vhost. Before integrating DMA into
> > > applications, developers need to know the performance of these DMA
> > > devices in various scenarios and the performance of CPUs in the same
> > > scenario, such as different buffer lengths. Only in this way can we
> > > know the target performance of the application accelerated by using
> > > them. This patch introduces a high-performance testing tool, which
> > > supports comparing the performance of CPU and DMA in different
> > > scenarios automatically with a pre- set config file. Memory Copy
> > > performance test
> > are supported for now.
> > >
> > > Signed-off-by: Cheng Jiang <cheng1.ji...@intel.com>
> > > Signed-off-by: Jiayu Hu <jiayu...@intel.com>
> > > Signed-off-by: Yuan Wang <yuanx.w...@intel.com>
> > > Acked-by: Morten Brørup <m...@smartsharesystems.com>
> > > Acked-by: Chenbo Xia <chenbo....@intel.com>
> > > ---
> > > v8:
> > >   fixed string copy issue in parse_lcore();
> > >   improved some data display format;
> > >   added doc in doc/guides/tools;
> > >   updated release notes;
> > >
> > > v7:
> > >   fixed some strcpy issues;
> > >   removed cache setup in calling rte_pktmbuf_pool_create();
> > >   fixed some typos;
> > >   added some memory free and null set operations;
> > >   improved result calculation;
> > > v6:
> > >   improved code based on Anoob's comments;
> > >   fixed some code structure issues;
> > > v5:
> > >   fixed some LONG_LINE warnings;
> > > v4:
> > >   fixed inaccuracy of the memory footprint display;
> > > v3:
> > >   fixed some typos;
> > > v2:
> > >   added lcore/dmadev designation;
> > >   added error case process;
> > >   removed worker_threads parameter from config.ini;
> > >   improved the logs;
> > >   improved config file;
> > >
> > >  app/meson.build                        |   1 +
> > >  app/test-dma-perf/benchmark.c          | 498 +++++++++++++++++++++
> > >  app/test-dma-perf/config.ini           |  61 +++
> > >  app/test-dma-perf/main.c               | 594 +++++++++++++++++++++++++
> > >  app/test-dma-perf/main.h               |  69 +++
> > >  app/test-dma-perf/meson.build          |  17 +
> > >  doc/guides/rel_notes/release_23_07.rst |   6 +
> > >  doc/guides/tools/dmaperf.rst           | 103 +++++
> > >  doc/guides/tools/index.rst             |   1 +
> > >  9 files changed, 1350 insertions(+)  create mode 100644
> > > app/test-dma-perf/benchmark.c  create mode 100644
> > > app/test-dma-perf/config.ini  create mode 100644 app/test-dma-
> > > perf/main.c  create mode 100644 app/test-dma-perf/main.h  create
> > > mode
> > > 100644 app/test-dma-perf/meson.build  create mode 100644
> > > doc/guides/tools/dmaperf.rst
> > >
> >

<snip>
 
> >
> > > +                 fprintf(stderr, "Error: Fail to find DMA %s.\n",
> > > dma_name);
> > > +                 goto end;
> > > +         }
> > > +
> > > +         ldm->dma_ids[i] = dev_id;
> > > +         configure_dmadev_queue(dev_id, ring_size);
> > > +         ++nb_dmadevs;
> > > + }
> > > +
> > > +end:
> > > + if (nb_dmadevs < nb_workers) {
> > > +         printf("Not enough dmadevs (%u) for all workers (%u).\n",
> > > nb_dmadevs, nb_workers);
> > > +         return -1;
> > > + }
> > > +
> > > + printf("Number of used dmadevs: %u.\n", nb_dmadevs);
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static inline void
> > > +do_dma_submit_and_poll(uint16_t dev_id, uint64_t *async_cnt,
> > > +                 volatile struct worker_info *worker_info) {
> > > + int ret;
> > > + uint16_t nr_cpl;
> > > +
> > > + ret = rte_dma_submit(dev_id, 0);
> > > + if (ret < 0) {
> > > +         rte_dma_stop(dev_id);
> > > +         rte_dma_close(dev_id);
> > > +         rte_exit(EXIT_FAILURE, "Error with dma submit.\n");
> > > + }
> > > +
> > > + nr_cpl = rte_dma_completed(dev_id, 0, MAX_DMA_CPL_NB, NULL,
> > > NULL);
> > > + *async_cnt -= nr_cpl;
> > > + worker_info->total_cpl += nr_cpl;
> > > +}
> > > +
> > > +static inline int
> > > +do_dma_mem_copy(void *p)
> >
> > [Anoob] Just curious, why not pass struct lcore_params *para itself?
> > Is it because the pointer is volatile? If yes, then we can take an AI
> > to split the struct into volatile and non-volatile parts.
> 
> [Cheng] The reason I did it this way is because I want to launch this function
> on another core by spawning a new thread, and rte_eal_remote_launch()
> takes a void * as the parameter. That's why I passed void *p.  Your
> suggestion to split the struct into volatile and non-volatile parts is quite
> reasonable. I am thinking about the best way to implement it. Thanks.

[Anoob] Instead of passing the address of index variable as void *, you can 
easily send lcore_params pointer, right?

> 
> >
> > > +{
> > > + const uint16_t *para_idx = (uint16_t *)p;
> > > + volatile struct lcore_params *para = lcores_p[*para_idx].v_ptr;
> > > + volatile struct worker_info *worker_info = &(para->worker_info);
> > > + const uint16_t dev_id = para->dev_id;
> > > + const uint32_t nr_buf = para->nr_buf;
> > > + const uint16_t kick_batch = para->kick_batch;
> > > + const uint32_t buf_size = para->buf_size;
> > > + struct rte_mbuf **srcs = para->srcs;
> > > + struct rte_mbuf **dsts = para->dsts;
> > > + uint16_t nr_cpl;
> > > + uint64_t async_cnt = 0;
> > > + uint32_t i;
> > > + uint32_t poll_cnt = 0;
> > > + int ret;
> > > +
> > > + worker_info->stop_flag = false;
> > > + worker_info->ready_flag = true;
> > > +
> > > + while (!worker_info->start_flag)
> > > +         ;
> > > +
> > > + while (1) {
> > > +         for (i = 0; i < nr_buf; i++) {
> > > +dma_copy:
> > > +                 ret = rte_dma_copy(dev_id, 0,
> > > rte_pktmbuf_iova(srcs[i]),
> > > +                         rte_pktmbuf_iova(dsts[i]), buf_size, 0);
> >
> > [Anoob] Do we need to use ' rte_mbuf_data_iova' here instead of
> > 'rte_pktmbuf_iova'?
> 
> [Cheng] yes rte_mbuf_data_iova is more appropriate, I'll fix it in the next
> version. Thanks.
> 
> >
> > > +                 if (unlikely(ret < 0)) {
> > > +                         if (ret == -ENOSPC) {
> > > +                                 do_dma_submit_and_poll(dev_id,
> > > &async_cnt, worker_info);
> > > +                                 goto dma_copy;
> > > +                         } else {
> > > +                                 /* Error exit */
> > > +                                 rte_dma_stop(dev_id);
> >
> > [Anoob] Missing rte_dma_close() here. Also, may be introduce a static
> > void function so that rte_exit call etc won't be part of the fastpath loop.
> >
> > May be something like below and you can call it from here and
> > "do_dma_submit_and_poll".
> >
> > static void
> > error_exit(int dev_id)
> > {
> >     /* Error exit */
> >     rte_dma_stop(dev_id);
> >     rte_dma_close(dev_id);
> >     rte_exit(EXIT_FAILURE, "DMA enqueue failed\n"); }
> >
> 
> [Cheng] I'm not so sure here. rte_dma_close() is called in the rte_exit(). Do
> we still call it explicitly before rte_exit()?

[Anoob] In ' do_dma_submit_and_poll', there is rte_dma_close() before 
rte_exit(). I'm fine either way as long is it is consistent. Said that, I think 
it is better to call close() from app, rather than relying on rte_exit.

<snip>

RE: [EXT] [PATCH v8] app/dma-perf: introduce dma-perf application

Reply via email to