Hi, Konstantin

> -----Original Message-----
> From: Ananyev, Konstantin <konstantin.anan...@intel.com>
> Sent: January 22, 2021 21:16
> To: Feifei Wang <feifei.wa...@arm.com>; Honnappa Nagarahalli
> <honnappa.nagaraha...@arm.com>; Olivier Matz <olivier.m...@6wind.com>;
> Gavin Hu <gavin...@arm.com>
> Cc: dev@dpdk.org; nd <n...@arm.com>; sta...@dpdk.org
> Subject: RE: [PATCH v1 1/3] test/ring: reduce iteration numbers to make test
> duration shorter
> 
> 
> > When testing ring performance with multiple lcores mapped to the same
> > physical core, e.g. --lcores '(0-3)@10', it takes a very long time for
> > "enqueue_dequeue_bulk_helper" to finish. This is because the iteration
> > count is too high and enqueue and dequeue are extremely inefficient
> > with this kind of core mapping. The following test results show this:
> >
> > x86-Intel(R) Xeon(R) Gold 6240:
> > $sudo ./app/test/dpdk-test --lcores '(0-1)@25'
> > Testing using two hyperthreads(bulk (size: 8):)
> > iter_shift:         3      5      7      9      11     13     *15    17     19     21     23
> > run time:           7s     7s     7s     8s     9s     16s    47s    170s   660s   >0.5h  >1h
> > legacy APIs: SP/SC: 37     11     6      40525  40525  40209  40367  40407  40541  NoData NoData
> > legacy APIs: MP/MC: 56     14     11     50657  40526  40526  40526  40625  40585  NoData NoData
> >
> > aarch64-n1sdp:
> > $sudo ./app/test/dpdk-test --lcore '(0-1)@1'
> > Testing using two hyperthreads(bulk (size: 8):)
> > iter_shift:         3      5      7      9      11     13     *15    17     19     21     23
> > run time:           8s     8s     8s     9s     9s     14s    34s    111s   418s   25min  >1h
> > legacy APIs: SP/SC: 0.4    0.2    0.1    488    488    488    488    488    489    489    NoData
> > legacy APIs: MP/MC: 0.4    0.3    0.2    488    488    488    488    490    489    489    NoData
> >
> > As the number of iterations increases, so does the time required to run
> > the program. Currently (iter_shift = 23), the test takes more than
> > 1 hour to finish. To fix this, "iter_shift" should be decreased while
> > still keeping enough iterations for the test data to be stable. To find
> > a suitable value, we also tested with the "-l" EAL argument:
> >
> > x86-Intel(R) Xeon(R) Gold 6240:
> > $sudo ./app/test/dpdk-test -l 25-26
> > Testing using two NUMA nodes(bulk (size: 8):)
> > iter_shift:         3      5      7      9      11     13     *15    17     19     21     23
> > run time:           6s     6s     6s     6s     6s     6s     6s     7s     8s     11s    27s
> > legacy APIs: SP/SC: 47     20     13     22     54     83     91     73     81     75     95
> > legacy APIs: MP/MC: 44     18     18     240    245    270    250    249    252    250    253
> >
> > aarch64-n1sdp:
> > $sudo ./app/test/dpdk-test -l 1-2
> > Testing using two physical cores(bulk (size: 8):)
> > iter_shift:         3      5      7      9      11     13     *15    17     19     21     23
> > run time:           8s     8s     8s     8s     8s     8s     8s     9s     9s     11s    23s
> > legacy APIs: SP/SC: 0.7    0.4    1.2    1.8    2.0    2.0    2.0    2.0    2.0    2.0    2.0
> > legacy APIs: MP/MC: 0.3    0.4    1.3    1.9    2.9    2.9    2.9    2.9    2.9    2.9    2.9
> >
> > According to the above test data, when "iter_shift" is set to 15, the
> > test run time is reduced to less than 1 minute and the test results
> > remain stable on both x86 and aarch64 servers.
> >
> > Fixes: 1fa5d0099efc ("test/ring: add custom element size performance
> > tests")
> > Cc: honnappa.nagaraha...@arm.com
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Feifei Wang <feifei.wa...@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
> > ---
> >  app/test/test_ring_perf.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
> > index e63e25a86..fd82e2041 100644
> > --- a/app/test/test_ring_perf.c
> > +++ b/app/test/test_ring_perf.c
> > @@ -178,7 +178,7 @@ enqueue_dequeue_bulk_helper(const unsigned int flag, const int esize,
> >     struct thread_params *p)
> >  {
> >     int ret;
> > -   const unsigned int iter_shift = 23;
> > +   const unsigned int iter_shift = 15;
> >     const unsigned int iterations = 1 << iter_shift;
> >     struct rte_ring *r = p->r;
> >     unsigned int bsize = p->size;
> > --
> 
> I think it would be better to rework the test(s) to terminate after some
> timeout (30s or so), and report the number of ops completed within it.
> Anyway, as a short-term fix, I am ok with it.
> Acked-by: Konstantin Ananyev <konstantin.anan...@intel.com>
Ok, thanks very much.

Best Regards
Feifei
> 
> 
> > 2.17.1
