On Tue, Feb 25, 2020 at 7:11 PM Ananyev, Konstantin
<konstantin.anan...@intel.com> wrote:

We do have a run-time check in our current enqueue()/dequeue implementation.
In fact we support both modes: we have generic 
rte_ring_enqueue(/dequeue)_bulk(/burst)
where sync behaviour is determined at runtime by value of prod(/cons).single.
Or user can call  rte_ring_(mp/sp)_enqueue_* functions directly.
This RFC follows exactly the same paradigm:
rte_ring_enqueue(/dequeue)_bulk(/burst) kept generic and it's
behaviour is determined at runtime, by value of prod(/cons).sync_type.
Or user can call enqueue/dequeue with particular sync mode directly:
rte_ring_(mp/sp/rts/hts)_enqueue_(bulk/burst)*.
The only thing that changed:
  Format of prod/cons now could differ depending on mode selected at _init_.
  So you can't create a ring for let say SP mode and then in the middle of 
data-path
  change your mind and start using MP_RTS mode.
  For existing modes (SP/MP, SC/MC)  format remains the same and user can still
  use them interchangeably, though of course that is an error prone practice.

Makes sense.



But I agree with the problem statement that in the virtualization use
case, It may be possible to have N virtual cores runs on a physical
core.

IMO, The best solution would be keeping the ring API same and have a
different flavor in "compile-time". Something like
liburcu did for accommodating different flavors.

i.e urcu-qsbr.h and urcu-bp.h will identical definition of API. The
application can simply include ONE header file in a C file based on
the flavor.

I don't think it is a flexible enough approach.
In one app user might need to have several rings with different sync modes.
Or even user might need a ring with different sync modes for enqueue/dequeue.

Ack.


Yes, hiding rte_ring implementation inside .c would help a lot
in terms of ABI maintenance and would make our future life easier.
The question is what is the price for it in terms of performance,
and are we ready to pay it. Not to mention that it would cause
changes in many other libs/apps...
So I think it should be a subject for a separate discussion.
But, agree it would be good at least to measure the performance
impact of such change.
If I'll have some spare cycles, will give it a try.
Meanwhile, can I ask Jerin and other guys to repeat tests from this RFC
on their HW? Before continuing discussion would probably be good to know
does the suggested patch work as expected across different platforms.


I tested on an arm64 HW. The former section is without the
patch(20.02) and later one with this patch.
I agree with Konstantin that getting more platform tests will be good
early so that we can focus on the approach
to avoid back and forth latter.


RTE>>ring_perf_autotest // without path

### Testing single element enq/deq ###
legacy APIs: SP/SC: single: 289.78
legacy APIs: MP/MC: single: 516.20

### Testing burst enq/deq ###
legacy APIs: SP/SC: burst (size: 8): 312.88
legacy APIs: SP/SC: burst (size: 32): 426.72
legacy APIs: MP/MC: burst (size: 8): 510.95
legacy APIs: MP/MC: burst (size: 32): 702.01

### Testing bulk enq/deq ###
legacy APIs: SP/SC: bulk (size: 8): 306.74
legacy APIs: SP/SC: bulk (size: 32): 411.56
legacy APIs: MP/MC: bulk (size: 8): 501.32
legacy APIs: MP/MC: bulk (size: 32): 693.07

### Testing empty bulk deq ###
legacy APIs: SP/SC: bulk (size: 8): 7.00
legacy APIs: MP/MC: bulk (size: 8): 7.00

### Testing using two physical cores ###
legacy APIs: SP/SC: bulk (size: 8): 74.36
legacy APIs: MP/MC: bulk (size: 8): 110.18
legacy APIs: SP/SC: bulk (size: 32): 23.04
legacy APIs: MP/MC: bulk (size: 32): 32.29

### Testing using all slave nodes ##
Bulk enq/dequeue count on size 8
Core [8] count = 293741
Core [9] count = 293741
Total count (size: 8): 587482

Bulk enq/dequeue count on size 32
Core [8] count = 244909
Core [9] count = 244909
Total count (size: 32): 1077300

### Testing single element enq/deq ###
elem APIs: element size 16B: SP/SC: single: 255.37
elem APIs: element size 16B: MP/MC: single: 456.68

### Testing burst enq/deq ###
elem APIs: element size 16B: SP/SC: burst (size: 8): 291.99
elem APIs: element size 16B: SP/SC: burst (size: 32): 456.25
elem APIs: element size 16B: MP/MC: burst (size: 8): 497.77
elem APIs: element size 16B: MP/MC: burst (size: 32): 680.87

### Testing bulk enq/deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 284.40
elem APIs: element size 16B: SP/SC: bulk (size: 32): 453.17
elem APIs: element size 16B: MP/MC: bulk (size: 8): 485.77
elem APIs: element size 16B: MP/MC: bulk (size: 32): 675.08

### Testing empty bulk deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 8.00
elem APIs: element size 16B: MP/MC: bulk (size: 8): 7.00

### Testing using two physical cores ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 74.45
elem APIs: element size 16B: MP/MC: bulk (size: 8): 105.91
elem APIs: element size 16B: SP/SC: bulk (size: 32): 22.92
elem APIs: element size 16B: MP/MC: bulk (size: 32): 31.55

### Testing using all slave nodes ###

Bulk enq/dequeue count on size 8
Core [8] count = 308724
Core [9] count = 308723
Total count (size: 8): 617447

Bulk enq/dequeue count on size 32
Core [8] count = 214269
Core [9] count = 214269
Total count (size: 32): 1045985

RTE>>ring_perf_autotest // with patch

### Testing single element enq/deq ###
legacy APIs: SP/SC: single: 289.78
legacy APIs: MP/MC: single: 475.76

### Testing burst enq/deq ###
legacy APIs: SP/SC: burst (size: 8): 323.91
legacy APIs: SP/SC: burst (size: 32): 424.60
legacy APIs: MP/MC: burst (size: 8): 523.00
legacy APIs: MP/MC: burst (size: 32): 717.09

### Testing bulk enq/deq ###
legacy APIs: SP/SC: bulk (size: 8): 317.74
legacy APIs: SP/SC: bulk (size: 32): 413.57
legacy APIs: MP/MC: bulk (size: 8): 512.89
legacy APIs: MP/MC: bulk (size: 32): 712.45

### Testing empty bulk deq ###
legacy APIs: SP/SC: bulk (size: 8): 7.00
legacy APIs: MP/MC: bulk (size: 8): 7.00

### Testing using two physical cores ###
legacy APIs: SP/SC: bulk (size: 8): 74.82
legacy APIs: MP/MC: bulk (size: 8): 96.45
legacy APIs: SP/SC: bulk (size: 32): 22.97
legacy APIs: MP/MC: bulk (size: 32): 32.52

### Testing using all slave nodes ###

Bulk enq/dequeue count on size 8
Core [8] count = 283928
Core [9] count = 283927
Total count (size: 8): 567855

Bulk enq/dequeue count on size 32
Core [8] count = 223916
Core [9] count = 223915
Total count (size: 32): 1015686

### Testing single element enq/deq ###
elem APIs: element size 16B: SP/SC: single: 267.65
elem APIs: element size 16B: MP/MC: single: 439.06

### Testing burst enq/deq ###
elem APIs: element size 16B: SP/SC: burst (size: 8): 302.44
elem APIs: element size 16B: SP/SC: burst (size: 32): 466.31
elem APIs: element size 16B: MP/MC: burst (size: 8): 502.51
elem APIs: element size 16B: MP/MC: burst (size: 32): 695.81

### Testing bulk enq/deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 295.15
elem APIs: element size 16B: SP/SC: bulk (size: 32): 462.77
elem APIs: element size 16B: MP/MC: bulk (size: 8): 496.89
elem APIs: element size 16B: MP/MC: bulk (size: 32): 690.46

### Testing empty bulk deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 7.50
elem APIs: element size 16B: MP/MC: bulk (size: 8): 7.44

### Testing using two physical cores ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 65.85
elem APIs: element size 16B: MP/MC: bulk (size: 8): 103.80
elem APIs: element size 16B: SP/SC: bulk (size: 32): 23.27
elem APIs: element size 16B: MP/MC: bulk (size: 32): 31.17

### Testing using all slave nodes ###

Bulk enq/dequeue count on size 8
Core [8] count = 304223
Core [9] count = 304221
Total count (size: 8): 608444

Bulk enq/dequeue count on size 32
Core [8] count = 214856
Core [9] count = 214855
Total count (size: 32): 1038155
Test OK
RTE>>quit



Encountered a couple of different build errors with these patches on my Power 9 system:

In file included from ../lib/librte_ring/rte_ring.h:534,
                 from ../drivers/mempool/ring/rte_mempool_ring.c:9:
../lib/librte_ring/rte_ring_hts_generic.h: In function ‘__rte_ring_hts_update_tail’: ../lib/librte_ring/rte_ring_hts_generic.h:61:2: warning: implicit declaration of function ‘RTE_ASSERT’; did you mean ‘RTE_STR’? [-Wimplicit-function-declaration]
  RTE_ASSERT(n >= num);
  ^~~~~~~~~~
  RTE_STR

Fixed by adding "#include <rte_debug.h>" to rte_ring.h.

Also encountered:

In file included from ../app/test/test_ring_hts_stress.c:5:
../app/test/test_ring_stress.h: In function ‘check_updt_elem’:
../app/test/test_ring_stress.h:162:9: error: unknown type name ‘rte_spinlock_t’
  static rte_spinlock_t dump_lock;
         ^~~~~~~~~~~~~~
../app/test/test_ring_stress.h:166:4: warning: implicit declaration of function ‘rte_spinlock_lock’; did you mean ‘rte_calloc_socket’? [-Wimplicit-function-declaration]
    rte_spinlock_lock(&dump_lock);
    ^~~~~~~~~~~~~~~~~
    rte_calloc_socket
../app/test/test_ring_stress.h:166:4: warning: nested extern declaration of ‘rte_spinlock_lock’ [-Wnested-externs] ../app/test/test_ring_stress.h:172:4: warning: implicit declaration of function ‘rte_spinlock_unlock’; did you mean ‘pthread_rwlock_unlock’? [-Wimplicit-function-declaration]
    rte_spinlock_unlock(&dump_lock);
    ^~~~~~~~~~~~~~~~~~~
    pthread_rwlock_unlock
../app/test/test_ring_stress.h:172:4: warning: nested extern declaration of ‘rte_spinlock_unlock’ [-Wnested-externs]

Fixed by adding "#include <rte_spinlock.h>" to test_ring_stress.h.

Autoperf test results
---------------------
RTE>>ring_perf_autotest // DPDK 20.02, without patch

### Testing single element enq/deq ###
legacy APIs: SP/SC: single: 42.14
legacy APIs: MP/MC: single: 56.26

### Testing burst enq/deq ###
legacy APIs: SP/SC: burst (size: 8): 43.59
legacy APIs: SP/SC: burst (size: 32): 49.87
legacy APIs: MP/MC: burst (size: 8): 58.43
legacy APIs: MP/MC: burst (size: 32): 65.68

### Testing bulk enq/deq ###
legacy APIs: SP/SC: bulk (size: 8): 43.59
legacy APIs: SP/SC: bulk (size: 32): 49.85
legacy APIs: MP/MC: bulk (size: 8): 58.43
legacy APIs: MP/MC: bulk (size: 32): 65.60

### Testing empty bulk deq ###
legacy APIs: SP/SC: bulk (size: 8): 7.16
legacy APIs: MP/MC: bulk (size: 8): 7.16

### Testing using two hyperthreads ###
legacy APIs: SP/SC: bulk (size: 8): 12.46
legacy APIs: MP/MC: bulk (size: 8): 16.20
legacy APIs: SP/SC: bulk (size: 32): 3.21
legacy APIs: MP/MC: bulk (size: 32): 3.73

### Testing using two physical cores ###
legacy APIs: SP/SC: bulk (size: 8): 33.34
legacy APIs: MP/MC: bulk (size: 8): 37.99
legacy APIs: SP/SC: bulk (size: 32): 10.19
legacy APIs: MP/MC: bulk (size: 32): 11.90

### Testing using two NUMA nodes ###
legacy APIs: SP/SC: bulk (size: 8): 49.50
legacy APIs: MP/MC: bulk (size: 8): 63.65
legacy APIs: SP/SC: bulk (size: 32): 12.49
legacy APIs: MP/MC: bulk (size: 32): 23.53

### Testing using all slave nodes ###

Bulk enq/dequeue count on size 8
Core [4] count = 5604
Core [5] count = 5563
Core [6] count = 5576
Core [7] count = 5630
Core [8] count = 5643
Core [9] count = 5727
Core [10] count = 5698
Core [11] count = 5711
Core [64] count = 5259
Core [65] count = 5322
Core [66] count = 5321
Core [67] count = 5310
Core [68] count = 4350
Core [69] count = 4455
Core [70] count = 4546
Core [71] count = 4475
Total count (size: 8): 84190

Bulk enq/dequeue count on size 32
Core [4] count = 5543
Core [5] count = 5555
Core [6] count = 5596
Core [7] count = 5584
Core [8] count = 5613
Core [9] count = 5686
Core [10] count = 5689
Core [11] count = 5677
Core [64] count = 5228
Core [65] count = 5389
Core [66] count = 5406
Core [67] count = 5359
Core [68] count = 4554
Core [69] count = 4673
Core [70] count = 4675
Core [71] count = 4644
Total count (size: 32): 169061

### Testing single element enq/deq ###
elem APIs: element size 16B: SP/SC: single: 42.84
elem APIs: element size 16B: MP/MC: single: 56.77

### Testing burst enq/deq ###
elem APIs: element size 16B: SP/SC: burst (size: 8): 44.98
elem APIs: element size 16B: SP/SC: burst (size: 32): 59.16
elem APIs: element size 16B: MP/MC: burst (size: 8): 60.58
elem APIs: element size 16B: MP/MC: burst (size: 32): 74.75

### Testing bulk enq/deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 45.01
elem APIs: element size 16B: SP/SC: bulk (size: 32): 59.08
elem APIs: element size 16B: MP/MC: bulk (size: 8): 60.58
elem APIs: element size 16B: MP/MC: bulk (size: 32): 74.76

### Testing empty bulk deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 7.16
elem APIs: element size 16B: MP/MC: bulk (size: 8): 7.16

### Testing using two hyperthreads ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 12.18
elem APIs: element size 16B: MP/MC: bulk (size: 8): 15.44
elem APIs: element size 16B: SP/SC: bulk (size: 32): 3.22
elem APIs: element size 16B: MP/MC: bulk (size: 32): 3.97

### Testing using two physical cores ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 42.07
elem APIs: element size 16B: MP/MC: bulk (size: 8): 44.50
elem APIs: element size 16B: SP/SC: bulk (size: 32): 10.73
elem APIs: element size 16B: MP/MC: bulk (size: 32): 11.73

### Testing using two NUMA nodes ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 49.55
elem APIs: element size 16B: MP/MC: bulk (size: 8): 93.10
elem APIs: element size 16B: SP/SC: bulk (size: 32): 12.33
elem APIs: element size 16B: MP/MC: bulk (size: 32): 27.10

### Testing using all slave nodes ###

Bulk enq/dequeue count on size 8
Core [4] count = 5489
Core [5] count = 5559
Core [6] count = 5566
Core [7] count = 5577
Core [8] count = 5645
Core [9] count = 5699
Core [10] count = 5695
Core [11] count = 5733
Core [64] count = 5202
Core [65] count = 5284
Core [66] count = 5319
Core [67] count = 5349
Core [68] count = 4331
Core [69] count = 4465
Core [70] count = 4484
Core [71] count = 4439
Total count (size: 8): 83836

Bulk enq/dequeue count on size 32
Core [4] count = 5567
Core [5] count = 5492
Core [6] count = 5517
Core [7] count = 5515
Core [8] count = 5593
Core [9] count = 5650
Core [10] count = 5694
Core [11] count = 5665
Core [64] count = 5236
Core [65] count = 5319
Core [66] count = 5333
Core [67] count = 5304
Core [68] count = 4608
Core [69] count = 4669
Core [70] count = 4690
Core [71] count = 4654
Total count (size: 32): 168342
Test OK
---------------------
RTE>>ring_perf_autotest // DPDK 20.02, without patch

### Testing single element enq/deq ###
legacy APIs: SP/SC: single: 42.18
legacy APIs: MP/MC: single: 56.26

### Testing burst enq/deq ###
legacy APIs: SP/SC: burst (size: 8): 43.60
legacy APIs: SP/SC: burst (size: 32): 49.86
legacy APIs: MP/MC: burst (size: 8): 58.43
legacy APIs: MP/MC: burst (size: 32): 65.67

### Testing bulk enq/deq ###
legacy APIs: SP/SC: bulk (size: 8): 43.59
legacy APIs: SP/SC: bulk (size: 32): 49.86
legacy APIs: MP/MC: bulk (size: 8): 58.43
legacy APIs: MP/MC: bulk (size: 32): 65.63

### Testing empty bulk deq ###
legacy APIs: SP/SC: bulk (size: 8): 7.16
legacy APIs: MP/MC: bulk (size: 8): 7.16

### Testing using two hyperthreads ###
legacy APIs: SP/SC: bulk (size: 8): 12.07
legacy APIs: MP/MC: bulk (size: 8): 16.24
legacy APIs: SP/SC: bulk (size: 32): 3.20
legacy APIs: MP/MC: bulk (size: 32): 3.72

### Testing using two physical cores ###
legacy APIs: SP/SC: bulk (size: 8): 33.41
legacy APIs: MP/MC: bulk (size: 8): 38.01
legacy APIs: SP/SC: bulk (size: 32): 10.23
legacy APIs: MP/MC: bulk (size: 32): 11.90

### Testing using two NUMA nodes ###
legacy APIs: SP/SC: bulk (size: 8): 49.27
legacy APIs: MP/MC: bulk (size: 8): 64.80
legacy APIs: SP/SC: bulk (size: 32): 12.45
legacy APIs: MP/MC: bulk (size: 32): 23.11

### Testing using all slave nodes ###

Bulk enq/dequeue count on size 8
Core [4] count = 5637
Core [5] count = 5599
Core [6] count = 5623
Core [7] count = 5627
Core [8] count = 5723
Core [9] count = 5758
Core [10] count = 5714
Core [11] count = 5724
Core [64] count = 5310
Core [65] count = 5438
Core [66] count = 5448
Core [67] count = 5374
Core [68] count = 4441
Core [69] count = 4550
Core [70] count = 4550
Core [71] count = 4558
Total count (size: 8): 85074

Bulk enq/dequeue count on size 32
Core [4] count = 5608
Core [5] count = 5623
Core [6] count = 5590
Core [7] count = 5658
Core [8] count = 5680
Core [9] count = 5738
Core [10] count = 5692
Core [11] count = 5712
Core [64] count = 5273
Core [65] count = 5363
Core [66] count = 5341
Core [67] count = 5349
Core [68] count = 4591
Core [69] count = 4673
Core [70] count = 4698
Core [71] count = 4687
Total count (size: 32): 170350

### Testing single element enq/deq ###
elem APIs: element size 16B: SP/SC: single: 42.82
elem APIs: element size 16B: MP/MC: single: 56.79

### Testing burst enq/deq ###
elem APIs: element size 16B: SP/SC: burst (size: 8): 44.99
elem APIs: element size 16B: SP/SC: burst (size: 32): 59.00
elem APIs: element size 16B: MP/MC: burst (size: 8): 60.59
elem APIs: element size 16B: MP/MC: burst (size: 32): 74.78

### Testing bulk enq/deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 44.97
elem APIs: element size 16B: SP/SC: bulk (size: 32): 58.91
elem APIs: element size 16B: MP/MC: bulk (size: 8): 60.60
elem APIs: element size 16B: MP/MC: bulk (size: 32): 74.61

### Testing empty bulk deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 7.16
elem APIs: element size 16B: MP/MC: bulk (size: 8): 7.16

### Testing using two hyperthreads ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 12.18
elem APIs: element size 16B: MP/MC: bulk (size: 8): 15.41
elem APIs: element size 16B: SP/SC: bulk (size: 32): 3.19
elem APIs: element size 16B: MP/MC: bulk (size: 32): 4.06

### Testing using two physical cores ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 42.08
elem APIs: element size 16B: MP/MC: bulk (size: 8): 44.52
elem APIs: element size 16B: SP/SC: bulk (size: 32): 10.73
elem APIs: element size 16B: MP/MC: bulk (size: 32): 12.39

### Testing using two NUMA nodes ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 49.65
elem APIs: element size 16B: MP/MC: bulk (size: 8): 93.27
elem APIs: element size 16B: SP/SC: bulk (size: 32): 12.38
elem APIs: element size 16B: MP/MC: bulk (size: 32): 27.19

### Testing using all slave nodes ###

Bulk enq/dequeue count on size 8
Core [4] count = 5629
Core [5] count = 5585
Core [6] count = 5676
Core [7] count = 5604
Core [8] count = 5639
Core [9] count = 5731
Core [10] count = 5694
Core [11] count = 5707
Core [64] count = 5254
Core [65] count = 5331
Core [66] count = 5340
Core [67] count = 5355
Core [68] count = 4339
Core [69] count = 4481
Core [70] count = 4504
Core [71] count = 4507
Total count (size: 8): 84376

Bulk enq/dequeue count on size 32
Core [4] count = 5518
Core [5] count = 5493
Core [6] count = 5559
Core [7] count = 5484
Core [8] count = 5623
Core [9] count = 5669
Core [10] count = 5661
Core [11] count = 5658
Core [64] count = 5207
Core [65] count = 5305
Core [66] count = 5273
Core [67] count = 5303
Core [68] count = 4542
Core [69] count = 4682
Core [70] count = 4672
Core [71] count = 4609
Total count (size: 32): 168634
Test OK


Dave

Reply via email to