Thanks for doing this! Some suggestions below. The patch no longer applies cleanly, and I also needed to add this incremental to make the testsuite pass:
diff --git a/tests/pmd.at b/tests/pmd.at
index 3d219ff..9b6427e 100644
--- a/tests/pmd.at
+++ b/tests/pmd.at
@@ -144,10 +144,11 @@ dummy@ovs-dummy: hit:0 missed:0
 p0 7/1: (dummy-pmd: configured_rx_queues=4, configured_tx_queues=<cleared>, requested_rx_queues=4, requested_tx_queues=<cleared>)
 ])
-AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 4], [0], [dnl
+AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl
 pmd thread numa_id <cleared> core_id <cleared>:
 	emc hits:0
 	megaflow hits:0
+	avg. subtable lookups per hit: 0.00
 	miss:0
 	lost:0
 ])
@@ -170,10 +171,11 @@ AT_CHECK([cat ovs-vswitchd.log | filter_flow_install | strip_xout], [0], [dnl
 recirc_id(0),in_port(1),eth(src=50:54:00:00:00:77,dst=50:54:00:00:01:78),eth_type(0x0800),ipv4(frag=no), actions: <del>
 ])
-AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 4], [0], [dnl
+AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl
 pmd thread numa_id <cleared> core_id <cleared>:
 	emc hits:19
 	megaflow hits:0
+	avg. subtable lookups per hit: 0.00
 	miss:1
 	lost:0
 ])

> On Jun 16, 2016, at 6:56 AM, Jan Scheurich <jan.scheur...@ericsson.com> wrote:
>
> The user-space datapath (dpif-netdev) consists of a first-level "exact
> match cache" (EMC) matching on 5-tuples and the normal megaflow
> classifier. With many parallel packet flows (e.g. TCP connections) the
> EMC becomes inefficient and the OVS forwarding performance is
> determined by the megaflow classifier.
>
> The megaflow classifier (dpcls) consists of a variable number of hash
> tables (aka subtables), each containing megaflow entries with the same
> mask of packet header and metadata fields to match upon. A dpcls
> lookup matches a given packet against all subtables in sequence until
> it hits a match. As megaflow cache entries are by construction
> non-overlapping, the first match is the only match.
>
> Today the order of the subtables in the dpcls is essentially random,
> so on average a dpcls lookup has to visit N/2 subtables for a hit,
> where N is the total number of subtables. Even though every single
> hash-table lookup is fast, the performance of the current dpcls
> degrades when there are many subtables.
>
> How does the patch address this issue:
>
> In reality there is often a strong correlation between the ingress
> port and a small subset of subtables that have hits. The entire
> megaflow cache typically decomposes nicely into partitions that are
> hit only by packets entering from a range of similar ports (e.g.
> traffic from Phy -> VM vs. traffic from VM -> Phy).
>
> Therefore, keeping a separate list of subtables per ingress port,
> sorted by frequency of hits, reduces the average number of subtable
> lookups in the dpcls to a minimum, even if the total number of
> subtables gets large.

Did you consider keeping a separate dpcls instance per input port
instead? I.e., have an hmap of dpcls'es using the in_port as the hash
key, and creating and destroying them on demand. This would avoid the
fixed-size array, simplify the code, and make each insert and remove
operation a bit cheaper. A miss would typically go through a lower
number of subtables, even though the upcall cost would likely hide this
benefit in practice. We are always exact matching in_port (as well as
dl_type and recirc_id; see xlate_wc_init()), so this would add no
overhead in terms of the overall number of megaflows.
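To sketch the idea (hypothetical and untested: "struct port_cls" and
"port_cls_lookup" are made-up names, and lifetime/eviction handling is
omitted; hash_port_no(), dpcls_init(), and the hmap helpers are the
existing ones in dpif-netdev.c and lib/hmap.h):

struct port_cls {
    struct hmap_node node;   /* In the owning PMD thread's hmap. */
    odp_port_t in_port;      /* Hash key. */
    struct dpcls cls;        /* Classifier used only for this in_port. */
};

/* Find the dpcls instance for 'in_port', creating it on demand. */
static struct port_cls *
port_cls_lookup(struct hmap *classifiers, odp_port_t in_port)
{
    uint32_t hash = hash_port_no(in_port);
    struct port_cls *pc;

    HMAP_FOR_EACH_WITH_HASH (pc, node, hash, classifiers) {
        if (pc->in_port == in_port) {
            return pc;
        }
    }
    /* Not found: create a fresh per-port classifier. */
    pc = xzalloc(sizeof *pc);
    pc->in_port = in_port;
    dpcls_init(&pc->cls);
    hmap_insert(classifiers, &pc->node, hash);
    return pc;
}

dpcls_lookup() would then operate on the per-port classifier directly
and could skip the vector-selection hash altogether.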
> The patch introduces 32 subtable vectors per dpcls and hashes the
> ingress port to select the subtable vector. The patch also counts
> matches per 32 slots in each vector (hashing the subtable pointer to
> obtain the slot) and sorts the vectors according to match frequency
> every second.
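For reference, the selection and accounting just described come down to
these lines of the patch (collected from dpcls_lookup(), with comments
added here):

/* Select the subtable vector for the in_port. */
uint32_t port_idx = hash_port_no(port_no) % NUM_SUBTABLE_VECTORS;
uint64_t *hit_cnt = cls->hit_cnt[port_idx];

/* On each megaflow hit, credit the matching subtable's slot. */
int subtable_idx = hash_pointer(subtable, 210365) % NUM_SUBTABLE_SLOTS;
hit_cnt[subtable_idx]++;

Once per DPCLS_OPTIMIATION_INTERVAL (1000 ms), dpcls_try_optimize()
re-sorts each vector by these counts and zeroes them for the next
measuring interval.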
Also, it should be possible to expand the pvector API to allow using
the pvector priorities directly as the counts. I posted a patch for
this (https://patchwork.ozlabs.org/patch/638920/), after which you
could apply the following incremental. I haven't tested how this
performs, though.

  Jarno

From: Jarno Rajahalme <ja...@ovn.org>
Date: Tue, 21 Jun 2016 17:17:55 -0700
Subject: [PATCH] dpif-netdev: Use non-concurrent pvector.
Cc: <ja...@ovn.org>

PMD threads do not need the concurrent version of the pvector. The
non-concurrent pvector entry priorities can be incremented in place and
periodically sorted.

Signed-off-by: Jarno Rajahalme <ja...@ovn.org>
---
 lib/dpif-netdev.c | 68 +++++++++++++++++++++++++++----------------------------
 1 file changed, 34 insertions(+), 34 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index c491110..647bfb7 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -151,12 +151,11 @@ struct emc_cache {
 
 #define NUM_SUBTABLE_VECTORS 32
 #define NUM_SUBTABLE_SLOTS 32
-#define DPCLS_OPTIMIATION_INTERVAL 1000
+#define DPCLS_OPTIMIZATION_INTERVAL 1000
 
 struct dpcls {
     struct cmap subtables_map;
-    struct pvector subtables[NUM_SUBTABLE_VECTORS];
-    uint64_t hit_cnt[NUM_SUBTABLE_VECTORS][NUM_SUBTABLE_SLOTS];
+    struct pvector_impl *subtables[NUM_SUBTABLE_VECTORS];
     long long int next_optimization;
 };
@@ -4371,11 +4370,10 @@ dpcls_init(struct dpcls *cls)
     int i;
 
     cmap_init(&cls->subtables_map);
-    for (i=0; i<NUM_SUBTABLE_VECTORS; i++) {
-        pvector_init(&cls->subtables[i]);
+    for (i = 0; i < NUM_SUBTABLE_VECTORS; i++) {
+        cls->subtables[i] = pvector_impl_alloc(PVECTOR_EXTRA_ALLOC);
     }
-    memset(cls->hit_cnt, 0, sizeof cls->hit_cnt);
-    cls->next_optimization = time_msec() + DPCLS_OPTIMIATION_INTERVAL;
+    cls->next_optimization = time_msec() + DPCLS_OPTIMIZATION_INTERVAL;
 }
@@ -4383,8 +4381,8 @@ dpcls_destroy_subtable(struct dpcls *cls, struct dpcls_subtable *subtable)
 {
     int i;
 
-    for (i=0; i<NUM_SUBTABLE_VECTORS; i++) {
-        pvector_remove(&cls->subtables[i], subtable);
+    for (i = 0; i < NUM_SUBTABLE_VECTORS; i++) {
+        pvector_impl_remove(cls->subtables[i], subtable);
     }
     cmap_remove(&cls->subtables_map, &subtable->cmap_node,
                 subtable->mask.hash);
@@ -4407,8 +4405,8 @@ dpcls_destroy(struct dpcls *cls)
             dpcls_destroy_subtable(cls, subtable);
         }
         cmap_destroy(&cls->subtables_map);
-        for (i=0; i<NUM_SUBTABLE_VECTORS; i++) {
-            pvector_destroy(&cls->subtables[i]);
+        for (i = 0; i < NUM_SUBTABLE_VECTORS; i++) {
+            free(cls->subtables[i]);
         }
     }
 }
@@ -4417,7 +4415,6 @@ static struct dpcls_subtable *
 dpcls_create_subtable(struct dpcls *cls, const struct netdev_flow_key *mask)
 {
     struct dpcls_subtable *subtable;
-    int i;
 
     /* Need to add one. */
     subtable = xmalloc(sizeof *subtable
@@ -4425,10 +4422,10 @@ dpcls_create_subtable(struct dpcls *cls, const struct netdev_flow_key *mask)
     cmap_init(&subtable->rules);
     netdev_flow_key_clone(&subtable->mask, mask);
     cmap_insert(&cls->subtables_map, &subtable->cmap_node, mask->hash);
-    for (i=0; i<NUM_SUBTABLE_VECTORS; i++) {
+
+    for (int i = 0; i < NUM_SUBTABLE_VECTORS; i++) {
         /* Insert new subtable with priority zero (no hits) */
-        pvector_insert(&cls->subtables[i], subtable, 0);
-        pvector_publish(&cls->subtables[i]);
+        pvector_impl_push_back(&cls->subtables[i], subtable, 0);
     }
 
     return subtable;
@@ -4456,22 +4453,22 @@ dpcls_try_optimize(struct dpcls *cls)
     long long int now = time_msec();
 
     if (now > cls->next_optimization) {
-        struct dpcls_subtable *subtable;
-        int port_idx, subtable_idx;
-
-        for (port_idx=0; port_idx<NUM_SUBTABLE_VECTORS; port_idx++) {
-            struct pvector *pvec = &cls->subtables[port_idx];
-            PVECTOR_FOR_EACH (subtable, pvec) {
-                subtable_idx = hash_pointer(subtable, 210365) % NUM_SUBTABLE_SLOTS;
-                pvector_change_priority(pvec, subtable,
-                                        cls->hit_cnt[port_idx][subtable_idx]);
+        int port_idx;
+
+        for (port_idx = 0; port_idx < NUM_SUBTABLE_VECTORS; port_idx++) {
+            struct pvector_impl *pvec = cls->subtables[port_idx];
+            struct pvector_entry *entry;
+            int index;
+
+            pvector_impl_sort(pvec);
+
+            PVECTOR_IMPL_FOR_EACH_ENTRY (entry, index, pvec) {
+                entry->priority = 0;
             }
-            pvector_publish(pvec);
         }
         /* Start new measuring interval */
-        memset(cls->hit_cnt, 0, sizeof cls->hit_cnt);
-        cls->next_optimization = now + DPCLS_OPTIMIATION_INTERVAL;
+        cls->next_optimization = now + DPCLS_OPTIMIZATION_INTERVAL;
     }
 }
@@ -4500,8 +4497,9 @@ dpcls_remove(struct dpcls *cls, struct dpcls_rule *rule)
     if (cmap_remove(&subtable->rules, &rule->cmap_node,
                     rule->flow.hash) == 0) {
         dpcls_destroy_subtable(cls, subtable);
-        for (i=0; i<NUM_SUBTABLE_VECTORS; i++) {
-            pvector_publish(&cls->subtables[i]);
+
+        for (i = 0; i < NUM_SUBTABLE_VECTORS; i++) {
+            pvector_impl_sort(cls->subtables[i]); /* Removes gaps. */
         }
     }
 }
@@ -4559,10 +4557,11 @@ dpcls_lookup(struct dpcls *cls, const struct netdev_flow_key keys[],
 
     /* Select the subtable vector for the in_port */
     uint32_t port_idx = hash_port_no(port_no) % NUM_SUBTABLE_VECTORS;
-    uint64_t *hit_cnt = cls->hit_cnt[port_idx];
     int lookups_match = 0, subtable_pos = 1;
+    struct pvector_entry *entry;
+    int index;
 
-    PVECTOR_FOR_EACH (subtable, &cls->subtables[port_idx]) {
+    PVECTOR_IMPL_FOR_EACH_ENTRY (entry, index, cls->subtables[port_idx]) {
         const struct netdev_flow_key *mkeys = keys;
         struct dpcls_rule **mrules = rules;
         map_type remains = 0;
@@ -4570,6 +4569,8 @@ dpcls_lookup(struct dpcls *cls, const struct netdev_flow_key keys[],
 
         BUILD_ASSERT_DECL(sizeof remains == sizeof *maps);
 
+        subtable = entry->ptr;
+
         for (m = 0; m < N_MAPS; m++, mkeys += MAP_BITS, mrules += MAP_BITS) {
             uint32_t hashes[MAP_BITS];
             const struct cmap_node *nodes[MAP_BITS];
@@ -4594,9 +4595,8 @@ dpcls_lookup(struct dpcls *cls, const struct netdev_flow_key keys[],
             CMAP_NODE_FOR_EACH (rule, cmap_node, nodes[i]) {
                 if (OVS_LIKELY(dpcls_rule_matches_key(rule, &mkeys[i]))) {
                     mrules[i] = rule;
-                    /* Update the subtable hit statistics for the in_port vector */
-                    int subtable_idx = hash_pointer(subtable, 210365) % NUM_SUBTABLE_SLOTS;
-                    hit_cnt[subtable_idx]++;
+                    /* Update the subtable hit stats. */
+                    entry->priority++;
                     lookups_match += subtable_pos;
                     goto next;
                 }
-- 
2.1.4
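To make the commit message concrete, here is a standalone toy in plain
C (deliberately not the OVS pvector API) showing the increment-in-place
/ periodic-sort / reset cycle the incremental implements:

#include <stdio.h>
#include <stdlib.h>

struct entry {
    int priority;      /* Doubles as the hit counter for the interval. */
    const char *name;  /* Stand-in for the subtable pointer. */
};

static int
compare_desc(const void *a_, const void *b_)
{
    const struct entry *a = a_;
    const struct entry *b = b_;

    return b->priority - a->priority;  /* Most hits first. */
}

int
main(void)
{
    struct entry vec[] = {
        { 0, "subtable A" }, { 0, "subtable B" }, { 0, "subtable C" },
    };

    /* Measuring interval: increment priorities in place on each hit. */
    vec[1].priority += 100;   /* B is the hot subtable. */
    vec[0].priority += 3;

    /* Periodic "optimization": sort by hit count, then reset. */
    qsort(vec, 3, sizeof vec[0], compare_desc);
    for (int i = 0; i < 3; i++) {
        printf("%d: %s (%d hits last interval)\n",
               i, vec[i].name, vec[i].priority);
        vec[i].priority = 0;
    }
    return 0;
}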