Re: [dpdk-dev] [PATCH v2] examples/distributor: detect high frequency cores

Hunt, David Thu, 28 Mar 2019 08:21:56 -0700


On 28/3/2019 3:10 PM, Burakov, Anatoly wrote:

On 28-Mar-19 2:42 PM, Hunt, David wrote:
On 28/3/2019 1:58 PM, Burakov, Anatoly wrote:
On 28-Mar-19 1:13 PM, David Hunt wrote:
The distributor application is bottlenecked by the distributor core,
so if we can give more frequency to this core, then the overall
performance of the application may increase.

This patch uses the rte_power_get_capabilities() API to query the
cores provided in the core mask, and if any high frequency cores are
found (e.g. Turbo Boost is enabled), we will pin the distributor
workload to that core.

Signed-off-by: Liang Ma <[email protected]>
Signed-off-by: David Hunt <[email protected]>
---
<...>
+    if (power_lib_initialised)
+        rte_power_exit(rte_lcore_id());
      printf("\nCore %u exiting tx task.\n", rte_lcore_id());
      return 0;
  }
@@ -575,9 +582,35 @@ lcore_worker(struct lcore_params *p)
          if (num > 0)
app_stats.worker_bursts[p->worker_id][num-1]++;
      }
+    if (power_lib_initialised)
+        rte_power_exit(rte_lcore_id());
+    rte_free(p);
      return 0;
  }
  +static int
+init_power_library(void)
+{
+    int ret = 0, lcore_id;
+    RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+        if (rte_lcore_is_enabled(lcore_id)) {
Please correct me if i'm wrong, but RTE_LCORE_FOREACH_SLAVE alreadychecks if the lcore is enabled.
You're correct, I'll fix in next version.
<...>
  +    if (power_lib_initialised) {
+        /*
+         * Here we'll pre-assign lcore ids to the rx, tx and
+         * distributor workloads if there's higher frequency
+         * on those cores e.g. if Turbo Boost is enabled.
+         * It's also worth mentioning that it will assign cores in a
+         * specific order, so that if there's less than three
+         * available, the higher frequency cores will go to the
+         * distributor first, then rx, then tx.
+         */
+        RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+
+            rte_power_get_capabilities(lcore_id, &lcore_cap);
+
+            if (lcore_cap.turbo == 1) {
+                priority_num++;
+                switch (priority_num) {
+                case 1:
+                    distr_core_id = lcore_id;
+                    printf("Distributor on priority core %d\n",
+                            lcore_id);
+                    break;
+                case 2:
+                    rx_core_id = lcore_id;
+                    printf("Rx on priority core %d\n",
+                            lcore_id);
+                    break;
+                case 3:
+                    tx_core_id = lcore_id;
+                    printf("Tx on priority core %d\n",
+                            lcore_id);
+                    break;
+                default:
+                    break;
+                }
This seems to be doing the same thing as right below (assigninglcore id's in order), yet in one case you use a switch, and in theother you use a simple loop. I don't see priority_num used anywhereelse, so you might as well simplify this loop to be similar to whatyou have below, with "skip-if-not-turbo, if not assigned,assign-and-continue" type flow.
There doing different things. The loop with the switch is looking forup to three priority cores, and storing those choices indistr_core_id, tx_core_id and rx_core_id. This is because we don'tknow which are the priority cores ahead of time. priority_num is usedin the switch statement, and when it finds a priority core, itincrements, so we know which variable to assign with the nextavailable priority core. Imagine we have turbo enabled on cores 2,4and 6. That's what I'm trying to solve here.
Then, when we get to the next loop, we're just assigning thenon-priority cores if the three key workloads have not already beenassigned a core, hence the simple loop, using the remaining cores.
I looked at simplifying the flow, but as far as I can see, I need twostages, a 'discovery' for the priority cores first, then whatever isleft can be done in a normal loop.
Does that make sense, or my I missing an obvious refactor opportunity?
I don't see how this is different from what you're doing below.
You are looping over cores, checking if it's a priority core, andassigning any priority cores found to distributor, Rx and Tx cores, inthat order.
Below, you're looping over cores, checking if the core is alreadyassigned, and assigning these cores to distributor, Rx and Tx cores,in that order.
So, the only tangible difference between the two is 1) the check forwhether the cores are already assigned (you don't need that becausethese cores *cannot* be attached - you haven't looped over them yet!),and 2) check for whether the core is priority.
Just to clarify: i'm not saying merge the two loops, that can't work:) I'm saying, drop the switch and rewrite it like this:


Ah, gocha now. Will do in next rev. Thanks!

for (cores) {
    if (core not priority)
        continue;
    if (dist_core not assigned) {
        assign dist core;
        continue;
    }
    if (rx core not assigned) {
        assign rx core;
        continue;
    }
    if (tx core not assigned) {
        assign tx core;
        continue;
    }
}

The functionality would be equivalent to the current switch method,but the code would be much clearer :)


Once that is fixed,

Reviewed-by: Anatoly Burakov <[email protected]>

Re: [dpdk-dev] [PATCH v2] examples/distributor: detect high frequency cores

Reply via email to