Optimize service loop so that the starting point is the lowest-indexed
service mapped to the lcore in question, and terminate the loop at the
highest-indexed service.

While the worst case latency remains the same, this patch
significantly reduces the service framework overhead for the average
case. In particular, scenarios where an lcore only runs a single
service, or multiple services which id values are close (e.g., three
services with ids 17, 18 and 22), show significant improvements.

The worse case is a where the lcore two services mapped to it; one
with service id 0 and the other with id 63.

On a service lcore serving a single service, the service loop overhead
is reduced from ~190 core clock cycles to ~46. (On an Intel Cascade
Lake generation Xeon.) On weakly ordered CPUs, the gain is larger,
since the loop included load-acquire atomic operations.

Signed-off-by: Mattias Rönnblom <mattias.ronnb...@ericsson.com>
---
 lib/eal/common/rte_service.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/lib/eal/common/rte_service.c b/lib/eal/common/rte_service.c
index 87df04e3ac..4cac866792 100644
--- a/lib/eal/common/rte_service.c
+++ b/lib/eal/common/rte_service.c
@@ -464,7 +464,6 @@ static int32_t
 service_runner_func(void *arg)
 {
        RTE_SET_USED(arg);
-       uint32_t i;
        const int lcore = rte_lcore_id();
        struct core_state *cs = &lcore_states[lcore];
 
@@ -478,10 +477,17 @@ service_runner_func(void *arg)
                        RUNSTATE_RUNNING) {
 
                const uint64_t service_mask = cs->service_mask;
+               uint8_t start_id;
+               uint8_t end_id;
+               uint8_t i;
 
-               for (i = 0; i < RTE_SERVICE_NUM_MAX; i++) {
-                       if (!service_registered(i))
-                               continue;
+               if (service_mask == 0)
+                       continue;
+
+               start_id = __builtin_ctzl(service_mask);
+               end_id = 64 - __builtin_clzl(service_mask);
+
+               for (i = start_id; i < end_id; i++) {
                        /* return value ignored as no change to code flow */
                        service_run(i, cs, service_mask, service_get(i), 1);
                }
-- 
2.34.1

Reply via email to