From: Robin Jarry <[email protected]> Date: Tuesday, 19 May 2026 at 3:43 PM To: [email protected] <[email protected]>; Jerin Jacob <[email protected]>; Kiran Kumar Kokkilagadda <[email protected]>; Nithin Kumar Dabilpuram <[email protected]>; Zhirun Yan <[email protected]> Cc: Vladimir Medvedkin <[email protected]>; Christophe Fontaine <[email protected]>; David Marchand <[email protected]>; Konstantin Ananyev <[email protected]>; Maxime Leroy <[email protected]> Subject: [EXTERNAL] [PATCH dpdk] graph: replace circular buffer with priority-based bitmap scheduling
Replace the FIFO circular buffer used to track pending nodes with a bitmap and a priority-sorted schedule table. Each node can now have a scheduling priority (int16_t, default 0, lower value means visited first). Source nodes are forced to INT16_MIN so they always run first. At graph creation time, nodes are sorted by (priority, id) and assigned a bit position (sched_idx). During the walk, a bitmap tracks which nodes have pending objects. Scanning from the lowest bit naturally visits nodes in priority order. This improves batching in fan-out-then-converge topologies. When eth_input classifies packets to both mpls_input and ipv4_input, the old FIFO order could process ipv4_input before mpls_input, causing ipv4_input to be visited twice (once before and once after the MPLS label is popped). With mpls_input at a higher priority (lower value), it runs first and its output accumulates in ipv4_input which is then visited only once with all packets. The bitmap set operation is idempotent (OR on an already-set bit is a no-op) which removes the need for the idx == 0 guards that the circular buffer required to avoid duplicate enqueue. Suggested-by: Vladimir Medvedkin <[email protected]> Signed-off-by: Robin Jarry <[email protected]> Cc: Christophe Fontaine <[email protected]> Cc: David Marchand <[email protected]> Cc: Konstantin Ananyev <[email protected]> Cc: Maxime Leroy <[email protected]> --- doc/guides/prog_guide/graph_lib.rst | 37 +- .../prog_guide/img/graph_mem_layout.svg | 1823 +++++++---------- lib/graph/graph.c | 19 +- lib/graph/graph_debug.c | 12 +- lib/graph/graph_populate.c | 117 +- lib/graph/graph_private.h | 27 +- lib/graph/node.c | 2 + lib/graph/rte_graph.h | 1 + lib/graph/rte_graph_model_mcore_dispatch.h | 34 +- lib/graph/rte_graph_model_rtc.h | 65 +- lib/graph/rte_graph_worker.h | 2 +- lib/graph/rte_graph_worker_common.h | 81 +- 12 files changed, 984 insertions(+), 1236 deletions(-) diff --git a/doc/guides/prog_guide/graph_lib.rst b/doc/guides/prog_guide/graph_lib.rst index 8409e7666e85..9c6d8679b686 100644 --- a/doc/guides/prog_guide/graph_lib.rst +++ b/doc/guides/prog_guide/graph_lib.rst @@ -117,13 +117,22 @@ next_node[]: The dynamic array to store the downstream nodes connected to this node. Downstream node should not be current node itself or a source node. +priority: +^^^^^^^^^ + +The scheduling priority of the node (``int16_t``, default 0). Nodes with lower +priority values are visited first during the graph walk. This allows control +over the order in which pending nodes are processed, which can improve packet +batching in topologies where multiple paths converge on the same node. + Source node: ^^^^^^^^^^^^ Source nodes are static nodes created using ``RTE_NODE_REGISTER`` by passing ``flags`` as ``RTE_NODE_SOURCE_F``. -While performing the graph walk, the ``process()`` function of all the source -nodes will be called first. So that these nodes can be used as input nodes for a graph. +Source nodes are automatically assigned the lowest possible priority +(``INT16_MIN``) so that their ``process()`` function is always called first +during the graph walk. This ensures they act as input nodes for a graph. nb_xstats: ^^^^^^^^^^ @@ -396,12 +405,26 @@ Graph object memory layout Understanding the memory layout helps to debug the graph library and improve the performance if needed. -Graph object consists of a header, circular buffer to store the pending stream -when walking over the graph, variable-length memory to store the ``rte_node`` objects, -and variable-length memory to store the xstat reported by each ``rte_node``. +A graph object consists of a header, a scheduling table mapping bit positions to +node offsets, pending and source bitmaps for tracking which nodes need +processing, variable-length memory to store the ``rte_node`` objects, and +variable-length memory to store the xstat reported by each ``rte_node``. -The graph_nodes_mem_create() creates and populate this memory. The functions -such as ``rte_graph_walk()`` and ``rte_node_enqueue_*`` use this memory +Nodes are sorted by ``(priority, node_id)`` at graph creation time and each +node is assigned a bit position in the pending bitmap. During the graph walk, +the bitmap is scanned from the lowest bit, so nodes with lower priority values +are visited first. Source nodes are always assigned the lowest priority +(``INT16_MIN``) to ensure they run before any processing node. + +This priority-based ordering improves batching in fan-out-then-converge +topologies. For example, if ``eth_input`` classifies packets to both +``mpls_input`` and ``ipv4_input``, giving ``mpls_input`` a lower priority value +ensures it runs first. Its output accumulates in ``ipv4_input`` which is then +visited only once with all packets, instead of being visited twice (before and +after MPLS label popping). + +The ``graph_fp_mem_create()`` function creates and populates this memory. The +functions such as ``rte_graph_walk()`` and ``rte_node_enqueue_*`` use this memory to enable fastpath services. diff --git a/lib/graph/graph.c b/lib/graph/graph.c index 6911ea8abeed..6dc1402e6bd0 100644 --- a/lib/graph/graph.c +++ b/lib/graph/graph.c @@ -334,20 +334,6 @@ graph_mem_fixup_secondary(struct rte_graph *graph) return graph_mem_fixup_node_ctx(graph); } -static bool -graph_src_node_avail(struct graph *graph) -{ - struct graph_node *graph_node; - - STAILQ_FOREACH(graph_node, &graph->node_list, next) - if ((graph_node->node->flags & RTE_NODE_SOURCE_F) && - (graph_node->node->lcore_id == RTE_MAX_LCORE || - graph->lcore_id == graph_node->node->lcore_id)) - return true; - - return false; -} - RTE_EXPORT_SYMBOL(rte_graph_model_mcore_dispatch_core_bind) int rte_graph_model_mcore_dispatch_core_bind(rte_graph_t id, int lcore) @@ -375,9 +361,8 @@ rte_graph_model_mcore_dispatch_core_bind(rte_graph_t id, int lcore) graph->graph->dispatch.lcore_id = graph->lcore_id; graph->socket = rte_lcore_to_socket_id(lcore); - /* check the availability of source node */ - if (!graph_src_node_avail(graph)) - graph->graph->head = 0; + /* Rebuild source bitmap with only source nodes bound to this lcore */ + graph_src_bitmap_rebuild(graph); return 0; diff --git a/lib/graph/graph_debug.c b/lib/graph/graph_debug.c index e3b8cccdc1f0..8e99fa1b0fb8 100644 --- a/lib/graph/graph_debug.c +++ b/lib/graph/graph_debug.c @@ -15,8 +15,8 @@ graph_dump(FILE *f, struct graph *g) fprintf(f, "graph <%s>\n", g->name); fprintf(f, " id=%" PRIu32 "\n", g->id); - fprintf(f, " cir_start=%" PRIu32 "\n", g->cir_start); - fprintf(f, " cir_mask=%" PRIu32 "\n", g->cir_mask); + fprintf(f, " sched_table_off=%" PRIu32 "\n", g->sched_table_off); + fprintf(f, " nb_sched_words=%" PRIu16 "\n", g->nb_sched_words); fprintf(f, " addr=%p\n", g); fprintf(f, " graph=%p\n", g->graph); fprintf(f, " mem_sz=%zu\n", g->mem_sz); @@ -63,14 +63,14 @@ rte_graph_obj_dump(FILE *f, struct rte_graph *g, bool all) fprintf(f, "graph <%s> @ %p\n", g->name, g); fprintf(f, " id=%" PRIu32 "\n", g->id); - fprintf(f, " head=%" PRId32 "\n", (int32_t)g->head); - fprintf(f, " tail=%" PRId32 "\n", (int32_t)g->tail); - fprintf(f, " cir_mask=0x%" PRIx32 "\n", g->cir_mask); fprintf(f, " nb_nodes=%" PRId32 "\n", g->nb_nodes); + fprintf(f, " nb_sched_words=%" PRIu16 "\n", g->nb_sched_words); fprintf(f, " socket=%d\n", g->socket); fprintf(f, " fence=0x%" PRIx64 "\n", g->fence); fprintf(f, " nodes_start=0x%" PRIx32 "\n", g->nodes_start); - fprintf(f, " cir_start=%p\n", g->cir_start); + fprintf(f, " sched_table=%p\n", g->sched_table); + fprintf(f, " pending=%p\n", g->pending); + fprintf(f, " src_pending=%p\n", g->src_pending); rte_graph_foreach_node(count, off, g, n) { if (!all && n->idx == 0) diff --git a/lib/graph/graph_populate.c b/lib/graph/graph_populate.c index 026daecb2122..45bc7704fede 100644 --- a/lib/graph/graph_populate.c +++ b/lib/graph/graph_populate.c @@ -3,6 +3,8 @@ */ +#include <stdlib.h> + #include <rte_common.h> #include <rte_errno.h> #include <rte_malloc.h> @@ -15,19 +17,27 @@ static size_t graph_fp_mem_calc_size(struct graph *graph) { struct graph_node *graph_node; - rte_node_t val; + uint16_t nwords; size_t sz; /* Graph header */ sz = sizeof(struct rte_graph); - /* Source nodes list */ - sz += sizeof(rte_graph_off_t) * graph->src_node_count; - /* Circular buffer for pending streams of size number of nodes */ - val = rte_align32pow2(graph->node_count * sizeof(rte_graph_off_t)); - sz = RTE_ALIGN(sz, val); - graph->cir_start = sz; - graph->cir_mask = rte_align32pow2(graph->node_count) - 1; - sz += val; + + /* Schedule table: node offset indexed by sched_idx */ + sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE); + graph->sched_table_off = sz; + sz += sizeof(rte_graph_off_t) * graph->node_count; + + /* Pending and source pending bitmaps */ + nwords = (graph->node_count + 63) / 64; + graph->nb_sched_words = nwords; + sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE); + graph->pending_off = sz; + sz += sizeof(uint64_t) * nwords; + sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE); + graph->src_pending_off = sz; + sz += sizeof(uint64_t) * nwords; + /* Fence */ sz += sizeof(RTE_GRAPH_FENCE); sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE); @@ -54,20 +64,44 @@ graph_fp_mem_calc_size(struct graph *graph) } static void -graph_header_popluate(struct graph *_graph) +graph_header_populate(struct graph *_graph) { struct rte_graph *graph = _graph->graph; - graph->tail = 0; - graph->head = (int32_t)-_graph->src_node_count; - graph->cir_mask = _graph->cir_mask; graph->nb_nodes = _graph->node_count; - graph->cir_start = RTE_PTR_ADD(graph, _graph->cir_start); + graph->nb_sched_words = _graph->nb_sched_words; + graph->sched_table = RTE_PTR_ADD(graph, _graph->sched_table_off); + graph->pending = RTE_PTR_ADD(graph, _graph->pending_off); + graph->src_pending = RTE_PTR_ADD(graph, _graph->src_pending_off); graph->nodes_start = _graph->nodes_start; graph->socket = _graph->socket; graph->id = _graph->id; memcpy(graph->name, _graph->name, RTE_GRAPH_NAMESIZE); graph->fence = RTE_GRAPH_FENCE; + + memset(graph->pending, 0, sizeof(uint64_t) * _graph->nb_sched_words); + memset(graph->src_pending, 0, sizeof(uint64_t) * _graph->nb_sched_words); +} + +static int16_t +graph_node_effective_priority(const struct graph_node *gn) +{ + if (gn->node->flags & RTE_NODE_SOURCE_F) + return INT16_MIN; + return gn->node->priority; +} + +static int +graph_node_priority_cmp(const void *a, const void *b) +{ + const struct graph_node *const *na = a; + const struct graph_node *const *nb = b; + int16_t pa = graph_node_effective_priority(*na); + int16_t pb = graph_node_effective_priority(*nb); + + if (pa != pb) + return (int)pa - (int)pb; + return (int)(*na)->node->id - (int)(*nb)->node->id; } static void @@ -76,15 +110,26 @@ graph_nodes_populate(struct graph *_graph) rte_graph_off_t xstat_off = _graph->xstats_start; rte_graph_off_t off = _graph->nodes_start; struct rte_graph *graph = _graph->graph; - struct graph_node *graph_node; + struct graph_node **sorted, *graph_node; rte_edge_t count, nb_edges; rte_node_t pid; + uint32_t n; - STAILQ_FOREACH(graph_node, &_graph->node_list, next) { + /* Build a sorted array of graph_node pointers by (priority, id) */ + sorted = calloc(_graph->node_count, sizeof(*sorted)); + RTE_VERIFY(sorted != NULL); + n = 0; + STAILQ_FOREACH(graph_node, &_graph->node_list, next) + sorted[n++] = graph_node; + qsort(sorted, n, sizeof(*sorted), graph_node_priority_cmp); + + for (n = 0; n < _graph->node_count; n++) { + graph_node = sorted[n]; struct rte_node *node = RTE_PTR_ADD(graph, off); memset(node, 0, sizeof(*node)); node->fence = RTE_GRAPH_FENCE; node->off = off; + node->sched_idx = n; if (graph_pcap_is_enable()) { node->process = graph_pcap_dispatch; node->original_process = graph_node->node->process; @@ -123,8 +168,14 @@ graph_nodes_populate(struct graph *_graph) off += sizeof(struct rte_node *) * nb_edges; off = RTE_ALIGN(off, RTE_CACHE_LINE_SIZE); node->next = off; + + /* Fill the schedule table */ + graph->sched_table[n] = node->off; + __rte_node_stream_alloc(graph, node); } + + free(sorted); } struct rte_node * @@ -179,12 +230,11 @@ graph_node_nexts_populate(struct graph *_graph) } static int -graph_src_nodes_offset_populate(struct graph *_graph) +graph_src_bitmap_populate(struct graph *_graph) { struct rte_graph *graph = _graph->graph; struct graph_node *graph_node; struct rte_node *node; - int32_t head = -1; const char *name; STAILQ_FOREACH(graph_node, &_graph->node_list, next) { @@ -195,7 +245,7 @@ graph_src_nodes_offset_populate(struct graph *_graph) SET_ERR_JMP(EINVAL, fail, "%s not found", name); __rte_node_stream_alloc(graph, node); - graph->cir_start[head--] = node->off; + __rte_node_pending_set(graph->src_pending, node); } } @@ -204,17 +254,42 @@ graph_src_nodes_offset_populate(struct graph *_graph) return -rte_errno; } +void +graph_src_bitmap_rebuild(struct graph *_graph) +{ + struct rte_graph *graph = _graph->graph; + struct graph_node *graph_node; + struct rte_node *node; + const char *name; + + memset(graph->src_pending, 0, + sizeof(uint64_t) * graph->nb_sched_words); + + STAILQ_FOREACH(graph_node, &_graph->node_list, next) { + if (!(graph_node->node->flags & RTE_NODE_SOURCE_F)) + continue; + if (graph_node->node->lcore_id != RTE_MAX_LCORE && + graph_node->node->lcore_id != _graph->lcore_id) + continue; + name = graph_node->node->name; + node = graph_node_name_to_ptr(graph, name); + if (node == NULL) + continue; + __rte_node_pending_set(graph->src_pending, node); + } +} + static int graph_fp_mem_populate(struct graph *graph) { int rc; - graph_header_popluate(graph); + graph_header_populate(graph); if (graph_pcap_is_enable()) graph_pcap_init(graph); graph_nodes_populate(graph); rc = graph_node_nexts_populate(graph); - rc |= graph_src_nodes_offset_populate(graph); + rc |= graph_src_bitmap_populate(graph); return rc; } diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h index 26cdc6637192..df6f83b20261 100644 --- a/lib/graph/graph_private.h +++ b/lib/graph/graph_private.h @@ -49,6 +49,7 @@ struct node { STAILQ_ENTRY(node) next; /**< Next node in the list. */ char name[RTE_NODE_NAMESIZE]; /**< Name of the node. */ uint64_t flags; /**< Node configuration flag. */ + int16_t priority; /**< Scheduling priority. */ unsigned int lcore_id; /**< Node runs on the Lcore ID used for mcore dispatch model. */ rte_node_process_t process; /**< Node process function. */ @@ -98,19 +99,23 @@ struct graph { const struct rte_memzone *mz; /**< Memzone to store graph data. */ rte_graph_off_t nodes_start; - /**< Node memory start offset in graph reel. */ + /**< Node memory start offset in graph memory. */ rte_graph_off_t xstats_start; - /**< Node xstats memory start offset in graph reel. */ + /**< Node xstats memory start offset in graph memory. */ rte_node_t src_node_count; /**< Number of source nodes in a graph. */ struct rte_graph *graph; /**< Pointer to graph data. */ rte_node_t node_count; /**< Total number of nodes. */ - uint32_t cir_start; - /**< Circular buffer start offset in graph reel. */ - uint32_t cir_mask; - /**< Circular buffer mask for wrap around. */ + uint32_t sched_table_off; + /**< Schedule table start offset in graph memory. */ + uint32_t pending_off; + /**< Pending bitmap start offset in graph memory. */ + uint32_t src_pending_off; + /**< Source pending bitmap start offset in graph memory. */ + uint16_t nb_sched_words; + /**< Number of uint64_t words in pending bitmaps. */ rte_graph_t id; /**< Graph identifier. */ rte_graph_t parent_id; @@ -378,6 +383,16 @@ int graph_fp_mem_create(struct graph *graph); */ int graph_fp_mem_destroy(struct graph *graph); +/** + * @internal + * + * Rebuild the source pending bitmap based on lcore affinity. + * + * @param graph + * Pointer to the internal graph object. + */ +void graph_src_bitmap_rebuild(struct graph *graph); + /* Lookup functions */ /** * @internal diff --git a/lib/graph/node.c b/lib/graph/node.c index e3359fe490a5..b5599143b37b 100644 --- a/lib/graph/node.c +++ b/lib/graph/node.c @@ -153,6 +153,7 @@ __rte_node_register(const struct rte_node_register *reg) if (rte_strscpy(node->name, reg->name, RTE_NODE_NAMESIZE) < 0) goto free_xstat; node->flags = reg->flags; + node->priority = reg->priority; node->process = reg->process; node->init = reg->init; node->fini = reg->fini; @@ -216,6 +217,7 @@ node_clone(struct node *node, const char *name) /* Clone the source node */ reg->flags = node->flags; + reg->priority = node->priority; reg->process = node->process; reg->init = node->init; reg->fini = node->fini; diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h index 7e433f466129..6cd32ec22284 100644 --- a/lib/graph/rte_graph.h +++ b/lib/graph/rte_graph.h @@ -496,6 +496,7 @@ struct rte_node_register { char name[RTE_NODE_NAMESIZE]; /**< Name of the node. */ uint64_t flags; /**< Node configuration flag. */ #define RTE_NODE_SOURCE_F (1ULL << 0) /**< Node type is source. */ + int16_t priority; /**< Scheduling priority (lower = visited first, default 0). */ This will break the ABI. Please run ABI check and see/fix. rte_node_process_t process; /**< Node process function. */ rte_node_init_t init; /**< Node init function. */ rte_node_fini_t fini; /**< Node fini function. */ diff --git a/lib/graph/rte_graph_model_mcore_dispatch.h b/lib/graph/rte_graph_model_mcore_dispatch.h index f9ff3daa88ec..50a473564b56 100644 --- a/lib/graph/rte_graph_model_mcore_dispatch.h +++ b/lib/graph/rte_graph_model_mcore_dispatch.h @@ -77,9 +77,13 @@ int rte_graph_model_mcore_dispatch_node_lcore_affinity_set(const char *name, unsigned int lcore_id); /** - * Perform graph walk on the circular buffer and invoke the process function + * Perform graph walk on the pending bitmap and invoke the process function * of the nodes and collect the stats. * + * Nodes are visited in scheduling order (lowest priority value first). + * Source nodes are seeded into the pending bitmap at the start of each walk. + * Nodes with different lcore affinity are dispatched to their target lcore. + * * @param graph * Graph pointer returned from rte_graph_lookup function. * @@ -88,20 +92,28 @@ int rte_graph_model_mcore_dispatch_node_lcore_affinity_set(const char *name, static inline void rte_graph_walk_mcore_dispatch(struct rte_graph *graph) { - const rte_graph_off_t *cir_start = graph->cir_start; - const rte_node_t mask = graph->cir_mask; - uint32_t head = graph->head; + const uint16_t nwords = graph->nb_sched_words; struct rte_node *node; + uint16_t word, bit; if (graph->dispatch.wq != NULL) __rte_graph_mcore_dispatch_sched_wq_process(graph); - while (likely(head != graph->tail)) { - node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]); + /* Seed pending bitmap with source nodes bound to this lcore */ + for (word = 0; word < nwords; word++) + graph->pending[word] |= graph->src_pending[word]; - /* skip the src nodes which not bind with current worker */ - if ((int32_t)head < 1 && node->dispatch.lcore_id != graph->dispatch.lcore_id) - continue; + for (;;) { + /* find first word with any pending bit */ + for (word = 0; word < nwords; word++) + if (graph->pending[word]) + break; + if (word == nwords) + break; /* no more pending nodes */ + + bit = rte_ctz64(graph->pending[word]); + graph->pending[word] &= ~(1ULL << bit); + node = __rte_graph_pending_node(graph, word, bit); /* Schedule the node until all task/objs are done */ if (node->dispatch.lcore_id != RTE_MAX_LCORE && @@ -111,11 +123,7 @@ rte_graph_walk_mcore_dispatch(struct rte_graph *graph) continue; __rte_node_process(graph, node); - - head = likely((int32_t)head > 0) ? head & mask : head; } - - graph->tail = 0; } #ifdef __cplusplus diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h index 4b6236e301e3..38feb3e1ca88 100644 --- a/lib/graph/rte_graph_model_rtc.h +++ b/lib/graph/rte_graph_model_rtc.h @@ -6,9 +6,12 @@ #include "rte_graph_worker_common.h" /** - * Perform graph walk on the circular buffer and invoke the process function + * Perform graph walk on the pending bitmap and invoke the process function * of the nodes and collect the stats. * + * Nodes are visited in scheduling order (lowest priority value first). + * Source nodes are seeded into the pending bitmap at the start of each walk. + * * @param graph * Graph pointer returned from rte_graph_lookup function. * @@ -17,30 +20,52 @@ static inline void rte_graph_walk_rtc(struct rte_graph *graph) { - const rte_graph_off_t *cir_start = graph->cir_start; - const rte_node_t mask = graph->cir_mask; - uint32_t head = graph->head; + const uint16_t nwords = graph->nb_sched_words; struct rte_node *node; + uint16_t word, bit; /* - * Walk on the source node(s) ((cir_start - head) -> cir_start) and then - * on the pending streams (cir_start -> (cir_start + mask) -> cir_start) - * in a circular buffer fashion. + * Nodes are assigned a bit position (sched_idx) sorted by (priority, + * node_id) at graph creation time. Source nodes are forced to INT16_MIN + * priority so they always come first. * - * +-----+ <= cir_start - head [number of source nodes] - * | | - * | ... | <= source nodes - * | | - * +-----+ <= cir_start [head = 0] [tail = 0] - * | | - * | ... | <= pending streams - * | | - * +-----+ <= cir_start + mask + * sched_table[] maps bit positions to node offsets: + * + * pending[] sched_table[] + * +----------+ +------------------+ + * | word 0 | ---> | src_node_0 | bit 0 (prio=INT16_MIN) + * | 1100...1 | | src_node_1 | bit 1 (prio=INT16_MIN) + * | | | mpls_input | bit 2 (prio=-10) + * | | | ipv4_input | bit 3 (prio=0) + * | | | ... | + * +----------+ +------------------+ + * | word 1 | ---> | ip4_rewrite | bit 64 (prio=10) + * | ... | | ... | + * +----------+ +------------------+ + * + * Walk: for each word, find lowest set bit (rte_ctz64), process that + * node, clear the bit, re-read the word (processing may have set new + * bits), repeat. + * + * After each node is processed, restart scanning from word 0 since + * processing may set bits in any word, including earlier ones. */ - while (likely(head != graph->tail)) { - node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]); + + /* Seed pending bitmap with source nodes */ + for (word = 0; word < nwords; word++) + graph->pending[word] |= graph->src_pending[word]; + + for (;;) { + /* find first word with any pending bit */ + for (word = 0; word < nwords; word++) + if (graph->pending[word]) + break; + if (word == nwords) + break; /* no more pending nodes */ + + bit = rte_ctz64(graph->pending[word]); + graph->pending[word] &= ~(1ULL << bit); + node = __rte_graph_pending_node(graph, word, bit); __rte_node_process(graph, node); - head = likely((int32_t)head > 0) ? head & mask : head; } - graph->tail = 0; } diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h index b0f952a82cc9..e513d7a655d9 100644 --- a/lib/graph/rte_graph_worker.h +++ b/lib/graph/rte_graph_worker.h @@ -14,7 +14,7 @@ extern "C" { #endif /** - * Perform graph walk on the circular buffer and invoke the process function + * Perform graph walk on the pending bitmap and invoke the process function * of the nodes and collect the stats. * * @param graph diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h index 4ab53a533e4c..e52a37ce5e84 100644 --- a/lib/graph/rte_graph_worker_common.h +++ b/lib/graph/rte_graph_worker_common.h @@ -49,15 +49,14 @@ SLIST_HEAD(rte_graph_rq_head, rte_graph); */ struct __rte_cache_aligned rte_graph { /* Fast path area. */ - uint32_t tail; /**< Tail of circular buffer. */ - uint32_t head; /**< Head of circular buffer. */ - uint32_t cir_mask; /**< Circular buffer wrap around mask. */ rte_node_t nb_nodes; /**< Number of nodes in the graph. */ - rte_graph_off_t *cir_start; /**< Pointer to circular buffer. */ rte_graph_off_t nodes_start; /**< Offset at which node memory starts. */ + rte_graph_off_t *sched_table; /**< Node offset indexed by sched_idx. */ + uint64_t *pending; /**< Bitmap of pending nodes. */ + uint64_t *src_pending; /**< Bitmap of source nodes (constant). */ + uint16_t nb_sched_words; /**< Number of uint64_t words in pending bitmaps. */ uint8_t model; /**< graph model */ - uint8_t reserved1; /**< Reserved for future use. */ - uint16_t reserved2; /**< Reserved for future use. */ + /* 26 bytes padding */ union { /* Fast schedule area for mcore dispatch model */ struct { @@ -98,6 +97,7 @@ struct __rte_cache_aligned rte_node { rte_node_t id; /**< Node identifier. */ rte_node_t parent_id; /**< Parent Node identifier. */ rte_edge_t nb_edges; /**< Number of edges from this node. */ + uint16_t sched_idx; /**< Bit position in pending bitmap. */ uint32_t realloc_count; /**< Number of times realloced. */ char parent[RTE_NODE_NAMESIZE]; /**< Parent node name. */ @@ -132,7 +132,7 @@ struct __rte_cache_aligned rte_node { }; /**< Node Context. */ uint16_t size; /**< Total number of objects available. */ uint16_t idx; /**< Number of objects used. */ - rte_graph_off_t off; /**< Offset of node in the graph reel. */ + rte_graph_off_t off; /**< Offset of node in the graph memory. */ uint64_t total_cycles; /**< Cycles spent in this node. */ uint64_t total_calls; /**< Calls done to this node. */ uint64_t total_objs; /**< Objects processed by this node. */ @@ -187,12 +187,12 @@ void __rte_node_stream_alloc_size(struct rte_graph *graph, /** * @internal * - * Enqueue a given node to the tail of the graph reel. + * Process a node's pending objects and collect stats. * * @param graph * Pointer Graph object. * @param node - * Pointer to node object to be enqueued. + * Pointer to node object to be processed. */ static __rte_always_inline void __rte_node_process(struct rte_graph *graph, struct rte_node *node) @@ -220,21 +220,42 @@ __rte_node_process(struct rte_graph *graph, struct rte_node *node) /** * @internal * - * Enqueue a given node to the tail of the graph reel. + * Get a pointer to a node from the scheduling table. * * @param graph * Pointer Graph object. + * @param word + * Offset in the pending bitmap. + * @param bit + * Bit number. + * + * @return + * Pointer to the node. + */ +static __rte_always_inline struct rte_node * +__rte_graph_pending_node(struct rte_graph *graph, uint16_t word, uint16_t bit) +{ + const uint16_t index = (word * sizeof(*graph->pending) * CHAR_BIT) + bit; + const rte_graph_off_t node_offset = graph->sched_table[index]; + return RTE_PTR_ADD(graph, node_offset); +} + +/** + * @internal + * + * Mark a node as pending in the graph scheduling bitmap. + * + * @param bitmap + * Either graph->pending or graph->src_pending. * @param node - * Pointer to node object to be enqueued. + * Pointer to node object to be marked pending. */ static __rte_always_inline void -__rte_node_enqueue_tail_update(struct rte_graph *graph, struct rte_node *node) +__rte_node_pending_set(uint64_t *bitmap, struct rte_node *node) { - uint32_t tail; - - tail = graph->tail; - graph->cir_start[tail++] = node->off; - graph->tail = tail & graph->cir_mask; + const uint16_t word = node->sched_idx / (sizeof(*bitmap) * CHAR_BIT); + const uint16_t bit = node->sched_idx % (sizeof(*bitmap) * CHAR_BIT); + bitmap[word] |= 1ULL << bit; } /** @@ -242,8 +263,8 @@ __rte_node_enqueue_tail_update(struct rte_graph *graph, struct rte_node *node) * * Enqueue sequence prologue function. * - * Updates the node to tail of graph reel and resizes the number of objects - * available in the stream as needed. + * Marks the node as pending in the scheduling bitmap and resizes the number + * of objects available in the stream as needed. * * @param graph * Pointer to the graph object. @@ -259,9 +280,7 @@ __rte_node_enqueue_prologue(struct rte_graph *graph, struct rte_node *node, const uint16_t idx, const uint16_t space) { - /* Add to the pending stream list if the node is new */ - if (idx == 0) - __rte_node_enqueue_tail_update(graph, node); + __rte_node_pending_set(graph->pending, node); if (unlikely(node->size < (idx + space))) __rte_node_stream_alloc_size(graph, node, node->size + space); @@ -293,7 +312,7 @@ __rte_node_next_node_get(struct rte_node *node, rte_edge_t next) /** * Enqueue the objs to next node for further processing and set - * the next node to pending state in the circular buffer. + * the next node to pending state in the scheduling bitmap. * * @param graph * Graph pointer returned from rte_graph_lookup(). @@ -321,7 +340,7 @@ rte_node_enqueue(struct rte_graph *graph, struct rte_node *node, /** * Enqueue only one obj to next node for further processing and - * set the next node to pending state in the circular buffer. + * set the next node to pending state in the scheduling bitmap. * * @param graph * Graph pointer returned from rte_graph_lookup(). @@ -347,7 +366,7 @@ rte_node_enqueue_x1(struct rte_graph *graph, struct rte_node *node, /** * Enqueue only two objs to next node for further processing and - * set the next node to pending state in the circular buffer. + * set the next node to pending state in the scheduling bitmap. * Same as rte_node_enqueue_x1 but enqueue two objs. * * @param graph @@ -377,7 +396,7 @@ rte_node_enqueue_x2(struct rte_graph *graph, struct rte_node *node, /** * Enqueue only four objs to next node for further processing and - * set the next node to pending state in the circular buffer. + * set the next node to pending state in the scheduling bitmap. * Same as rte_node_enqueue_x1 but enqueue four objs. * * @param graph @@ -414,7 +433,7 @@ rte_node_enqueue_x4(struct rte_graph *graph, struct rte_node *node, /** * Enqueue objs to multiple next nodes for further processing and - * set the next nodes to pending state in the circular buffer. + * set the next nodes to pending state in the scheduling bitmap. * objs[i] will be enqueued to nexts[i]. * * @param graph @@ -472,7 +491,7 @@ rte_node_next_stream_get(struct rte_graph *graph, struct rte_node *node, } /** - * Put the next stream to pending state in the circular buffer + * Put the next stream to pending state in the scheduling bitmap * for further processing. Should be invoked after rte_node_next_stream_get(). * * @param graph @@ -495,9 +514,7 @@ rte_node_next_stream_put(struct rte_graph *graph, struct rte_node *node, return; node = __rte_node_next_node_get(node, next); - if (node->idx == 0) - __rte_node_enqueue_tail_update(graph, node); - + __rte_node_pending_set(graph->pending, node); node->idx += idx; } @@ -530,7 +547,7 @@ rte_node_next_stream_move(struct rte_graph *graph, struct rte_node *src, src->objs = dobjs; src->size = dsz; dst->idx = src->idx; - __rte_node_enqueue_tail_update(graph, dst); + __rte_node_pending_set(graph->pending, dst); } else { /* Move the objects from src node to dst node */ rte_node_enqueue(graph, src, next, src->objs, src->idx); } -- 2.54.0

