Hi Pavan, I am resending my review here. It seems you didn't get it.
, Oct 14, 2024 at 13:58:
From: Pavan Nikhilesh <pbhagavat...@marvell.com> Introduce the ability for nodes to advertise error counters during registration and increment them during the node process function in the graph library. This enhancement allows for better error tracking and debugging capabilities within the graph framework. The number of errors and the mapping of error IDs to error descriptions are defined during node registration. If an error is encountered during the node process function while walking the graph, the respective error counter is incremented.
This feature could be useful to store detailed statistics per node, not only for errors. It would be better to rename "errors" to "xstats".
See below for a concrete suggestion.
Example: static struct rte_node_errors ip4_reassembly_errors = { .nb_errors = 1, .err_desc = { [0] = "ip4_reassembly_error", }, };
static const struct rte_node_xstats ip4_reassembly_xstats = { .xstats_num = 1, .xstats_desc = { [0] = "ip4_reassembly_error", }, }
Here, "ip4_reassembly_error" is mapped to error ID 0, and the same ID is used in the `ip4_reassembly_node_process` function to increment reassembly errors. Depending on the node, there can be multiple such errors that can be updated independently and retrieved using `rte_graph_cluster_stats_get`. Example: +-------------------------------+---------------+---------------+--------------+ |Node |calls |objs |realloc_count | +-------------------------------+---------------+---------------+--------------+ |ip4_lookup |1324083 |338965248 |2 | | ip4_lookup_error | |338965496 | | |pkt_drop |1324084 |338965504 |1 | |ethdev_rx-0-0 |1324086 |338966016 |2 | |pkt_cls |1324086 |338966016 |1 | +-------------------------------+---------------+---------------+--------------+ v2 Changes: - Fix compilation. v3 Changes: - Resend as 1/5 didn't make it through. v4 Changes: - Address review comments. - Rebase on main branch. v5 Changes: - Shrink structure member names.(Robin) - add rte_node_error_increment utility function. (Robin) - Squash patches. (Robin) - Update RN, DN. (David) Pavan Nikhilesh (3): graph: add support for node specific errors graph: add node error counters
I think patch 1/3 and 2/3 should be split differently. My preference would be to have documentation (especially large svg images) in a separate commit. Other than that, the changes in lib/graph should be squashed in the same patch.
node: add error stats for ip4 nodes doc/guides/prog_guide/graph_lib.rst | 22 +- .../prog_guide/img/anatomy_of_a_node.svg | 329 +++++-- .../prog_guide/img/graph_mem_layout.svg | 921 +++++++++++++----- doc/guides/rel_notes/deprecation.rst | 6 - doc/guides/rel_notes/release_24_11.rst | 8 + lib/graph/graph_populate.c | 20 +- lib/graph/graph_private.h | 3 + lib/graph/graph_stats.c | 78 +- lib/graph/node.c | 37 +- lib/graph/rte_graph.h | 11 + lib/graph/rte_graph_worker_common.h | 23 + lib/graph/version.map | 7 + lib/node/ip4_lookup.c | 9 + lib/node/ip4_lookup_neon.h | 5 + lib/node/ip4_lookup_sse.h | 6 + lib/node/ip4_reassembly.c | 9 + lib/node/node_private.h | 8 + 17 files changed, 1192 insertions(+), 310 deletions(-)
To summarize changes, here is my proposal: struct rte_node_xstats { uint8_t xstats_num; /**< Number of xstats. */ char (*xstats_desc)[RTE_NODE_XSTATS_DESC_SIZE]; /**< Names of xstats. */ }; struct rte_node_register { ... const struct rte_node_xstats *xstats; /**< Node specific extra statistics. */ ... }; static inline void rte_node_xstat_increment(struct rte_node *node, uint16_t stat_id, uint64_t value) { #ifdef RTE_LIBRTE_GRAPH_STATS uint64_t *errors = RTE_PTR_ADD(node, node->err_off); errors[err_id] += value; #else RTE_SET_USED(node); RTE_SET_USED(err_id); RTE_SET_USED(value); #endif } struct __rte_cache_aligned rte_graph_cluster_node_stats { ... uint8_t xstats_num; char (*xstats_desc)[RTE_NODE_XSTATS_DESC_SIZE]; uint64_t *xstats_val; ... }; Let me know what you think. Thanks!