AMD General Sharing v6 with fixes and cache-id fetch shortly.
> -----Original Message----- > From: Stephen Hemminger <[email protected]> > Sent: Wednesday, April 15, 2026 1:52 AM > To: Varghese, Vipin <[email protected]> > Cc: [email protected]; Tummala, Sivaprasad <[email protected]>; > [email protected]; [email protected]; > [email protected]; [email protected]; [email protected] > Subject: Re: [PATCH v5 0/3] eal/topology: introduce topology-aware lcore > grouping > > Caution: This message originated from an External Source. Use proper caution > when opening attachments, clicking links, or responding. > > > On Wed, 15 Apr 2026 01:08:18 +0530 > Vipin Varghese <[email protected]> wrote: > > > This series introduces a topology library that groups DPDK lcores > > based on CPU cache hierarchy and NUMA topology. The goal is to provide > > a stable and explicit API that allows applications to select lcores > > with better locality and cache sharing characteristics. > > > > The series includes: > > - EAL support for topology discovery using hwloc and domain-based lcore > > grouping (L1/L2/L3/L4/NUMA) > > - Topology-aware test cases validating API behavior and edge conditions > > - Programmer’s guide describing the topology library and APIs > > > > The API is marked experimental and does not change existing lcore > > behavior unless explicitly used by the application. > > > > Changes in v5: > > - Addressed review comments from v4 > > - Fixed ARM cross-compilation issues > > - Cleaned up domain iteration and error handling > > - Updated tests to cover domain edge cases > > - Documentation refinements and API usage clarification > > > > Changes in v4: > > - Corrected domain selection semantics > > - Updated example usage > > - Fixed naming and typo issues > > > > Changes in v3: > > - Fixed macro naming (USE_NO_TOPOLOGY) > > - Minor cleanups based on early feedback > > > > Tested on: > > - AMD EPYC (Milan, Genoa, Siena, Turin, Turin-Dense, Sorano) > > - Intel Xeon (SPR-SP, GNR-SP) > > - ARM Ampere > > - NVIDIA Grace Superchip > > > > Dependencies: > > - hwloc-dev (tested with 2.10.0) > > > > Patch breakdown: > > 1/3 eal/topology: add topology grouping for lcores > > 2/3 app: add topology-aware test cases > > 3/3 doc: add topology library documentation > > > > Future Work: > > - integrate into examples > > -- hellowrld: ready > > -- pkt-distributor: in-progress > > -- l2fwd: ready > > -- l3fwd: to start > > -- eventdevpipeline: PoC ready > > - integrate topology test > > -- crypto: yet to start > > -- compression: yet to start > > -- dma: PoC ready > > - add new features for > > -- PQoS: yet to start > > -- Data Injection: PoC with BRDCM Thor-2 ready > > > > Tested OS: Linux only, need help with BSD and Windows > > > > Tested with and without hwloc-dev library for > > - Ampere, aarch64, Neoverse-N1, NUMA-2, 256 CPU threads > > - Grace superchip, aarch64, Neoverse-V2, NUMA-2, 144 CPU threads > > - Intel GNR-SP, 6767P, NUMA-2, 256 Threads > > - AMD EPYC Siena, 8534P, NUMA-1, 128 Threads > > - AMD EPYC Sorano, 8635P, NUMA-1, 168 Threads > > > > Signed-off-by: Vipin Varghese <[email protected]> `` > > > > Vipin Varghese (3): > > eal/topology: add Topology grouping for lcores > > app: add topology aware test case > > doc: add new section topology > > > > app/test/meson.build | 1 + > > app/test/test_ring_perf.c | 416 +++++++++++++- > > app/test/test_stack_perf.c | 409 ++++++++++++++ > > app/test/test_topology.c | 676 ++++++++++++++++++++++ > > config/meson.build | 18 + > > doc/api/doxy-api-index.md | 1 + > > doc/guides/prog_guide/index.rst | 3 +- > > doc/guides/prog_guide/topology_lib.rst | 155 +++++ > > lib/eal/common/eal_private.h | 74 +++ > > lib/eal/common/eal_topology.c | 747 +++++++++++++++++++++++++ > > lib/eal/common/meson.build | 1 + > > lib/eal/freebsd/eal.c | 10 +- > > lib/eal/include/meson.build | 1 + > > lib/eal/include/rte_topology.h | 255 +++++++++ > > lib/eal/linux/eal.c | 7 + > > lib/eal/meson.build | 4 + > > 16 files changed, 2773 insertions(+), 5 deletions(-) create mode > > 100644 app/test/test_topology.c create mode 100644 > > doc/guides/prog_guide/topology_lib.rst > > create mode 100644 lib/eal/common/eal_topology.c create mode 100644 > > lib/eal/include/rte_topology.h > > > > AI review reported lots of possible correctness problems: > > > # DPDK Patch Review - bundle-1843-topology-v5.mbox > > ## Overview > > This patch series introduces topology awareness to DPDK, allowing > applications to > select logical cores based on CPU cache and I/O topology. The series adds a > new > `rte_topology` library with APIs to query and enumerate logical cores within > topology > domains (L1/L2/L3/L4/NUMA). > > --- > > ## Correctness Bugs (HIGHEST PRIORITY - report at >=50% confidence) > > ### Patch 1/3: lib/eal/common/eal_topology.c > > #### **Error: Resource leak on `eal_topology_map_layer()` failure** > > In `rte_eal_topology_init()`, if any call to `eal_topology_map_layer()` fails > (returns < > 0), the code calls `rte_eal_topology_release()` which frees previously > allocated > layers. However, the `hwloc_topology_t` handle `topo_cnfg.topology` is **NOT** > destroyed before returning. This leaks the hwloc topology handle. > > **Location:** `lib/eal/common/eal_topology.c:685-712` > > ```c > for (int i = 0; i < 5; i++) { > *layers[i].count = hwloc_get_nbobjs_by_depth(topo_cnfg.topology, > layers[i].depth); > if (eal_topology_map_layer(topo_cnfg.topology, layers[i].depth, > layers[i].count, > layers[i].ptr, layers[i].total_cores, layers[i].name) < 0) { > rte_eal_topology_release(); /* frees layer memory */ > return -1; /* BUG: topo_cnfg.topology NOT destroyed here */ > } > } > > hwloc_topology_destroy(topo_cnfg.topology); /* only reached on success */ > topo_cnfg.topology = NULL; ``` > > **Fix:** Destroy the topology before returning on error: > > ```c > if (eal_topology_map_layer(...) < 0) { > hwloc_topology_destroy(topo_cnfg.topology); > topo_cnfg.topology = NULL; > rte_eal_topology_release(); > return -1; > } > ``` > > --- > > #### **Error: Potential use-after-free in `eal_topology_map_layer()` on > partial > allocation failure** > > In `eal_topology_map_layer()`, the code allocates `dm->cores` for each > domain. If a > later allocation in the same loop iteration fails (e.g., for domain `j+1`), > the function > returns -1 immediately **without freeing `dm->cores` already allocated for > earlier > domains**. The caller (`rte_eal_topology_init()`) then calls > `rte_eal_topology_release()`, which expects `layer_ptr[j]` to be non-NULL but > `layer_ptr[j]->cores` may be uninitialized or garbage if the allocation > failed before > that point. This can cause a use-after-free or double-free when > `rte_eal_topology_release()` calls `rte_free(d->map[i]->cores)`. > > **Location:** `lib/eal/common/eal_topology.c:537-557` > > ```c > for (uint16_t j = 0; j < *layer_cnt; j++) { > hwloc_obj_t obj = hwloc_get_obj_by_depth(topology, depth, j); > int cpu_count = hwloc_bitmap_weight(obj->cpuset); > if (cpu_count == -1) > continue; > > struct core_domain_mapping *dm = > rte_zmalloc(NULL, sizeof(struct core_domain_mapping), 0); > if (!dm) > return -1; /* BUG: leaks layer_ptr array allocated by caller */ > > (*layer_ptr)[j] = dm; > CPU_ZERO(&dm->core_set); > dm->core_count = 0; > > dm->cores = rte_malloc(NULL, sizeof(uint16_t) * cpu_count, 0); > if (!dm->cores) > return -1; /* BUG: leaks dm (just allocated) and previous entries */ > } ``` > > **Fix:** On allocation failure, free all previously allocated entries before > returning: > > ```c > if (!dm) { > /* Free all previously allocated entries */ > for (uint16_t k = 0; k < j; k++) { > if ((*layer_ptr)[k]) { > rte_free((*layer_ptr)[k]->cores); > rte_free((*layer_ptr)[k]); > } > } > rte_free(*layer_ptr); > return -1; > } > > /* Same cleanup for dm->cores allocation failure */ if (!dm->cores) { > rte_free(dm); > for (uint16_t k = 0; k < j; k++) { > if ((*layer_ptr)[k]) { > rte_free((*layer_ptr)[k]->cores); > rte_free((*layer_ptr)[k]); > } > } > rte_free(*layer_ptr); > return -1; > } > ``` > > --- > > #### **Error: NULL pointer dereference in `eal_topology_map_layer()` when > `hwloc_get_obj_by_depth()` returns NULL** > > In the second loop of `eal_topology_map_layer()` (lines 560-620), the code > calls > `hwloc_get_obj_by_depth(topology, depth, j)` and checks `if (!obj || > hwloc_bitmap_iszero(obj->cpuset))` to skip NULL objects. However, in the > **inner > loop** (lines 579-618), the code calls `hwloc_get_obj_by_depth(topology, > depth, k)` > and assigns it to `obj_core` but does **NOT** check if `obj_core` is NULL > before > calling `hwloc_bitmap_weight(obj_core->cpuset)` on line 581. If > `hwloc_get_obj_by_depth()` returns NULL for domain `k`, this will dereference > a > NULL pointer. > > **Location:** `lib/eal/common/eal_topology.c:579-582` > > ```c > for (uint16_t k = 0; k < *layer_cnt; k++) { > hwloc_obj_t obj_core = hwloc_get_obj_by_depth(topology, depth, k); > int cpu_count_core = hwloc_bitmap_weight(obj_core->cpuset); /* NULL > deref if > obj_core == NULL */ > if (cpu_count_core == -1) > continue; > ``` > > **Fix:** Check `obj_core` before dereferencing: > > ```c > hwloc_obj_t obj_core = hwloc_get_obj_by_depth(topology, depth, k); if > (!obj_core) > continue; > int cpu_count_core = hwloc_bitmap_weight(obj_core->cpuset); > if (cpu_count_core == -1) > continue; > ``` > > --- > > #### **Error: Incorrect second argument to > `rte_topo_get_nth_lcore_from_domain()` in `get_same_l1_domains()` > (test_ring_perf.c and test_stack_perf.c)** > > In both `app/test/test_ring_perf.c:290` and `app/test/test_stack_perf.c:258`, > the > function `get_same_l1_domains()` calls: > > ```c > id2 = rte_topo_get_nth_lcore_from_domain(domain, 0, 0, > RTE_TOPO_DOMAIN_L1); ``` > > The second argument (`lcore_pos`) is `0`, which is the same as for `id1`. > This will > assign **the same lcore** to both `id1` and `id2`, causing the subsequent > check `if > (id1 == id2) return 3;` to always trigger. This is a logic error: the intent > is clearly to > get two **different** lcores from the same domain. > > **Location:** `app/test/test_ring_perf.c:287-290` and > `app/test/test_stack_perf.c:255-258` > > **Fix:** Use position `1` for the second lcore: > > ```c > id1 = rte_topo_get_nth_lcore_from_domain(domain, 0, 0, > RTE_TOPO_DOMAIN_L1); > id2 = rte_topo_get_nth_lcore_from_domain(domain, 1, 0, > RTE_TOPO_DOMAIN_L1); ``` > > --- > > #### **Error: Iteration condition in `test_main_lcore_in_domain()` uses wrong > domain type for lookup** > > In `app/test/test_topology.c:211`, the loop iterates over `domain_count` for > `domain_types[d]`, but the call to `rte_topo_is_main_lcore_in_domain()` uses > `RTE_TOPO_DOMAIN_NUMA` instead of `domain_types[d]`. This means the test > only checks the NUMA domain regardless of which domain type `d` selects > (L1/L2/L3/L4). > > **Location:** `app/test/test_topology.c:206-216` > > ```c > for (unsigned int d = 0; d < RTE_DIM(domain_types); d++) { > bool main_lcore_found = false; > unsigned int domain_count = rte_topo_get_domain_count(domain_types[d]); > for (unsigned int dmn_idx = 0; dmn_idx < domain_count; dmn_idx++) { > main_lcore_found = > rte_topo_is_main_lcore_in_domain(RTE_TOPO_DOMAIN_NUMA, /* BUG: should > be domain_types[d] */ > dmn_idx); > if (main_lcore_found) > break; > } > ``` > > **Fix:** > > ```c > main_lcore_found = rte_topo_is_main_lcore_in_domain(domain_types[d], dmn_idx); > ``` > > --- > > #### **Error: Infinite loop risk in `rte_topo_get_nth_lcore_from_domain()` > when `ptr- > >core_count` is 0** > > In `lib/eal/common/eal_topology.c:296-318`, the function enters a `while (1)` > loop > that increments `new_lcore_pos`. If `ptr->core_count` is 0 (which the code > checks > earlier but does not return immediately), the loop will wrap `new_lcore_pos` > back to > 0 indefinitely, never breaking. While the function returns `RTE_MAX_LCORE` if > `ptr- > >core_count == 0` before the loop, the logic flow is unclear and the loop > >body does > not have a clear termination condition if the core count is 0. > > **Location:** `lib/eal/common/eal_topology.c:283-318` > > **Fix:** Add a sanity check inside the loop to prevent infinite iteration: > > ```c > unsigned int iterations = 0; > while (1) { > if (iterations++ > ptr->core_count * 2) /* safety limit */ > return RTE_MAX_LCORE; > /* ... rest of loop ... */ > } > ``` > > However, the real issue is that the code already returns `RTE_MAX_LCORE` if > `ptr->core_count == 0` on line 287, so this is more of a defensive-programming > note. The function should be refactored for clarity. > > --- > > #### **Error: Missing NULL check after `get_domain_lcore_mapping()` in > `rte_topo_get_next_lcore()`** > > In `rte_topo_get_next_lcore()`, the code calls `get_domain_lcore_mapping(flag, > lcore_domain)` and checks if `ptr` is NULL on line 350. However, if `ptr` is > NULL, > the function returns `RTE_MAX_LCORE`. This is correct, but the subsequent > logic > on line 381 calls `rte_topo_is_main_lcore_in_domain(flag, lcore_domain)`, > which > internally may call `get_domain_lcore_mapping()` again. If that call also > returns > NULL (which it will if the domain is invalid), the function > `rte_topo_is_main_lcore_in_domain()` will return `false`, which is safe. > However, the > logic is fragile and should explicitly handle the NULL case to avoid relying > on > transitive safety. > > **Location:** `lib/eal/common/eal_topology.c:381` > > **Recommendation:** The code is technically safe but could be clearer. No > change > required, but consider restructuring for maintainability. > > --- > > ### Patch 2/3: app/test Topology Tests > > #### **Error: Macro `RTE_TOPO_FOREACH_WORKER_LCORE_IN_DOMAIN` > declares variable in macro expansion (shadowing risk)** > > In `lib/eal/include/rte_topology.h:243-248`, the macro > `RTE_TOPO_FOREACH_WORKER_LCORE_IN_DOMAIN` declares a local > variable `main_lcore` inside the macro expansion: > > ```c > #define RTE_TOPO_FOREACH_WORKER_LCORE_IN_DOMAIN(lcore, > domain_indx, flag) \ > lcore = rte_topo_get_nth_lcore_from_domain(domain, 0, 0, flag); \ > uint16_t main_lcore = rte_get_main_lcore(); \ > for (lcore = (lcore != main_lcore) ? \ > lcore : rte_topo_get_next_lcore(lcore, 1, 0, flag); \ > lcore < RTE_MAX_LCORE; \ > lcore = rte_topo_get_next_lcore(lcore, 1, 0, flag)) ``` > > This can cause a compiler error or shadowing if the caller already has a > variable > named `main_lcore` in scope. Additionally, the macro uses `domain` (line 244) > but > the parameter is `domain_indx`, which is a typo and will cause a compilation > error. > > **Location:** `lib/eal/include/rte_topology.h:243-248` > > **Fix:** Wrap in a `do { } while (0)` and use a uniquely-named variable, or > document > that the macro must not be used if `main_lcore` is already declared. Also fix > the > typo: > > ```c > #define RTE_TOPO_FOREACH_WORKER_LCORE_IN_DOMAIN(lcore, > domain_indx, flag) \ > lcore = rte_topo_get_nth_lcore_from_domain(domain_indx, 0, 0, flag); \ > uint16_t __topo_main_lcore = rte_get_main_lcore(); \ > for (lcore = (lcore != __topo_main_lcore) ? \ > lcore : rte_topo_get_next_lcore(lcore, 1, 0, flag); \ > lcore < RTE_MAX_LCORE; \ > lcore = rte_topo_get_next_lcore(lcore, 1, 0, flag)) ``` > > --- > > ### Summary of Correctness Bugs > > 1. **hwloc topology leak** on `eal_topology_map_layer()` failure 2. **Resource > leak** in `eal_topology_map_layer()` on partial allocation failure 3. **NULL > pointer > dereference** in `eal_topology_map_layer()` inner loop 4. **Logic error** in > `get_same_l1_domains()` (same lcore assigned to `id1` and `id2`) 5. **Wrong > domain type** in `test_main_lcore_in_domain()` (uses > `RTE_TOPO_DOMAIN_NUMA` instead of `domain_types[d]`) 6. **Macro typo** in > `RTE_TOPO_FOREACH_WORKER_LCORE_IN_DOMAIN` (uses `domain` > instead of `domain_indx`) 7. **Potential infinite loop** in > `rte_topo_get_nth_lcore_from_domain()` if `ptr->core_count == 0` (mitigated by > early

