> From: Mattias Rönnblom [mailto:[email protected]]
> Sent: Wednesday, 27 May 2026 19.31
>
>
> This RFC introduces fastmem, a general-purpose small-object allocator
> for DPDK. It is intended to replace per-type mempools with a single
> allocator that handles arbitrary sizes, grows on demand, and matches
> mempool-level performance on the hot path.
>
> Motivation
> ----------
>
> DPDK applications commonly maintain many mempools — one per object
> type (connections, sessions, timers, work items). Each must be sized
> up front, wastes memory when over-provisioned, and cannot serve
> objects of a different size. Fastmem eliminates this by accepting
> arbitrary sizes at runtime, backed by a slab allocator that
> repurposes memory across size classes as demand shifts.
>
> Design
> ------
>
> Three-layer architecture:
>
> 1. Backing memory: 128 MiB IOVA-contiguous memzones from EAL,
>    reserved lazily (or pre-reserved for deterministic latency).
>
> 2. Slabs: 2 MiB, 2 MiB-aligned regions carved from memzones.
>    The alignment enables O(1) slab lookup from any object pointer
>    via bitmask — no radix tree or index structure. Slabs move
>    freely between 18 power-of-2 size classes (8 B to 1 MiB).
>
> 3. Per-lcore caches: bounded LIFO stacks (no locks on the hot
>    path). Cache misses trigger bulk transfers to/from the shared
>    bin under a spinlock.
>
> Key properties:
>
> - Zero per-object metadata in the production build.
> - NUMA-aware, with per-socket bins and free-slab pools.
> - DMA-usable memory with O(1) virt-to-IOVA translation.
> - Bulk alloc/free with all-or-nothing semantics.
> - Backing memory never returned during lifetime (slabs recycled).
> - Non-EAL threads supported (bypass cache, take bin lock).
> - Secondary process support (lazy attach, no per-lcore caches).
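To check my understanding of the O(1) slab lookup and the size classes: a minimal sketch, not taken from the patch, of how a slab base and a class index could be derived from the numbers above (2 MiB slabs, 18 classes from 8 B = 2^3 to 1 MiB = 2^20). The constants and the names slab_base() and size_to_class() are hypothetical; the actual implementation may well use __builtin_clz() or a lookup table instead of the loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants mirroring the RFC text. */
#define SLAB_SHIFT 21                          /* 2 MiB slabs */
#define SLAB_SIZE ((uintptr_t)1 << SLAB_SHIFT)
#define MIN_CLASS_SHIFT 3                      /* 8 B */
#define MAX_CLASS_SHIFT 20                     /* 1 MiB */
#define NUM_CLASSES (MAX_CLASS_SHIFT - MIN_CLASS_SHIFT + 1)

/* O(1) slab lookup: since every slab is 2 MiB-aligned, clearing the
 * low 21 bits of any pointer into a slab yields the slab base, where
 * a (hypothetical) slab header recording the size class could live. */
static inline uintptr_t
slab_base(const void *obj)
{
	return (uintptr_t)obj & ~(SLAB_SIZE - 1);
}

/* Map a request size to a class index (0 = 8 B ... 17 = 1 MiB).
 * Returns -1 for requests above 1 MiB. A size of 0 maps to the
 * smallest class (an assumption; the RFC does not say). */
static inline int
size_to_class(size_t size)
{
	unsigned int shift = MIN_CLASS_SHIFT;

	if (size > ((size_t)1 << MAX_CLASS_SHIFT))
		return -1;
	/* Round up to the next power of two, at least 8 B. */
	while (((size_t)1 << shift) < size)
		shift++;
	return (int)(shift - MIN_CLASS_SHIFT);
}
```

With these definitions, size_to_class(2176) yields class 9, i.e. a 4 KiB object, which is relevant to the mbuf discussion below.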
>
> API surface
> -----------
>
>   rte_fastmem_init / deinit
>   rte_fastmem_reserve
>   rte_fastmem_set_limit / get_limit
>   rte_fastmem_alloc / alloc_socket
>   rte_fastmem_realloc
>   rte_fastmem_alloc_bulk / alloc_bulk_socket
>   rte_fastmem_free / free_bulk
>   rte_fastmem_hlookup / halloc / halloc_bulk / hfree / hfree_bulk
>   rte_fastmem_virt2iova
>   rte_fastmem_cache_flush
>   rte_fastmem_max_size / classes
>   rte_fastmem_stats / stats_class / stats_lcore / stats_lcore_class
>   rte_fastmem_stats_reset
>
> All APIs are marked __rte_experimental.
>
> Performance
> -----------
>
> The single-object hot path is roughly 2–3× the cost of mempool
> and an order of magnitude faster than rte_malloc. Under
> multi-lcore contention, fastmem scales similarly to mempool,
> while rte_malloc collapses.
>
> Limitations
> -----------
>
> - Maximum allocation: 1 MiB. Larger requests should use rte_malloc.
> - Power-of-2 classes only; worst-case internal fragmentation ~50%.
> - Backing memory not reclaimable short of deinit.
>
> Future work
> -----------
>
> - Lcore-affine allocations (false-sharing-free by construction).
> - Mempool ops driver for transparent drop-in use.
Regarding mempool support.

As you already mentioned, some mempools hold fully or partially initialized objects. Releasing such an object to the heap would require the ability to reconstruct it on allocation from the heap.
In some cases, object reconstruction might be possible through callbacks or some other means. In other cases, object reconstruction might be practically impossible.
In all cases, object reconstruction has a performance cost, which needs to be weighed against the memory saved by freeing the objects back to the heap. This trade-off is specific to each mempool, the kind of objects it holds, and how the mempool is used.

If we look specifically at the mbuf mempool, an mbuf comprises metadata (struct rte_mbuf and possibly struct rte_mbuf_ext_shared_info) and the packet buffer itself.
The mbuf structure supports using external buffers for the packet buffer, and such buffers do not need reconstruction if dynamically allocated from the heap.
It seems viable to keep the metadata parts of the mbufs in a mempool, and dynamically allocate/free their packet buffers on mbuf allocation/free. A shim mempool ops driver could implement this relatively easily. It might require a few additions to the mbuf and/or mempool libraries too, but that would be acceptable.

<feature creep>
Another thing regarding mbuf packet data:
Some NICs require packet buffers of 2048 bytes, but we also allocate a headroom of, by default, 128 bytes in front of it, so the default packet buffer size (RTE_MBUF_DEFAULT_BUF_SIZE) is not 2^N, but 2048 + 128 = 2176 bytes [1].

[1]: https://elixir.bootlin.com/dpdk/v26.03/source/lib/mbuf/rte_mbuf_core.h#L408

Allocating 4 KiB fastmem buffers and only using 2.1 KiB of each seems wasteful.
Could the fastmem library support a shortlist of magic object sizes that are not 2^N?
The magic sizes should be explicitly configured at run time. (The mbuf library must inform the fastmem library of the requested data_room_size before it populates the mbuf mempool.)
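To quantify the waste: a 2176-byte buffer rounded up to the 4096-byte class leaves 1920 bytes (about 47%) of every buffer unused. Below is a minimal sketch of an add-only shortlist consulted when picking the backing size for a request, assuming a fixed capacity of 4 and registration only during initialization. All names (MAX_MAGIC_SIZES, magic_size_add(), effective_size()) are hypothetical and not part of the RFC's API; per-magic-size bins and slab bookkeeping are omitted.

```c
#include <stddef.h>

/* Hypothetical build-time limit on the shortlist length. */
#define MAX_MAGIC_SIZES 4
#define MIN_POW2 ((size_t)8)
#define MAX_POW2 ((size_t)1 << 20)

/* Assumed to be filled during initialization, before any
 * concurrent allocation, so no locking is shown. */
static size_t magic_sizes[MAX_MAGIC_SIZES];
static unsigned int num_magic;

/* Add-only registration: returns 0 on success (or if already
 * registered), -1 if the size is out of range or the list is full.
 * There is deliberately no removal function. */
static int
magic_size_add(size_t size)
{
	unsigned int i;

	if (size == 0 || size > MAX_POW2)
		return -1;
	for (i = 0; i < num_magic; i++)
		if (magic_sizes[i] == size)
			return 0;
	if (num_magic == MAX_MAGIC_SIZES)
		return -1;
	magic_sizes[num_magic++] = size;
	return 0;
}

/* Smallest power of two >= size, but at least MIN_POW2. */
static size_t
pow2_ceil(size_t size)
{
	size_t c = MIN_POW2;

	while (c < size)
		c <<= 1;
	return c;
}

/* Size actually backing a request: the tighter of the power-of-2
 * class and the smallest registered magic size that fits.
 * Returns 0 if the request exceeds the maximum. */
static size_t
effective_size(size_t size)
{
	size_t best;
	unsigned int i;

	if (size > MAX_POW2)
		return 0;
	best = pow2_ceil(size);
	for (i = 0; i < num_magic; i++)
		if (magic_sizes[i] >= size && magic_sizes[i] < best)
			best = magic_sizes[i];
	return best;
}
```

In this sketch, once 2176 is registered, a 2176-byte request is served from the 2176-byte size instead of the 4096-byte class, while a 2000-byte request still uses the tighter 2048-byte class.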
The shortlist should have a fixed max length, maybe 4 by default, preferably configurable at build time.
Removing a magic size from the shortlist need not be supported; only adding magic sizes is required.
The magic sizes will be relatively large (assume 512 bytes or more), so adding a fastmem object metadata structure to each magic-sized object is acceptable, if necessary.

From a fastmem library perspective, WDYT?
</feature creep>

> - Debug mode (cookies, double-free detection, poison-on-free).
> - Telemetry integration.
> - EAL integration, allowing EAL-internal subsystems to use
>   fastmem for their small-object allocations.
>
> Changes in RFC v3:
> - Add rte_fastmem_realloc() with full test coverage.
> - Add __rte_malloc/__rte_dealloc compiler attributes; remove
>   incorrect __rte_alloc_size/__rte_alloc_align.
> - Extract normalize_align() helper; remove redundant inline
>   directives.
> - Merge lifecycle and functional test suites.
> - Add realloc subsection to programming guide.
>
> Changes in RFC v2:
> - Fix cross-socket deinit use-after-free.
> - Add secondary process support.
> - Add handle-based allocation API.
> - Fix clang warnings; misc cleanup.
>
>
>
> Mattias Rönnblom (3):
>   doc: add fastmem programming guide
>   lib: add fastmem library
>   app/test: add fastmem test suite
>
>  app/test/meson.build                  |    3 +
>  app/test/test_fastmem.c               | 1801 +++++++++++++++++++++++++
>  app/test/test_fastmem_perf.c          | 1040 ++++++++++++++
>  app/test/test_fastmem_profile.c       |  157 +++
>  doc/api/doxy-api-index.md             |    1 +
>  doc/api/doxy-api.conf.in              |    1 +
>  doc/guides/prog_guide/fastmem_lib.rst |  328 +++++
>  doc/guides/prog_guide/index.rst       |    1 +
>  lib/fastmem/meson.build               |    6 +
>  lib/fastmem/rte_fastmem.c             | 1748 ++++++++++++++++++++++++
>  lib/fastmem/rte_fastmem.h             |  815 +++++++++++
>  lib/meson.build                       |    1 +
>  12 files changed, 5902 insertions(+)
>  create mode 100644 app/test/test_fastmem.c
>  create mode 100644 app/test/test_fastmem_perf.c
>  create mode 100644 app/test/test_fastmem_profile.c
>  create mode 100644 doc/guides/prog_guide/fastmem_lib.rst
>  create mode 100644 lib/fastmem/meson.build
>  create mode 100644 lib/fastmem/rte_fastmem.c
>  create mode 100644 lib/fastmem/rte_fastmem.h
>
> --
> 2.43.0

