Thanks for the review and questions.
In our deployment scenario, long-running services must avoid runtime
allocation/deallocation to ensure stability. We have observed memory
fragmentation in practice when frequent small allocations happen on the
data path. Even with optimized allocators, this behavior accumulates over
time and can lead to unexpected latency spikes.
To address this, our project adopts a pre-allocation model: each acl_ctx
is associated with a sufficiently large memory block during initialization,
and no allocations occur afterwards. This approach has been effective in
eliminating runtime uncertainty in our use case.
The proposed patch enables applications with similar requirements to plug
their memory management strategy into the ACL layer without changing the
core logic. The default behavior remains unchanged.
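To make the pre-allocation model concrete, here is a minimal sketch of an allocator that serves requests from one block reserved at initialization. The names struct arena, arena_alloc and arena_free, and the callback signatures, are assumptions for illustration only; they are not the signatures defined by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arena: one block reserved at init time, never grown. */
struct arena {
	uint8_t *base;	/* start of the pre-allocated block (suitably aligned) */
	size_t size;	/* total bytes in the block */
	size_t used;	/* bytes handed out so far */
};

/*
 * Hypothetical allocation callback. "align" must be a power of two and
 * is applied to the offset, so "base" itself must be aligned at least
 * as strictly as the largest alignment requested.
 */
static void *
arena_alloc(size_t size, size_t align, void *ctx)
{
	struct arena *a = ctx;
	size_t start;

	if (align == 0)
		align = sizeof(void *);
	/* Round the current offset up to the requested alignment. */
	start = (a->used + align - 1) & ~(align - 1);
	if (start > a->size || size > a->size - start)
		return NULL;	/* block exhausted; no fallback to the heap */
	a->used = start + size;
	return a->base + start;
}

/*
 * Hypothetical free callback: a no-op, because the whole block is
 * released together with the context that owns it.
 */
static void
arena_free(void *ptr, void *ctx)
{
	(void)ptr;
	(void)ctx;
}
```

Because nothing is returned to a general-purpose heap on the data path, the block cannot fragment; the cost is that it has to be sized for the worst case up front.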
On 11/25/2025 10:59 PM, Stephen Hemminger wrote:
On Tue, 25 Nov 2025 12:14:46 +0000
"mannywang(王永峰)" <[email protected]> wrote:
Reduce memory fragmentation caused by dynamic memory allocations
by allowing users to provide a custom memory allocator.
Add new members to struct rte_acl_config to allow passing custom
allocator callbacks to rte_acl_build:
- running_alloc: allocator callback for run-time internal memory
- running_free: free callback for run-time internal memory
- running_ctx: user-defined context passed to running_alloc/free
- temp_alloc: allocator callback for temporary memory during ACL build
- temp_reset: reset callback for temporary allocator
- temp_ctx: user-defined context passed to temp_alloc/reset
These callbacks allow users to provide their own memory pools or
allocators for both persistent runtime structures and temporary
build-time data.
A typical approach is to pre-allocate a static memory region
for rte_acl_ctx, and to provide a global temporary memory manager
that supports multiple allocations and a single reset during ACL build.
Since tb_mem_pool handles allocation failures using siglongjmp,
temp_alloc follows the same approach for failure handling.
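A rough sketch of that pattern, for readers unfamiliar with it: a temporary pool that serves many allocations, is reset once, and reports exhaustion by jumping back to a recovery point instead of returning NULL. The struct temp_pool layout, the callback signatures and build_with_pool are hypothetical and not taken from the patch; only the use of sigsetjmp/siglongjmp mirrors the description above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical build-time scratch pool: many allocations, one reset. */
struct temp_pool {
	uint8_t *base;		/* pre-allocated scratch area, 8-byte aligned */
	size_t size;		/* total bytes in the scratch area */
	size_t used;		/* bytes handed out since the last reset */
	sigjmp_buf fail;	/* recovery point armed before the build */
};

/* Never returns NULL: on exhaustion it unwinds to the armed recovery point. */
static void *
temp_alloc(size_t size, void *ctx)
{
	struct temp_pool *p = ctx;
	size_t need = (size + 7) & ~(size_t)7;	/* round up to 8 bytes */
	void *out;

	if (need < size || need > p->size - p->used)
		siglongjmp(p->fail, -ENOMEM);
	out = p->base + p->used;
	p->used += need;
	return out;
}

/* Drop every temporary allocation at once. */
static void
temp_reset(void *ctx)
{
	struct temp_pool *p = ctx;

	p->used = 0;
}

/* Hypothetical stand-in for a build step that allocates scratch memory. */
static int
build_with_pool(struct temp_pool *p, size_t n_items, size_t item_size)
{
	size_t i;

	if (sigsetjmp(p->fail, 0) != 0) {
		/* Reached via siglongjmp() from temp_alloc(). */
		temp_reset(p);
		return -ENOMEM;
	}
	for (i = 0; i < n_items; i++)
		(void)temp_alloc(item_size, p);
	temp_reset(p);
	return 0;
}
```

The consequence of this design is that callers of temp_alloc need no NULL checks, but the recovery point must be armed with sigsetjmp before the first allocation, and the function that armed it must still be active when the jump happens.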
Signed-off-by: YongFeng Wang <[email protected]>
Rather than introducing an API change that can have an impact in many
places, would it be better to fix the underlying rte_malloc implementation?
The allocator in rte_malloc() is simplistic compared to glibc and
other malloc libraries. The other libraries provide better density,
statistics and performance.
Improving rte_malloc() would help all use cases, not just the special
case of busy ACL usage.
The other question is: does the ACL library really need to be storing
this data in huge pages at all? If all it needed was an allocator
for single-process usage, then just using regular malloc would
avoid the whole mess.