> On Tue, 20 Jan 2026 15:40:21 +0000
> Konstantin Ananyev <[email protected]> wrote:
> 
> > > > From: Stephen Hemminger [mailto:[email protected]]
> > > > Sent: Tuesday, 20 January 2026 15.34
> > > >
> > > > On Tue, 20 Jan 2026 09:49:44 +0100
> > > > Morten Brørup <[email protected]> wrote:
> > > >
> > > > > > From: Stephen Hemminger [mailto:[email protected]]
> > > > > > Sent: Monday, 19 January 2026 23.48
> > > > > >
> > > > > > On Fri, 16 Jan 2026 10:32:52 +0100
> > > > > > Morten Brørup <[email protected]> wrote:
> > > > > >
> > > > > > > > From: Stephen Hemminger [mailto:[email protected]]
> > > > > > > > Sent: Friday, 16 January 2026 07.46
> > > > > > > >
> > > > > > > > When building with LTO (Link Time Optimization), GCC performs
> > > > > > > > > aggressive cross-compilation-unit inlining. This causes the
> > > > > > > > > compiler to analyze all code paths in
> > > > > > > > > __rte_ring_do_dequeue_elems(), including the 16-byte element
> > > > > > > > > path (__rte_ring_dequeue_elems_128), even when the runtime
> > > > > > > > > element size is only 4 bytes.
> > > > > > > >
> > > > > > > > The static analyzer sees that the 16-byte path would copy
> > > > > > > > 32 elements * 16 bytes = 512 bytes into a 128-byte buffer
> > > > > > > > (uint32_t[32]),
> > > > > > > > triggering -Wstringop-overflow warnings.
> > > > >
> > > > > The element size is not an inline function parameter, but fetched
> > > > > > from the "esize" field in the rte_soring structure, so the compiler
> > > > > > cannot see that the element size is 4 bytes. And thus it needs to
> > > > > > consider all possible element sizes.
> > > > >
> > > > > > > >
> > > > > > > > > The existing #pragma GCC diagnostic suppression in
> > > > > > > > > rte_ring_elem_pvt.h doesn't help because with LTO the warning
> > > > > > > > > context shifts to the test file where the inlined code is
> > > > > > > > > instantiated.
> > > > > > > > >
> > > > > > > > > Fix by sizing all buffers passed to soring acquire/dequeue
> > > > > > > > > functions for the worst-case element size
> > > > > > > > > (16 bytes = 4 * sizeof(uint32_t)).
> > > > > > > > > This satisfies the static analyzer without changing runtime
> > > > > > > > > behavior.
> > > > > > >
> > > > > > > > Using wildly oversized buffers doesn't seem like a recommendable
> > > > > > > > solution.
> > > > > > > > If the ring library is ever updated to support cache size
> > > > > > > > elements (64 byte), the buffers would have to be oversize by
> > > > > > > > factor 16.
> > > > > >
> > > > > > The analysis (from AI) is that compiler is getting confused.
> > > > >
> > > > > That would be my analysis too.
> > > > >
> > > > > > Since there is no good
> > > > > > way other than turning off LTO for the test to tell the compiler
> > > > >
> > > > > There is another way to tell the compiler: __rte_assume()
> > > >
> > > > > Tried that, but it doesn't work because it doesn't get propagated deep
> > > > > enough to impact here.
> > >
> > > Does this fix generally imply that when using LTO, using an SORING with
> > > elements smaller than 16 bytes requires oversize buffers?
> > > That's not good. :-(
> > >
> > > The SORING is still experimental.
> > > Maybe the element size and metadata size need to be passed as parameters
> > > to the SORING functions, like the RING functions take element size as
> > > parameter (except the functions that are hardcoded for using pointers as
> > > element size).
> >
> > Personally, I am not a big fan of such an idea...
> > I wonder whether it is possible just to disable LTO for soring.o?
> > Another thought: if all the problems come from the 128-bit version of
> > enqueue/dequeue, would using memcpy() instead of the specific functions
> > help to mitigate the problem?
> >
> >
> 
> A much simpler and clearer solution is to just get rid of __rte_always_inline
> and use inline instead. The compiler still inlines a lot but it can make its
> own decision.
> The attribute always_inline is not always faster, in fact in real world
> applications it can make things slower because real applications get i-cache
> misses and lots of inline expansion makes it worse.

Sounds like a clean and safe fix.
I also don't expect any perf degradation with such an approach,
but will run some perf tests with it to confirm.
Thanks
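
For anyone following the thread, the suggested change can be sketched
roughly as below. This is an illustration only: demo_ring,
DEMO_ALWAYS_INLINE and both demo_dequeue_* helpers are hypothetical
names, not the actual rte_ring/rte_soring code, and the attribute syntax
assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for __rte_always_inline (GCC/Clang syntax). */
#define DEMO_ALWAYS_INLINE inline __attribute__((always_inline))

/* Hypothetical ring: element size is a runtime field, as with "esize". */
struct demo_ring {
	uint32_t esize;      /* bytes per element: 4, 8 or 16 */
	uint8_t slots[512];  /* backing storage */
};

/* Forced inlining: the body is expanded into every caller. */
static DEMO_ALWAYS_INLINE void
demo_dequeue_forced(const struct demo_ring *r, void *dst, uint32_t n)
{
	assert(r->esize == 4 || r->esize == 8 || r->esize == 16);
	/* Copy length depends on a value only known at run time. */
	memcpy(dst, r->slots, (size_t)n * r->esize);
}

/* Plain inline: only a hint, the compiler may keep this out of line. */
static inline void
demo_dequeue_plain(const struct demo_ring *r, void *dst, uint32_t n)
{
	assert(r->esize == 4 || r->esize == 8 || r->esize == 16);
	memcpy(dst, r->slots, (size_t)n * r->esize);
}
```

Both variants behave identically at run time; the only difference is
how much freedom the compiler has when deciding what to expand into the
caller.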
