> Modern compilers at -O2 make good inlining decisions for small
> static inline functions; forced inlining via __rte_always_inline
> should be reserved for cases where it is required for correctness
> or for documented measured performance reasons.
>
> Document the policy in the coding style guide and add a
> checkpatches.sh entry that flags when new uses of the attribute
> are introduced. Checkpatches is not an absolute blocker to
> acceptance, only an indicator that more review is needed.
>
> Add additional comments about use of __rte_always_inline,
> __rte_noinline, __rte_hot, and __rte_cold to the rte_common.h
> to aid developers.
>
> Signed-off-by: Stephen Hemminger <[email protected]>
> ---
> devtools/checkpatches.sh | 8 +++++
> doc/guides/contributing/coding_style.rst | 26 ++++++++++++++++-
> lib/eal/include/rte_common.h | 37 ++++++++++++++++++++++--
> 3 files changed, 68 insertions(+), 3 deletions(-)
>
> diff --git a/devtools/checkpatches.sh b/devtools/checkpatches.sh
> index f5dd77443f..2a3d364178 100755
> --- a/devtools/checkpatches.sh
> +++ b/devtools/checkpatches.sh
> @@ -137,6 +137,14 @@ check_forbidden_additions() { # <patch>
> -f $(dirname $(readlink -f $0))/check-forbidden-tokens.awk \
> "$1" || res=1
>
> + # forbid new use of __rte_always_inline
> + awk -v FOLDERS="lib drivers app examples" \
> + -v EXPRESSIONS='\\<__rte_always_inline\\>' \
> + -v RET_ON_FAIL=1 \
> + -v MESSAGE='Adding __rte_always_inline; prefer plain inline' \
> + -f $(dirname $(readlink -f $0))/check-forbidden-tokens.awk \
> + "$1" || res=1
> +
> # refrain from using compiler __rte_atomic_thread_fence()
> # It should be avoided on x86 for SMP case.
> awk -v FOLDERS="lib drivers app examples" \
> diff --git a/doc/guides/contributing/coding_style.rst
> b/doc/guides/contributing/coding_style.rst
> index 243a3c2959..8f9e77d1c1 100644
> --- a/doc/guides/contributing/coding_style.rst
> +++ b/doc/guides/contributing/coding_style.rst
> @@ -747,11 +747,35 @@ Static Variables and Functions
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> * All functions and variables that are local to a file must be declared as
> ``static``
> because it can often help the compiler to do some optimizations (such as,
> inlining
> the code).
> -* Functions that should be inlined should to be declared as ``static
> inline`` and
> can be defined in a .c or a .h file.
> +* Functions that should be inlined should be declared as ``static inline``
> and can
> be defined in a .c or a .h file.
>
> .. note::
> Static functions defined in a header file must be declared as ``static
> inline`` in order to prevent compiler warnings about the function being
> unused.
>
> +Use of ``__rte_always_inline``
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The ``__rte_always_inline`` attribute forces the compiler to inline a
> function
> regardless of its size or call-graph heuristics.
> +Prefer plain ``inline`` (or no annotation at all for static functions) and
> let the
> compiler decide.
> +Modern compilers at ``-O2`` make good inlining decisions for small ``static
> inline`` functions in headers,
> +and forced inlining can hurt performance by inflating function bodies,
> increasing
> register pressure, and overriding profile-guided optimization.
> +
> +``__rte_always_inline`` should only be used when one of the following
> applies:
> +
> +* The function contains ``__rte_constant()`` checks that gate a
> constant-folded
> fast path,
> + and the optimization is lost if the function is not inlined into the
> caller.
> + Examples include byte-order helpers and length-dispatched copy/compare
> routines.
> +
> +* The function wraps inline assembly or a compiler intrinsic whose
> correctness
> depends on being inlined into the caller's register context (for example,
> intrinsics
> requiring a compile-time constant argument).
> +
> +* Measurement on a representative workload shows that the annotation is
> required to retain performance, and the reason is documented in the commit
> message that introduces it.
> +
> +Each use must be justified at the point it is introduced. Adding
> ``__rte_always_inline`` because nearby code uses it is not a justification;
> +if the constant or intrinsic that requires inlining is several call levels
> up the call
> chain,
> +restructure the code rather than annotating the entire chain.
> +
> +The complementary attribute ``__rte_noinline`` is useful for explicitly
> marking
> cold paths (error handling, initialization, slow-path fallbacks) where
> outlining the
> function can reduce instruction-cache pressure on the hot path.
> +
> Const Attribute
> ~~~~~~~~~~~~~~~
>
> diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
> index 71415346cf..e358be7fcf 100644
> --- a/lib/eal/include/rte_common.h
> +++ b/lib/eal/include/rte_common.h
> @@ -482,7 +482,22 @@ static void __attribute__((destructor(RTE_PRIO(prio)),
> used)) func(void)
> #endif
>
> /**
> - * Force a function to be inlined
> + * Force a function to be inlined, regardless of the compiler's size and
> + * call-graph heuristics.
> + *
> + * Prefer plain @c inline (or no annotation on a static function) and let the
> compiler decide.
> + * Modern compilers at -O2 inline small static functions well,
> + * and forcing it can hurt by inflating call sites, raising register
> pressure,
> + * and overriding profile-guided optimization.
> + *
> + * Reserve this attribute for cases where inlining is required for
> + * correctness, or for a documented and measured performance reason, e.g.
> + * - a constant-folded fast path gated by @c __rte_constant() that is lost
> + * unless the function is inlined into the caller;
> + * - a wrapper around inline asm or an intrinsic that needs a
> + * compile-time-constant argument from the caller's context.
> + *
> + * See the DPDK coding style guide for the full policy.
> */
> #ifdef RTE_TOOLCHAIN_MSVC
> #define __rte_always_inline __forceinline
> @@ -491,7 +506,11 @@ static void __attribute__((destructor(RTE_PRIO(prio)),
> used)) func(void)
> #endif
>
> /**
> - * Force a function to be noinlined
> + * Force a function not to be inlined.
> + *
> + * Useful for explicitly outlining cold paths such as error handling,
> + * initialization, slow-path fallbacks, so they do not bloat the hot path
> + * or add to its instruction-cache footprint.
> */
> #ifdef RTE_TOOLCHAIN_MSVC
> #define __rte_noinline __declspec(noinline)
> @@ -501,6 +520,12 @@ static void __attribute__((destructor(RTE_PRIO(prio)),
> used)) func(void)
>
> /**
> * Hint function in the hot path
> + *
> + * The compiler may optimize the function more aggressively, treat calls
> + * to it as likely for branch prediction, and group it with other hot
> + * functions to improve instruction-cache locality. This affects code
> + * placement and prediction, not inlining; combine with an inline
> + * annotation if both are wanted.
> */
> #ifdef RTE_TOOLCHAIN_MSVC
> #define __rte_hot
> @@ -510,6 +535,14 @@ static void __attribute__((destructor(RTE_PRIO(prio)),
> used)) func(void)
>
> /**
> * Hint function in the cold path
> + *
> + * The compiler optimizes the function for size rather than speed,
> + * marks branches that reach it as unlikely, and may move it to a separate
> + * section to keep it off the hot path and reduce instruction-cache
> + * pressure there.
> + *
> + * Suitable for error handling, logging, and setup/teardown code.
> + * Functions marked @c __rte_noreturn are already treated as cold.
> */
> #ifdef RTE_TOOLCHAIN_MSVC
> #define __rte_cold
> --
Acked-by: Konstantin Ananyev <[email protected]>
> 2.53.0