> Modern compilers at -O2 make good inlining decisions for small
> static inline functions; forced inlining via __rte_always_inline
> should be reserved for cases where it is required for correctness
> or for documented measured performance reasons.
> 
> Document the policy in the coding style guide and add a
> checkpatches.sh entry that flags when new uses of the attribute
> are introduced. Checkpatches is not an absolute blocker to
> acceptance, only an indicator that more review is needed.
> 
> Add additional comments about use of __rte_always_inline,
> __rte_noinline, __rte_hot, and __rte_cold to the rte_common.h
> to aid developers.
> 
> Signed-off-by: Stephen Hemminger <[email protected]>
> ---
>  devtools/checkpatches.sh                 |  8 +++++
>  doc/guides/contributing/coding_style.rst | 26 ++++++++++++++++-
>  lib/eal/include/rte_common.h             | 37 ++++++++++++++++++++++--
>  3 files changed, 68 insertions(+), 3 deletions(-)
> 
> diff --git a/devtools/checkpatches.sh b/devtools/checkpatches.sh
> index f5dd77443f..2a3d364178 100755
> --- a/devtools/checkpatches.sh
> +++ b/devtools/checkpatches.sh
> @@ -137,6 +137,14 @@ check_forbidden_additions() { # <patch>
>               -f $(dirname $(readlink -f $0))/check-forbidden-tokens.awk \
>               "$1" || res=1
> 
> +     # forbid new use of __rte_always_inline
> +     awk -v FOLDERS="lib drivers app examples" \
> +             -v EXPRESSIONS='\\<__rte_always_inline\\>' \
> +             -v RET_ON_FAIL=1 \
> +             -v MESSAGE='Adding __rte_always_inline; prefer plain inline' \
> +             -f $(dirname $(readlink -f $0))/check-forbidden-tokens.awk \
> +             "$1" || res=1
> +
>       # refrain from using compiler __rte_atomic_thread_fence()
>       # It should be avoided on x86 for SMP case.
>       awk -v FOLDERS="lib drivers app examples" \
> diff --git a/doc/guides/contributing/coding_style.rst
> b/doc/guides/contributing/coding_style.rst
> index 243a3c2959..8f9e77d1c1 100644
> --- a/doc/guides/contributing/coding_style.rst
> +++ b/doc/guides/contributing/coding_style.rst
> @@ -747,11 +747,35 @@ Static Variables and Functions
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
>  * All functions and variables that are local to a file must be declared as 
> ``static``
> because it can often help the compiler to do some optimizations (such as, 
> inlining
> the code).
> -* Functions that should be inlined should to be declared as ``static 
> inline`` and
> can be defined in a .c or a .h file.
> +* Functions that should be inlined should be declared as ``static inline`` 
> and can
> be defined in a .c or a .h file.
> 
>  .. note::
>       Static functions defined in a header file must be declared as ``static
> inline`` in order to prevent compiler warnings about the function being 
> unused.
> 
> +Use of ``__rte_always_inline``
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The ``__rte_always_inline`` attribute forces the compiler to inline a 
> function
> regardless of its size or call-graph heuristics.
> +Prefer plain ``inline`` (or no annotation at all for static functions) and 
> let the
> compiler decide.
> +Modern compilers at ``-O2`` make good inlining decisions for small ``static
> inline`` functions in headers,
> +and forced inlining can hurt performance by inflating function bodies, 
> increasing
> register pressure, and overriding profile-guided optimization.
> +
> +``__rte_always_inline`` should only be used when one of the following 
> applies:
> +
> +* The function contains ``__rte_constant()`` checks that gate a 
> constant-folded
> fast path,
> +  and the optimization is lost if the function is not inlined into the 
> caller.
> +  Examples include byte-order helpers and length-dispatched copy/compare
> routines.
> +
> +* The function wraps inline assembly or a compiler intrinsic whose 
> correctness
> depends on being inlined into the caller's register context (for example, 
> intrinsics
> requiring a compile-time constant argument).
> +
> +* Measurement on a representative workload shows that the annotation is
> required to retain performance, and the reason is documented in the commit
> message that introduces it.
> +
> +Each use must be justified at the point it is introduced. Adding
> ``__rte_always_inline`` because nearby code uses it is not a justification;
> +if the constant or intrinsic that requires inlining is several call levels 
> up the call
> chain,
> +restructure the code rather than annotating the entire chain.
> +
> +The complementary attribute ``__rte_noinline`` is useful for explicitly 
> marking
> cold paths (error handling, initialization, slow-path fallbacks) where 
> outlining the
> function can reduce instruction-cache pressure on the hot path.
> +
>  Const Attribute
>  ~~~~~~~~~~~~~~~
> 
> diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
> index 71415346cf..e358be7fcf 100644
> --- a/lib/eal/include/rte_common.h
> +++ b/lib/eal/include/rte_common.h
> @@ -482,7 +482,22 @@ static void __attribute__((destructor(RTE_PRIO(prio)),
> used)) func(void)
>  #endif
> 
>  /**
> - * Force a function to be inlined
> + * Force a function to be inlined, regardless of the compiler's size and
> + * call-graph heuristics.
> + *
> + * Prefer plain @c inline (or no annotation on a static function) and let the
> compiler decide.
> + * Modern compilers at -O2 inline small static functions well,
> + * and forcing it can hurt by inflating call sites, raising register 
> pressure,
> + * and overriding profile-guided optimization.
> + *
> + * Reserve this attribute for cases where inlining is required for
> + * correctness, or for a documented and measured performance reason, e.g.
> + *  - a constant-folded fast path gated by @c __rte_constant() that is lost
> + *    unless the function is inlined into the caller;
> + *  - a wrapper around inline asm or an intrinsic that needs a
> + *    compile-time-constant argument from the caller's context.
> + *
> + * See the DPDK coding style guide for the full policy.
>   */
>  #ifdef RTE_TOOLCHAIN_MSVC
>  #define __rte_always_inline __forceinline
> @@ -491,7 +506,11 @@ static void __attribute__((destructor(RTE_PRIO(prio)),
> used)) func(void)
>  #endif
> 
>  /**
> - * Force a function to be noinlined
> + * Force a function not to be inlined.
> + *
> + * Useful for explicitly outlining cold paths such as error handling,
> + * initialization, slow-path fallbacks, so they do not bloat the hot path
> + * or add to its instruction-cache footprint.
>   */
>  #ifdef RTE_TOOLCHAIN_MSVC
>  #define __rte_noinline __declspec(noinline)
> @@ -501,6 +520,12 @@ static void __attribute__((destructor(RTE_PRIO(prio)),
> used)) func(void)
> 
>  /**
>   * Hint function in the hot path
> + *
> + * The compiler may optimize the function more aggressively, treat calls
> + * to it as likely for branch prediction, and group it with other hot
> + * functions to improve instruction-cache locality. This affects code
> + * placement and prediction, not inlining; combine with an inline
> + * annotation if both are wanted.
>   */
>  #ifdef RTE_TOOLCHAIN_MSVC
>  #define __rte_hot
> @@ -510,6 +535,14 @@ static void __attribute__((destructor(RTE_PRIO(prio)),
> used)) func(void)
> 
>  /**
>   * Hint function in the cold path
> + *
> + * The compiler optimizes the function for size rather than speed,
> + * marks branches that reach it as unlikely, and may move it to a separate
> + * section to keep it off the hot path and reduce instruction-cache
> + * pressure there.
> + *
> + * Suitable for error handling, logging, and setup/teardown code.
> + * Functions marked @c __rte_noreturn are already treated as cold.
>   */
>  #ifdef RTE_TOOLCHAIN_MSVC
>  #define __rte_cold
> --

Acked-by: Konstantin Ananyev <[email protected]>

> 2.53.0

Reply via email to