On 2/27/26 11:15 AM, Dipayaan Roy wrote:
> On certain systems configured with 4K PAGE_SIZE, utilizing page_pool
> fragments for RX buffers results in a significant throughput regression.
> Profiling reveals that this regression correlates with high overhead in the
> fragment allocation and reference counting paths on these specific
> platforms, rendering the multi-buffer-per-page strategy counterproductive.
> 
> To mitigate this, bypass the page_pool fragment path and force a single RX
> packet per page allocation when all the following conditions are met:
>   1. The system is configured with a 4K PAGE_SIZE.
>   2. A processor-specific quirk is detected via SMBIOS Type 4 data.
> 
> This approach restores expected line-rate performance by ensuring
> predictable RX refill behavior on affected hardware.
> 
> There is no behavioral change for systems using larger page sizes
> (16K/64K), or platforms where this processor-specific quirk does not
> apply.
> 
> Signed-off-by: Dipayaan Roy <[email protected]>
> ---
>  .../net/ethernet/microsoft/mana/gdma_main.c   | 120 ++++++++++++++++++
>  drivers/net/ethernet/microsoft/mana/mana_en.c |  23 +++-
>  include/net/mana/gdma.h                       |  10 ++
>  3 files changed, 151 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> index 0055c231acf6..26bbe736a770 100644
> --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> @@ -9,6 +9,7 @@
>  #include <linux/msi.h>
>  #include <linux/irqdomain.h>
>  #include <linux/export.h>
> +#include <linux/dmi.h>
>  
>  #include <net/mana/mana.h>
>  #include <net/mana/hw_channel.h>
> @@ -1955,6 +1956,115 @@ static bool mana_is_pf(unsigned short dev_id)
>       return dev_id == MANA_PF_DEVICE_ID;
>  }
>  
> +/*
> + * Table of Processor Version strings read from SMBIOS Type 4 information,
> + * for processors that need the single RX buffer per page quirk to
> + * meet line rate performance with ARM64 + 4K pages.
> + * Note: These strings are exactly matched with version fetched from SMBIOS.
> + */
> +static const char * const mana_single_rxbuf_per_page_quirk_tbl[] = {
> +     "Cobalt 200",
> +};
> +
> +static const char *smbios_get_string(const struct dmi_header *hdr, u8 idx)
> +{
> +     const u8 *start, *end;
> +     u8 i;
> +
> +     /* Indexing starts from 1. */
> +     if (!idx)
> +             return NULL;
> +
> +     start   = (const u8 *)hdr + hdr->length;
> +     end = start + SMBIOS_STR_AREA_MAX;
> +
> +     for (i = 1; i < idx; i++) {
> +             while (start < end && *start)
> +                     start++;
> +             if (start < end)
> +                     start++;
> +             if (start + 1 < end && start[0] == 0 && start[1] == 0)
> +                     return NULL;
> +     }
> +
> +     if (start >= end || *start == 0)
> +             return NULL;
> +
> +     return (const char *)start;

If I read this correctly, the above more or less duplicates dmi_decode_table().

I think you are better off:
- using the mana_get_proc_ver_from_smbios() decoder to store the string
index found at SMBIOS_TYPE4_PROC_VERSION_OFFSET into gd
- doing a 2nd walk with a different decoder to fetch the string at the
specified index.
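
For illustration only, here is a rough userspace model of that two-pass
flow. It is not kernel code and has not been run against real firmware.
The struct quirk_ctx, both pass*() helpers, needs_single_rxbuf() and the
explicit record length are hypothetical; in the driver the two callbacks
would be decoders handed to dmi_walk(), and offset 0x10 is my assumption
for what SMBIOS_TYPE4_PROC_VERSION_OFFSET stands for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: Processor Version string index sits at offset 0x10 of Type 4. */
#define PROC_VERSION_OFFSET	0x10
#define SMBIOS_TYPE_PROCESSOR	4

/* Hypothetical stand-in for the driver context ("gd" in the patch). */
struct quirk_ctx {
	uint8_t str_idx;	/* filled by the first pass */
	const char *proc_ver;	/* filled by the second pass */
};

/* Exact-match table, mirroring the one quoted in the patch. */
static const char * const quirk_tbl[] = { "Cobalt 200" };

/* Pass 1: only record the string index stored in the formatted area.
 * rec[0] is the structure type, rec[1] the formatted-area length.
 */
static void pass1_get_index(const uint8_t *rec, size_t len,
			    struct quirk_ctx *ctx)
{
	assert(ctx != NULL);
	if (len < 2 || rec[0] != SMBIOS_TYPE_PROCESSOR)
		return;
	if (rec[1] <= PROC_VERSION_OFFSET || rec[1] > len)
		return;
	ctx->str_idx = rec[PROC_VERSION_OFFSET];
}

/* Pass 2: resolve the 1-based index inside the string-set that follows
 * the formatted area, never reading past rec + len.
 */
static void pass2_get_string(const uint8_t *rec, size_t len,
			     struct quirk_ctx *ctx)
{
	const uint8_t *s, *nul, *end = rec + len;
	uint8_t i;

	assert(ctx != NULL);
	if (len < 2 || rec[0] != SMBIOS_TYPE_PROCESSOR || !ctx->str_idx)
		return;
	if (rec[1] > len)
		return;

	s = rec + rec[1];
	for (i = 1; i < ctx->str_idx; i++) {
		nul = memchr(s, 0, (size_t)(end - s));
		/* No terminator left, or an empty string: end of the set. */
		if (!nul || nul == s)
			return;
		s = nul + 1;
	}

	/* Require a non-empty, NUL-terminated string within bounds. */
	if (s >= end || *s == 0 || !memchr(s, 0, (size_t)(end - s)))
		return;
	ctx->proc_ver = (const char *)s;
}

/* Exact string comparison against the quirk table. */
static int needs_single_rxbuf(const char *ver)
{
	size_t i;

	if (!ver)
		return 0;
	for (i = 0; i < sizeof(quirk_tbl) / sizeof(quirk_tbl[0]); i++)
		if (strcmp(ver, quirk_tbl[i]) == 0)
			return 1;
	return 0;
}
```

Fed a Type 4 record whose byte 0x10 is 2 and whose string-set is
"Socket0", "Cobalt 200", the first pass stores 2 and the second pass
resolves it to "Cobalt 200". An index past the end of the set leaves
proc_ver NULL.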

/P

