Hi Johannes,

On 2026-06-16 14:49, Johannes Schneider wrote:
> The BL32 fw-external blob is loaded into DRAM by the PBL and then
> SHA-256-verified inside get_builtin_firmware_ext(). The verify runs
> in PBL phase 1 with the MMU off and D-cache cold, walking ~720 KiB
> through uncached DRAM accesses; on a Cortex-A53 this costs around
> 2 s of pre-BL31 wall-clock on every boot.
> 
> The verify is the only thing anchoring the BL32 content to the
> signed PBL: HABv4 on i.MX8M only signs and loads what fits in
> on-chip SRAM (= the PBL), and BL31/BL32 reach DRAM via PBL-driven
> copies, so skipping the SHA-256 would be a security regression.
> 
> Turn on MMU + D-cache once the DRAM is populated and right before
> the SHA-256 verify + BL31/BL32 memcpy run, and drop the MMU again
> right before the BL31 entry (BL31 expects MMU off). Mirrors the
> Rockchip handling in commits f2ae1a4a85 ("ARM: rockchip: atf:
> enable MMU in PBL") and a0ef3a1b5c ("ARM: rockchip: atf: pass
> correct memsize to mmu_early_enable()").
> 
> Measured on i.MX8MM and i.MX8MP (Cortex-A53, ~720 KiB BL32 blob):
> the BL32 verify drops from ~2 s to ~300 ms (generic-C SHA-256 in
> both cases, the difference is the D-cache state) and the BL31
> early-init also benefits from the warm cache (~200 ms saved).
> 
> Assisted-by: Claude:claude-opus-4-7
> Signed-off-by: Johannes Schneider <[email protected]>
> ---
> --- a/arch/arm/mach-imx/atf.c
> +++ b/arch/arm/mach-imx/atf.c
> @@ -20,6 +20,7 @@
>  #include <mach/imx/xload.h>
>  #include <mach/imx/snvs.h>
>  #include <pbl.h>
> +#include <asm/mmu.h>
>  
>  static void imx_adjust_optee_memory(void **bl32, void **bl32_image, size_t 
> *bl32_size)
>  {
> @@ -187,6 +188,9 @@
>                    "r" (tfa_dest - 16) :
>                    "cc");
>  
> +     /* BL31 expects MMU off. */
> +     mmu_disable();
> +
>       /*
>        * If enabled the bl_params are passed via x0 to the TF-A, except for
>        * the i.MX8MQ which doesn't support bl_params yet.
> @@ -284,6 +288,12 @@
>       imx8m_setup_snvs();
>       imx8mm_load_bl33(bl33);
>  
> +     /* Cache DRAM for the BL32 verify + BL31/BL32 memcpy that follow. */
> +     mmu_early_enable(MX8M_DDR_CSD1_BASE_ADDR,
> +                      imx8m_barebox_earlymem_size(32),
> +                      MX8M_DDR_CSD1_BASE_ADDR +
> +                      imx8m_barebox_earlymem_size(32) - OPTEE_SIZE);

mmu_early_enable() only takes two arguments, but three are passed here.

Please add a new imx8m_mmu_early_enable() function that enables the MMU
and can be used on all i.MX8M SoCs.

We already have this snippet elsewhere:

        endmem = MX8M_DDR_CSD1_BASE_ADDR;
        if (cpu_is_mx8mn())
                endmem += imx8m_barebox_earlymem_size(16);
        else
                endmem += imx8m_barebox_earlymem_size(32);

That could help with implementing this function.
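
Roughly what such a helper could look like, as an untested sketch and not
a finished implementation. It assumes the two-argument
mmu_early_enable(membase, memsize) form mentioned above. To keep the
example compilable outside the barebox tree, cpu_is_mx8mn(),
imx8m_barebox_earlymem_size(), mmu_early_enable() and the base address
are replaced by hypothetical stand-ins (the stub_* variables and the fake
sizes are made up for illustration); in the real tree the existing
helpers would be used instead.

```c
#include <stdbool.h>

/* Stand-in for the real definition; value assumed for this sketch. */
#define MX8M_DDR_CSD1_BASE_ADDR 0x40000000UL

/* Hypothetical test hooks, not part of barebox. */
static bool stub_is_mx8mn;
static unsigned long stub_membase, stub_memsize;

/* Stand-in for the real SoC detection helper. */
static bool cpu_is_mx8mn(void)
{
	return stub_is_mx8mn;
}

/*
 * Stand-in: the real helper derives the size from the DDR controller
 * configuration. Here the result depends only on the bus width argument.
 */
static unsigned long imx8m_barebox_earlymem_size(unsigned int buswidth)
{
	return buswidth == 16 ? 0x40000000UL : 0x80000000UL;
}

/* Stand-in: records the arguments instead of touching the MMU. */
static void mmu_early_enable(unsigned long membase, unsigned long memsize)
{
	stub_membase = membase;
	stub_memsize = memsize;
}

/* Sketch of the proposed common helper for all i.MX8M SoCs. */
void imx8m_mmu_early_enable(void)
{
	unsigned long memsize;

	/* Same 16 vs. 32 bit bus width distinction as the existing snippet */
	if (cpu_is_mx8mn())
		memsize = imx8m_barebox_earlymem_size(16);
	else
		memsize = imx8m_barebox_earlymem_size(32);

	mmu_early_enable(MX8M_DDR_CSD1_BASE_ADDR, memsize);
}
```

The call site in the patch would then shrink to a single
imx8m_mmu_early_enable() call, with the bus width selection kept in one
place.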

Sascha

