On Mon, Jun 22, 2020 at 06:18:02PM -0700, Saravana Kannan wrote:
> When loading a module, module_frob_arch_sections() tries to figure out
> the number of PLTs that'll be needed to handle all the RELAs. While
> doing this, it tries to dedupe PLT allocations for multiple
> R_AARCH64_CALL26 relocations to the same symbol. It does the same for
> R_AARCH64_JUMP26 relocations.
> 
> To make checks for duplicates easier/faster, it sorts the relocation
> list by type, symbol and addend. That way, to check for a duplicate
> relocation, it just needs to compare with the previous entry.
> 
> However, sorting the entire relocation array is unnecessary and
> expensive (O(n log n)) because there are a lot of other relocation types
> that don't need deduping or can't be deduped.
> 
> So this commit partitions the array into entries that need deduping and
> those that don't. And then sorts just the part that needs deduping. And
> when CONFIG_RANDOMIZE_BASE is disabled, the sorting is skipped entirely
> because PLTs are not allocated for R_AARCH64_CALL26 and R_AARCH64_JUMP26
> if it's disabled.
> 
> This gives a significant reduction in module load time for modules with
> a large number of relocations, with no measurable impact on modules with
> a small number of relocations. In my test setup with CONFIG_RANDOMIZE_BASE
> enabled, these were the results for a few downstream modules:
> 
> Module        Size (MB)
> wlan          14
> video codec   3.8
> drm           1.8
> IPA           2.5
> audio         1.2
> gpu           1.8
> 
> Without this patch:
> Module        Number of entries sorted   Module load time (ms)
> wlan          243739                     283
> video codec   74029                      138
> drm           53837                      67
> IPA           42800                      90
> audio         21326                      27
> gpu           20967                      32
> 
> Total time to load all these modules: 637 ms
> 
> With this patch:
> Module        Number of entries sorted   Module load time (ms)
> wlan          22454                      61
> video codec   10150                      47
> drm           13014                      40
> IPA           8097                       63
> audio         4606                       16
> gpu           6527                       20
> 
> Total time to load all these modules: 247 ms
> 
> Time saved during boot for just these 6 modules: 390 ms
> 
> Cc: Ard Biesheuvel <[email protected]>
> Signed-off-by: Saravana Kannan <[email protected]>
> ---
> 
> v1 -> v2:
> - Provided more details in the commit text
> - Pulled in Will's comments on the coding style
> - Pulled in Ard's suggestion about skipping jumps with the same section
>   index (parts of Will's suggested code)
> 
>  arch/arm64/kernel/module-plts.c | 46 ++++++++++++++++++++++++++++++---
>  1 file changed, 43 insertions(+), 3 deletions(-)
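
For anyone following along without the diff, the partition-then-sort idea
from the commit message can be sketched roughly as below. This is a
userspace illustration only, not the code in the patch: struct rela, the
helper names and the use of qsort() are all assumptions made for the
example, and it leaves out the same-section check mentioned in the v2
notes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a RELA entry; not the kernel's Elf64_Rela. */
struct rela {
	uint32_t type;
	uint32_t sym;
	int64_t addend;
};

/* Relocation type codes used by the example (values from the AArch64 ELF ABI). */
enum { R_ABS64 = 257, R_JUMP26 = 282, R_CALL26 = 283 };

/* Only branch relocations are candidates for PLT deduplication. */
static int needs_dedup(const struct rela *r)
{
	return r->type == R_JUMP26 || r->type == R_CALL26;
}

/* Order by type, then symbol, then addend, so duplicates end up adjacent. */
static int cmp_rela(const void *a, const void *b)
{
	const struct rela *x = a, *y = b;

	if (x->type != y->type)
		return x->type < y->type ? -1 : 1;
	if (x->sym != y->sym)
		return x->sym < y->sym ? -1 : 1;
	if (x->addend != y->addend)
		return x->addend < y->addend ? -1 : 1;
	return 0;
}

/*
 * Move branch relocations to the front of the array in a single O(n)
 * pass and return how many there are. Relative order is not preserved.
 */
static size_t partition_branch_relas(struct rela *rels, size_t n)
{
	size_t i = 0, j = n;

	assert(rels != NULL || n == 0);
	while (i < j) {
		if (needs_dedup(&rels[i])) {
			i++;
		} else {
			struct rela tmp = rels[i];

			j--;
			rels[i] = rels[j];
			rels[j] = tmp;
		}
	}
	return i;
}

/* Sort only the branch partition and count its distinct entries. */
static size_t count_unique_branch_relas(struct rela *rels, size_t n)
{
	size_t nb = partition_branch_relas(rels, n);
	size_t unique = 0;

	if (nb > 1)
		qsort(rels, nb, sizeof(*rels), cmp_rela);
	for (size_t k = 0; k < nb; k++)
		if (k == 0 || cmp_rela(&rels[k], &rels[k - 1]) != 0)
			unique++;
	return unique;
}
```

The sort cost then scales with the number of branch relocations rather
than with the whole array, which is in line with the much smaller "entries
sorted" counts in the second table.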

Nice, it looks like you were more-or-less able to use my suggestion
directly! Commit message looks much better too, so:

Acked-by: Will Deacon <[email protected]>

Catalin can pick this up when he starts queuing patches for 5.9.

Cheers,

Will
