On Thu, Jan 08, 2026 at 09:25:28AM +0000, Ard Biesheuvel wrote:
> Currently, idt_table is allocated as page-aligned .bss, and remapped
> read-only after init. This breaks a 2 MiB large page into 4k page
> mappings, which defeats some of the effort done at boot to map the
> kernel image using large pages, for improved TLB efficiency.
> 
> Mark this allocation as __ro_after_init instead, so it will be made
> read-only automatically after boot, without breaking up large page
> mappings.
> 
> This also fixes a latent bug on i386, where the size of idt_table is
> less than a page, and so remapping it read-only could potentially affect
> other read-write variables too, if those are not page-aligned as well.
> 
> Signed-off-by: Ard Biesheuvel <[email protected]>
> ---
>  arch/x86/kernel/idt.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
> index f445bec516a0..d6da25d7964f 100644
> --- a/arch/x86/kernel/idt.c
> +++ b/arch/x86/kernel/idt.c
> @@ -170,7 +170,7 @@ static const __initconst struct idt_data apic_idts[] = {
>  };
>  
>  /* Must be page-aligned because the real IDT is used in the cpu entry area */
> -static gate_desc idt_table[IDT_ENTRIES] __page_aligned_bss;
> +static gate_desc idt_table[IDT_ENTRIES] __aligned(PAGE_SIZE) __ro_after_init;
>  
>  static struct desc_ptr idt_descr __ro_after_init = {
>       .size           = IDT_TABLE_SIZE - 1,
> @@ -308,9 +308,6 @@ void __init idt_setup_apic_and_irq_gates(void)
>       idt_map_in_cea();
>       load_idt(&idt_descr);
>  
> -     /* Make the IDT table read only */
> -     set_memory_ro((unsigned long)&idt_table, 1);
> -
>       idt_setup_done = true;
>  }

Good idea, except my guest shows me something else:

before:

[    0.186281] IDT table: 0xffffffff89c7f000

0xffffffff89c00000-0xffffffff89c7f000         508K     RW                 GLB 
NX pte
0xffffffff89c7f000-0xffffffff89c80000           4K     ro                 GLB 
NX pte
0xffffffff89c80000-0xffffffff89e00000        1536K     RW                 GLB 
NX pte
0xffffffff89e00000-0xffffffff8be00000          32M     RW         PSE     GLB 
NX pmd

This is clearly a single, 4K RO pageframe right in the middle of a splintered
2M page.

after:

[    0.180635] IDT table: 0xffffffff822cf000

0xffffffff81e00000-0xffffffff82200000           4M     ro         PSE     GLB 
NX pmd
0xffffffff82200000-0xffffffff8236f000        1468K     ro                 GLB 
NX pte
0xffffffff8236f000-0xffffffff82400000         580K     RW                 GLB 
NX pte
0xffffffff82400000-0xffffffff89800000         116M     RW         PSE     GLB 
NX pmd

but after applying your patch it looks like it still broke the 2M mapping as
the remaining piece is RW.

If I do this:

static gate_desc idt_table[IDT_ENTRIES] __aligned(PMD_SIZE) __ro_after_init;

it still doesn't help:

[    0.197808] IDT table: 0xffffffff82800000

0xffffffff81e00000-0xffffffff82800000          10M     ro         PSE     GLB 
NX pmd
0xffffffff82800000-0xffffffff828a0000         640K     ro                 GLB 
NX pte
0xffffffff828a0000-0xffffffff82a00000        1408K     RW                 GLB 
NX pte
0xffffffff82a00000-0xffffffff89e00000         116M     RW         PSE     GLB 
NX pmd

because that trailing piece of the 2M page is still RW.

And who knows what else am I breaking when doing this:

[    2.368601] ------------[ cut here ]------------
[    2.389816] [CRTC:35:crtc-0] vblank wait timed out
[    2.396676] WARNING: drivers/gpu/drm/drm_atomic_helper.c:1920 at 
drm_atomic_helper_wait_for_vblanks.part.0+0x1ba/0x1e0, CPU#1: kworker/1:0/57
[    2.406715] Modules linked in:
[    2.408462] CPU: 1 UID: 0 PID: 57 Comm: kworker/1:0 Not tainted 6.19.0-rc6+ 
#4 PREEMPT(full)
...

I don't know, sacrificing a 2M page just for the idt_table and so that it
doesn't get splintered, not sure it is worth it.

Hmmm.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Reply via email to