On 01/09/2023 at 00:44, Erhard Furtner wrote:
> On Thu, 31 Aug 2023 05:32:46 +0000
> Christophe Leroy <christophe.le...@csgroup.eu> wrote:
>
>> Ok so there is some corrupted memory somewhere.
>>
>> Can you try what happens when you remove the call to kasan_init() at the
>> start of setup_arch() in arch/powerpc/kernel/setup-common.c
>
> Ok, so I left the other patches in place + btext_map() instead of
> btext_unmap() at the end of MMU_init() + Michael's patch and additionally
> commented-out kasan_init() as stated above. The outcome is rather
> interesting! Now I deterministically get this output at boot OF console,
> regardless whether it's a cold boot or warm boot:

Ah, my bad. You also need to remove the call to kasan_late_init() in
mem_init() in arch/powerpc/mm/mem.c

Nevertheless, your result is interesting as it shows that the boot goes much
further when we don't initialise KASAN. It probably means that kasan_init()
messes things up. I will try to dig a bit more in kasan_init() and see what
we can look at.

Christophe

>
> via-pmu: Server Mode is disabled
> PMU driver v2 initialized for Core99, firmware: 0c
> ioremap() called early from pmac_nvram_init+0x208/0x7c0. Use early_ioremap()
> instead
> nvram: Checking bank 0...
> nvram: gen0=3234, gen1=3235
> nvram: Active bank is: 1
> nvram: OF partition at 0x410
> nvram: XP partition at 0x1020
> nvram: NR partition at 0x1120
> Top of RAM: 0x80000000, Total RAM: 0x80000000
> Memory hole size: 0MB
> Zone ranges:
>   DMA      [mem 0x0000000000000000-0x000000002fffffff]
>   Normal   empty
>   HighMem  [mem 0x0000000030000000-0x000000007fffffff]
> Movable zone start for each node
> Early memory node ranges
>   node   0: [mem 0x0000000000000000-0x000000007fffffff]
> Initmem setup node 0 [mem 0x0000000000000000-0x000000007fffffff]
> percpu: Embedded 14 pages/cpu s24608 r8192 d24544 u57344
> pcpu-alloc: s24608 r8192 d24544 u57344 alloc=14*4096
> pcpu-alloc: [0] 0
> Kernel command line: ro root=/dev/sda5 nr_cpus=1 zswap.max_pool_percent=16
> slub_debug=FZP page_poison=1
> netconsole=6666@192.168.178.8/eth0,6666@192.168.178.3/70:85:C2:30:EC:01
> init=/usr/lib/systemd/systemd
> Dentry cache hash table entries: 131072 (order: 7, 524288 bytes, linear)
> Inode-cache hash table entries: 65536 (order: 6, 262144 bytes, linear)
> Built 1 zonelists, mobility grouping on.  Total pages: 522560
> mem auto-init: stack:all(pattern), heap alloc:off, heap free:off
> stackdepot: allocating hash table via alloc_large_system_hash
> stackdepot hash table entries: 1048576 (order: 10, 4194304 bytes, linear)
> ==================================================================
> BUG: KASAN: stack-out-of-bounds in __kernel_poison_pages+0x6c/0xd0
> Write of size 4896 at addr c17a7000 by task swapper/0
>
> CPU: 0 PID: 0 Comm: swapper Tainted: G T
> 6.5.0-rc7-PMacG4-dirty #7
> Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
> Call Trace:
> [c1717ce0] [c0f4ec40] dump_stack_lvl+0x60/0xa4 (unreliable)
> [c1717d00] [c0368380] print_report+0x154/0x548
> [c1717d50] [c036813c] kasan_report+0xd0/0x160
> [c1717db0] [c0369bb4] kasan_check_range+0x1c8/0x308
> [c1717dc0] [c036ae88] memset+0x34/0x90
> [c1717de0] [c035b6e0] __kernel_poison_pages+0x6c/0xd0
> [c1717e00] [c03355e4] __free_pages_ok+0x418/0x500
> [c1717e60] [c14372c8] memblock_free_all+0x268/0x400
> [c1717f20] [c14103fc] mem_init+0x8c/0x274
> [c1717f60] [c1431cd0] mm_core_init+0x240/0x4e0
> [c1717fc0] [c1404694] start_kernel+0x150/0x2d8
> [c1717f00] [000035d0] 0x35d0
>
> The buggy address belongs to the physical page:
> page:(ptrval) refcount:0 mapcount:0 mapping:00000000 index:0x0 pfn:0x17a7
> flags: 0x0(zone=0)
> page_type: 0xffffffff()
> raw: 00000000 eee15380 eee15380 00000000 00000000 00000000 ffffffff 00000000
> raw: 00000000
> page dumped because: kasan: bad access detected
>
> Memory state around the buggy address:
>  c17a7d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  c17a7d80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >c17a7e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1
> ^
>  c17a7e80: f1 f1 04 f2 04 f2 00 f3 f3 f3 00 00 00 00 00 00
>  c17a7f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> ==================================================================
> Disabling lock debugging due to kernel taint
>
>> I'd also be curious to know what happens when CONFIG_DEBUG_SPINLOCK is
>> disabled.
>
> Disabling CONFIG_DEBUG_SPINLOCK does not change the output above. ^^
>
>> Another question which I'm not sure I asked already: Is it a new problem
>> you have got with recent kernels or is it just that you never tried such
>> a config with older kernels ?
>
> I wanted to revisit https://bugzilla.kernel.org/show_bug.cgi?id=216041 and
> verify whether it was resolved. KASAN worked around 2019-2021 on my G4 as I
> reported some bugs with it around that time and you fixed some of the bugs.
> ;) Like kernel bugzilla #205099, #216190, #205885.
>
> But it always seemed flaky on the G4 and had its problems. So I can't tell
> whether this specific issue was there back then or if it's new. At least bug
> #216190 was also about KASAN and SMP issues.
>
>> Also, when you say you need to start with another SMP kernel first and
>> then you don't have the problem anymore until the next cold reboot, do
>> you mean you have some old kernel with KASAN that works, or is it a
>> kernel without KASAN that you have to start first ?
>
> First. I start with a non-KASAN SMP kernel and afterwards reboot into a KASAN
> kernel. But now with kasan_init() commented-out in start of setup_arch() in
> arch/powerpc/kernel/setup-common.c this does not work anymore. I get the
> dmesg above all the time, at cold and warm boots.
>
> Regards,
> Erhard