Hi,

On 27.05.23 21:34, Linus Torvalds wrote:
On Sat, May 27, 2023 at 11:41 AM Frank Scheiner <frank.schei...@web.de> wrote:

Ok, I put the decoded console messages on [2].

[2]: https://pastebin.com/dLYMijfS

Ugh. Apparently ia64 decoding isn't great. But at least it gives
multiple line numbers:

    load_module (kernel/module/main.c:2291 kernel/module/main.c:2412 kernel/module/main.c:2868)

except your kernel obviously has those test-patches, so I still don't
know exactly where they are.

Erm, I see. I recreated a vanilla v6.4-rc3 kernel and ran that; the decoded result is on [1]. Not sure if it makes things any better.

[1]: https://pastebin.com/z5XzEnhq

I also tried to build and run an SP kernel to maybe get a clearer picture in the traces, but that seems to require FLATMEM, which does not seem to work on that machine, or at least not the way it is configured (and yeah, I also used the wrong commit for it, and it was patched...):

```
[    0.000000] Linux version 6.4.0-rc3-933174ae28ba72ab8de5b35cb7c98fc211235096-patch3_sp (root@x4270) (ia64-linux-gcc (GCC) 12.2.0, GNU ld (GNU Binutils) 2.39) #1 Sat May 27 21:28:44 CEST 2023
[...]
[    0.000000] ACPI: SSDT 0x000000003FE35BA8 00013C (v01 HP rx2620 00000006 INTL 20050309)
[    0.000000] ACPI: Local APIC address (____ptrval____)
[    0.000000] 1 CPUs available, 1 CPUs total
[...]
[    0.000000] Kernel panic - not syncing: Cannot use FLATMEM with 246784MB hole
[    0.000000] Please switch over to SPARSEMEM
[    0.000000] ---[ end Kernel panic - not syncing: Cannot use FLATMEM with 246784MB hole
[    0.000000] Please switch over to SPARSEMEM ]---
```

But it looks like it is in move_module(). Strange. I don't know how it
gets to "__copy_user" from there...

[ Looks at the ia64 code ]

Oh.

It turns out that it *says* __copy_user(), but the code is actually
shared with the regular memcpy() function, which does

   GLOBAL_ENTRY(memcpy)
         and     r28=0x7,in0
         and     r29=0x7,in1
         mov     f6=f0
         mov     retval=in0
         br.cond.sptk .common_code
         ;;

where that ".common_code" label is - surprise surprise - the common
copy code, and so when the oops reports that the problem happened in
__copy_user(), it actually is in this case just a normal memcpy.
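
To illustrate why a fault in shared code can be reported under the "wrong" name, here is a minimal sketch of nearest-preceding-symbol lookup, which is roughly how an oops turns a program counter into a function name. The table, its addresses, and the resolve() helper are all hypothetical and made up for illustration; they are not the real ia64 layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol table entry; every address below is invented. */
struct sym {
    uint64_t addr;
    const char *name;
};

/* Sorted by address, as a kallsyms-style table would be. */
static const struct sym symtab[] = {
    { 0x1000, "memcpy" },        /* short entry stub, branches to shared code */
    { 0x1040, "__copy_user" },   /* entry followed by the shared copy loop    */
    { 0x2000, "next_function" },
};

/* Return the name of the last symbol whose address is <= pc, or NULL. */
static const char *resolve(uint64_t pc)
{
    const char *best = NULL;

    for (size_t i = 0; i < sizeof(symtab) / sizeof(symtab[0]); i++) {
        if (symtab[i].addr > pc)
            break;
        best = symtab[i].name;
    }
    return best;
}
```

With this toy table, a pc inside the shared loop (say 0x1200) resolves to "__copy_user" even if the caller entered through the memcpy stub, because the lookup only knows which symbol starts closest below the address.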

Ok, so it's probably the

         memcpy(dest, (void *)shdr->sh_addr, shdr->sh_size);

in move_module() that takes a fault.  And looking at the registers,
the destination is in r17/r18, and your dump has

     unable to handle kernel paging request at virtual address 1000000000000000
     ...
     r17 : 0fffffffffffffff r18 : 1000000000000000

so it's almost certainly that 'dest' that is bad.
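
As a quick sanity check of that reading, the sketch below compares the quoted register values with the faulting address and looks at which ia64 virtual region the address falls into. The register and address values are copied from the dump above; the helper names are hypothetical, and the "regions 5 to 7 are kernel" rule is an assumption about the usual Linux/ia64 address layout rather than something taken from this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the oops excerpt above. */
static const uint64_t fault_addr = 0x1000000000000000ULL;
static const uint64_t r17 = 0x0fffffffffffffffULL;
static const uint64_t r18 = 0x1000000000000000ULL;

/* ia64 selects one of eight virtual regions with the top three address bits. */
static unsigned int ia64_region(uint64_t addr)
{
    return (unsigned int)(addr >> 61);
}

/*
 * Hypothetical helper. Assumption: Linux/ia64 places kernel mappings in
 * regions 5 to 7 (for example 0xa000000000000000 and 0xe000000000000000).
 */
static bool looks_like_kernel_address(uint64_t addr)
{
    return ia64_region(addr) >= 5;
}
```

Here r18 equals the faulting address and r17 is exactly one below it, while ia64_region(fault_addr) is 0, so under the stated assumption the destination is not a kernel address at all.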

Which I guess shouldn't surprise anybody.

But that's where my knowledge of ia64 and the new module loader layout ends.

Thanks for your help and for going as far as you could; that's greatly appreciated. Running that stuff is surely easier than debugging it. :-)

Cheers,
Frank
