On Mon, 25 Apr 2022 15:07:58 +0000
Pedro Miguel Justo <pm...@texair.net> wrote:

> > On 2022/Apr/25, at 01:22, Pedro Miguel Justo <pm...@texair.net> wrote:
> > 
> > 
> >   
> >> On 2022/Apr/25, at 01:14, Frank Scheiner <frank.schei...@web.de> wrote:
> >> 
> >> Hi guys,
> >> 
> >> On 25.04.22 10:09, John Paul Adrian Glaubitz wrote:  
> >>>> From what I can understand by the information in the bugcheck, this is 
> >>>> somewhat related to a violation
> >>>> in parameter copy from user to kernel during some boot-time, crypto, 
> >>>> self-test. Does that sound right?
> >>>> If that is the case, how would this be related to FW?  
> >>> 
> >>> I'm not claiming that it must be related to the firmware, I'm just saying 
> >>> that I don't see this problem
> >>> on my RX2660 at all and I have even reinstalled it recently with one of 
> >>> the latest firmware images
> >>> without having to pass any parameter to the command line.  
> >> 
> >> A difference between Adrian's rx2660 and Pedro's rx2660 is Montecito
> >> left and Montvale right.
> >> 
> >> But could still be multiple other reasons we haven't looked at yet in
> >> detail:
> >> 
> >> * amount of memory installed
> >> * SMT enabled or not
> >> * number of processor modules installed
> >> 
> >> It might be possible for me to check on my rx2660s (one with Montvale
> >> and one with Montecito(s)) tomorrow. I will then also look at my other
> >> Itanium gear and gather relevant information.
> >>   
> > 
> > Yes, this sounds more likely to me too.
> > 
> > The crypto self-tests seem to be an innocent bystander here. I tried 
> > booting the most recent kernel with the option “cryptomgr.notests” and it 
> > went much farther. Alas it still failed with another buffer copy validation 
> > for a different caller altogether:
> > 
> > [    3.836466]  [<a000000101353690>] usercopy_abort+0x120/0x130
> > [    3.836466]                                 sp=e0000001000cfdf0 bsp=e0000001000c9388
> > [    3.836466]  [<a0000001004c5660>] __check_object_size+0x3c0/0x420
> > [    3.836466]                                 sp=e0000001000cfe00 bsp=e0000001000c9350
> > [    3.836466]  [<a000000100570030>] sys_getcwd+0x250/0x420
> > [    3.836466]                                 sp=e0000001000cfe00 bsp=e0000001000c92c8
> > [    3.836466]  [<a00000010000c860>] ia64_ret_from_syscall+0x0/0x20
> > [    3.836466]                                 sp=e0000001000cfe30 bsp=e0000001000c92c8
> > [    3.836466]  [<a000000000040720>] ia64_ivt+0xffffffff00040720/0x400
> > [    3.836466]                                 sp=e0000001000d0000 bsp=e0000001000c92c8
> > 
> > This suggests the bug might be in the logic validating these buffers 
> > against the allocations (heap, span, etc).
> > 
> > I don’t know why hardened_usercopy=off is not being observed by the kernel. 
> > As a work-around I am compiling myself a new kernel with 
> > CONFIG_HARDENED_USERCOPY disabled at the source. 
> >   
> 
> Even with kernel "Linux debian 4.19.0-5-mckinley #1 SMP Debian 4.19.37-5 
> (2019-06-19) ia64 GNU/Linux"
> 
> Things are still not 100%. A few hours into building the kernel it 
> started crashing, also with usercopy validations but, this time, the other way 
> around. And because it was the other way around, it led to process 
> termination instead of a full-blown bugcheck. This could be related or not. 
> It could very well be a different bug that happens to manifest itself around the 
> same validation.
> 
>   CC [M]  drivers/net/wireless/realtek/rtw88/rtw8822be.o
>   LD [M]  drivers/net/wireless/realtek/rtw88/rtw88_8822be.o
>   CC [M]  drivers/net/wireless/realtek/rtw88/rtw8822c.o
> Segmentation fault
> make[5]: *** [scripts/Makefile.build:293: drivers/net/wireless/realtek/rtw88/rtw8822c.o] Error 139
> make[5]: *** Deleting file 'drivers/net/wireless/realtek/rtw88/rtw8822c.o'
> make[4]: *** [scripts/Makefile.build:555: drivers/net/wireless/realtek/rtw88] Error 2
> make[3]: *** [scripts/Makefile.build:555: drivers/net/wireless/realtek] Error 2
> make[2]: *** [scripts/Makefile.build:555: drivers/net/wireless] Error 2
> make[1]: *** [scripts/Makefile.build:555: drivers/net] Error 2
> make: *** [Makefile:1855: drivers] Error 2
> pmsjt@debian:~/linux-source-5.17$ make
> 
> Message from syslogd@debian at Apr 25 07:58:08 ...
>  kernel:[23420.984012] usercopy: Kernel memory overwrite attempt detected to linear kernel text (offset 1916912, size 8)!
> 
> Message from syslogd@debian at Apr 25 07:58:08 ...
>  kernel:[23421.268009] usercopy: Kernel memory overwrite attempt detected to linear kernel text (offset 1818608, size 8)!
>   HOSTCC  scripts/sign-file
>   CALL    scripts/checksyscalls.sh
> <stdin>:1517:2: warning: #warning syscall clone3 not implemented [-Wcpp]
>   CALL    scripts/atomic/check-atomics.sh
>   CHK     include/generated/compile.h
> make[2]: *** [scripts/Makefile.build:294: arch/ia64/kernel/signal.o] Segmentation fault
> 
> Message from syslogd@debian at Apr 25 07:58:11 ...
>  kernel:[23423.626254] usercopy: Kernel memory overwrite attempt detected to linear kernel text (offset 1933296, size 8)!
> make[1]: *** [scripts/Makefile.build:555: arch/ia64/kernel] Error 2
> make: *** [Makefile:1855: arch/ia64] Error 2

In my understanding hardened_usercopy=on is completely broken on ia64
today. It can't run any userspace. Even the init process would not survive
machine boot. At least that's what I experienced on an rx3600.

Thus, if your system survives that long, I would guess that you have
hardened_usercopy=off in full effect, at least at boot.

I would speculate it's some kind of memory corruption around
'bypass_usercopy_checks' key.

It is worth adding a few printk()s to 'usercopy_abort()' and
'set_hardened_usercopy()' in mm/usercopy.c, just to make sure
'bypass_usercopy_checks' has the expected 'true' setting at boot time and
at crash time.

-- 

  Sergei
