bug#58320: Hurd VM fails to boot on AMD EPYC (kvm-amd)

Ludovic Courtès Sun, 09 Oct 2022 09:10:10 -0700

Hi!

Ludovic Courtès <l...@gnu.org> skribis:


> $ addr2line -e  
> /gnu/store/m8afvcgwmrfhvjpd7b0xllk8vv5isd6j-glibc-cross-i586-pc-gnu-2.33/lib/ld.so.1
>  0x1000 0x11627 0x11bb
> ??:0
> /tmp/guix-build-glibc-cross-i586-pc-gnu-2.33.drv-0/glibc-2.33/elf/dl-misc.c:333
> :?
>
>
> That’s ‘_dl_fatal_printf’ calling ‘_exit’; it’s trying to tell us
> something.
>
> I’ll try and rebuild the system with the debugging patches at
> <https://lists.gnu.org/archive/html/bug-hurd/2011-11/msg00038.html>, to
> get early ld.so output, for lack of a better solution…

I tried adapted the patches above and tried them, but it seems that
‘_dl_sysdep_start’ isn’t even reached.  For example, I set a breakpoint
on ‘mach_task_self’ (called from ‘__mach_init’, called from
‘_dl_sysdep_start’), but that’s never reached (I’m assuming ‘break/tu’
is reliable, is it?).

The user-space backtrace upon trap remains unhelpful:

--8<---------------cut here---------------start------------->8---
start ext2fs: Hurd server bootstrap: ext2fs[device:hd0s1]Kernel Breakpoint 
trap,                                                        
 eip 0xc1030d5b                                                                 
                                                        
Breakpoint at  task_resume:     pushl   %ebp                                    
                                                        
db> debug traps /on                                                             
                                                        
db> b task_terminate                                                            
                                                        
set breakpoint #2                                                               
                                                        
db> c                                                                           
                                                        
Kernel Debug trap trap, eip 0xc1030d5b                                          
                                                        
 execkernel: Page fault (14), code=6                                            
                                                        
Stopped at  0x1000:     pushl   0x4(%ebx)                                       
                                                        
>>>>> user space <<<<<                                                          
>>>>>                                                         
0x1000(bfffff24,0,0,1160b,0)                                                    
                                                        
0x11627(bfffff9c,0,0,0,2)                                                       
                                                        
0x11bb()                                                                        
                                                        
--8<---------------cut here---------------end--------------->8---

… where:

--8<---------------cut here---------------start------------->8---
$ addr2line -e 
/gnu/store/4p1kab1c4h7h3kvgcm1hbjja4y5k9x4p-glibc-cross-i586-pc-gnu-2.33/lib/ld.so.1
 0x11627 0x11bb
/tmp/guix-build-glibc-cross-i586-pc-gnu-2.33.drv-0/glibc-2.33/elf/dl-misc.c:333
:?
$ objdump -S 
/gnu/store/4p1kab1c4h7h3kvgcm1hbjja4y5k9x4p-glibc-cross-i586-pc-gnu-2.33/lib/ld.so.1
 --start-address=0x000011b0 |head -40

/gnu/store/4p1kab1c4h7h3kvgcm1hbjja4y5k9x4p-glibc-cross-i586-pc-gnu-2.33/lib/ld.so.1:
     file format elf32-i386


Disassembly of section .text:

000011b0 <_start>:
    11b0:       89 e0                   mov    %esp,%eax
    11b2:       83 ec 0c                sub    $0xc,%esp
    11b5:       50                      push   %eax
    11b6:       e8 b5 0a 00 00          call   1c70 <_dl_start>
    11bb:       83 c4 10                add    $0x10,%esp
--8<---------------cut here---------------end--------------->8---

So it would seem that ‘_dl_start’ is called and somehow then a tail-call
to ‘_dl_fatal_printf’ is made.

Through a dichotomy I tried to see how far it goes.  The info I have so
far is that ld.so errors out from elf/rtld.c:563 (line 565 is not
reached):

--8<---------------cut here---------------start------------->8---
558:  if (bootstrap_map.l_addr || ! 
bootstrap_map.l_info[VALIDX(DT_GNU_PRELINKED)])
559:    {
560:      /* Relocate ourselves so we can do normal function calls and
561:         data access using the global offset table.  */
562:
563:      ELF_DYNAMIC_RELOCATE (&bootstrap_map, 0, 0, 0);
564:    }
565:  bootstrap_map.l_relocated = 1;
...
578:  __rtld_malloc_init_stubs ();
--8<---------------cut here---------------end--------------->8---

It’s hard to be more precise because ELF_DYNAMIC_RELOCATE is a macro
that expands to quite a lot of code.

I don’t see the code path that would lead to a ‘_dl_fatal_printf’ call
though.

Ideas?  :-)

Ludo’.

bug#58320: Hurd VM fails to boot on AMD EPYC (kvm-amd)

Reply via email to