Hi! While the issue is Hurd-specific, non-Hurd people might nevertheless be able to help here with their glibc/TLS expertise.
I'm working on a patch to move the Hurd's errno from the Hurd-specific threadvar (in short, a mechanism somewhat equivalent to TLS, using a portion of space at the beginning of a thread's stack for storing thread-specific data) to TLS proper. The specific glibc tree is <http://git.savannah.gnu.org/cgit/hurd/glibc.git/tree/?id=cba1c83ad62a11347684a9daf349e659237a1741>, but apart from Hurd-specific patches this is equivalent to mainline commit fc56c5bbc1a0d56b9b49171dd377c73c268ebcfd.

On Thu, 10 May 2012 17:25:59 +0800, I wrote:
> $ gdb -q --args ./ld.so
> Reading symbols from /home/tschwinge/tmp/ld.so...done.
> (gdb) r
> Starting program: /home/tschwinge/tmp/ld.so
>
> Program received signal EXC_BAD_ACCESS, Could not access memory.
> 0x00015797 in __strerror_r (errnum=0, buf=0x0, buflen=2) at dl-minimal.c:173
> 173     dl-minimal.c: No such file or directory.
>         in dl-minimal.c
> (gdb) bt
> #0  0x00015797 in __strerror_r (errnum=0, buf=0x0, buflen=2) at dl-minimal.c:173
> #1  0x00000000 in ?? ()
> (gdb) info registers
> eax            0x0      0
> ecx            0xa      10
> edx            0x2      2
> ebx            0x26ff4  159732
> esp            0x1028c60        0x1028c60
> ebp            0x1028cb8        0x1028cb8
> esi            0xa      10
> edi            0x21b4c  138060
> eip            0x15797  0x15797 <__strerror_r+167>
> eflags         0x10202  [ IF RF ]
> cs             0x17     23
> ss             0x1f     31
> ds             0x1f     31
> es             0x1f     31
> fs             0x1f     31
> gs             0x1f     31
>
> 0x15797 is bogus: it's not even an instruction boundary.
>
> Apparently I forgot how to debug ld.so from the very beginning...
>
> It seems that gs is not set up, but even if that were an invalid TLS gs:X
> access, that doesn't explain to me how the PC would be badly affected by
> that?

It turns out that GDB's understanding of addresses (.text only?) is off by 0x1000 (it has been relocated, I assume), so after hitting a breakpoint you have to »set $pc = $pc - 0x1000« to be able to make sense of backtraces, etc.
(For posterity, in case this is useful to someone who then remembers these words: I eventually figured this out by sprinkling a few »__asm __volatile ("hlt");« (to transfer control to GDB) before the places in ld.so code where TLS data (errno, specifically) is accessed, and then comparing the disassembly and looking for magic constants. That is where I found »movl $0x40000009,%gs:(%eax)« (»errno = EBADF«), and that constant is only used in two places, one of them being __writev -- oh, it's trying to print something? -- etc., etc.)

Manually offsetting each frame's PC by -0x1000, I then got a backtrace, which included:

    #3  0x00013fb6 in __assert_fail (assertion=0x1e114 "info == ((void *)0) || (info->d_un.d_val & ~0x00000008) == 0", file=0x1f4e3 "dynamic-link.h", line=207, function=0x1f6ec "elf_get_dynamic_info") at dl-minimal.c:208
    #4  0x00003f69 in elf_get_dynamic_info (temp=0x0, l=0x24604) at dynamic-link.h:206
    #5  _dl_start (arg=0x1027000) at rtld.c:416

In my understanding of x86 TLS (and that understanding is not too detailed), »movl $0x40000009,%gs:(%eax)« is local-exec TLS, which causes the linker to set the DF_STATIC_TLS flag, and this in turn makes the assertion in elf/dynamic-link.h, line 206, fail:

    202 #ifdef RTLD_BOOTSTRAP
    203       /* Only the bind now flags are allowed.  */
    204       assert (info[VERSYMIDX (DT_FLAGS_1)] == NULL
    205               || (info[VERSYMIDX (DT_FLAGS_1)]->d_un.d_val & ~DF_1_NOW) == 0);
    206       assert (info[DT_FLAGS] == NULL
    207               || (info[DT_FLAGS]->d_un.d_val & ~DF_BIND_NOW) == 0);
    208       /* Flags must not be set for ld.so.  */
    209       assert (info[DT_RUNPATH] == NULL);
    210       assert (info[DT_RPATH] == NULL);
    211 #else

(Again for posterity, and as GDB would not access the variable properly: I confirmed this by putting »volatile Elf32_Word tmp = info[DT_FLAGS]->d_un.d_val; __asm __volatile ("hlt");« before the assert, after which GDB could »print tmp« and show that it was 0x10 (DF_STATIC_TLS).)
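For illustration, the DT_FLAGS check that fails here can be reproduced in isolation. The following is only a minimal sketch, not glibc code: the FLAG_* names and both helper functions are made up for this example, the bit values are the DF_* values from the ELF gABI, and the have_entry parameter stands in for »info[DT_FLAGS] != NULL«.

```c
#include <stdbool.h>
#include <stdint.h>

/* DT_FLAGS bit values as given in the ELF gABI.  The FLAG_* names are
   local to this sketch; <elf.h> calls them DF_*.  */
enum
{
  FLAG_ORIGIN     = 0x01,
  FLAG_SYMBOLIC   = 0x02,
  FLAG_TEXTREL    = 0x04,
  FLAG_BIND_NOW   = 0x08,
  FLAG_STATIC_TLS = 0x10
};

/* Return the bits of D_VAL that the bootstrap assert does not tolerate,
   that is, everything except the bind-now bit.  */
static uint32_t
bootstrap_flags_offending (uint32_t d_val)
{
  return d_val & ~(uint32_t) FLAG_BIND_NOW;
}

/* Same condition as the second assert under RTLD_BOOTSTRAP: either there
   is no DT_FLAGS entry at all, or only the bind-now bit is set.  */
static bool
bootstrap_flags_ok (bool have_entry, uint32_t d_val)
{
  return !have_entry || bootstrap_flags_offending (d_val) == 0;
}
```

With a DT_FLAGS value of 0x10, as observed in GDB, bootstrap_flags_ok (true, 0x10) is false and the offending bits are exactly 0x10, that is, DF_STATIC_TLS alone.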
(At this time, _hurd_init_dtablesize is zero, so it can't print anything yet, and errno is set to EBADF, triggering the faulting TLS access.)

Not knowing what this assert is good for, I simply made it allow the DF_STATIC_TLS case, too, and this allowed ld.so to progress a little bit further: if invoked without arguments, it is now able to print its usage information, elf/rtld.c:dl_main, line 1017. Yet, something like »./ld.so --library-path $PWD ./libc.so« still fails, and I (again manually with the 0x1000 offset) obtained the following backtrace:

    #0  0x00004a69 in open_verify (name=0x25ae0 "/home/thomas/libc.so", fbp=0x1026a28, loader=0x0, whatcode=0, found_other_class=0x1026a27, free_name=true) at dl-load.c:1722
    #1  0x00007915 in _dl_map_object (loader=0x0, name=0x102703b "/home/thomas/libc.so", type=1, trace_mode=0, mode=536870912, nsid=0) at dl-load.c:2285
    #2  0x00002078 in dl_main (phdr=0x1034, phnum=7, user_entry=0x1026eac, auxv=0x0) at rtld.c:1084
    #3  0x00012d25 in go (argdata=0x1026d90) at ../sysdeps/mach/hurd/dl-sysdep.c:213
    #4  0x00015f46 in _hurd_startup (argptr=0x1027000, main=0x1026f94) at hurdstartup.c:188
    #5  0x00013be3 in _dl_sysdep_start (start_argptr=0x1027000, dl_main=0x275a <dl_main+4096>) at ../sysdeps/mach/hurd/dl-sysdep.c:281
    #6  0x0000421b in _dl_start_final (arg=0x1027000) at rtld.c:338
    #7  _dl_start (arg=0x1027000) at rtld.c:564

dl-load.c:1722 again is an errno access, and the processor's segment register setup tells me TLS has not yet been initialized at that point.

Now what is important is that glibc's Hurd-specific code, contrary to the Linux kernel-specific code, does not have a private errno for ld.so:

sysdeps/mach/hurd/dl-sysdep.h:

    /* The private errno doesn't make sense on the Hurd.  errno is always the
       thread-local slot shared with libc, and it matters to share the cell
       with libc because after startup we use libc functions that set errno
       (open, mmap, etc).  */
    #define RTLD_PRIVATE_ERRNO 0

And thus in the GNU Hurd configuration, ld.so code uses the TLS errno.

In sysdeps/generic/dl-sysdep.h, this is explained/defined as follows:

    /* This macro must be defined to either 0 or 1.

       If 1, then an errno global variable hidden in ld.so will work right with
       all the errno-using libc code compiled for ld.so, and there is never a
       need to share the errno location with libc.  This is appropriate only if
       all the libc functions that ld.so uses are called without PLT and always
       get the versions linked into ld.so rather than the libc ones.  */
    #ifdef IS_IN_rtld
    # define RTLD_PRIVATE_ERRNO 1
    #else
    # define RTLD_PRIVATE_ERRNO 0
    #endif

Now, in elf/rtld.c:dl_main, TLS will eventually be initialized (at the earliest when »we have auditing DSOs to load«), but this is after mapping in objects (_dl_map_object, which then invokes open_verify, which contains the errno access). My naïve attempt to simply move »tcbp = init_tls ();« before mapping objects did not work out -- any suggestions to help me back onto firm ground?

And what, by the way, is the story with elf/rtld.c still containing code conditioned by USE___THREAD (code that looks somewhat relevant for my case), while USE___THREAD is not defined anywhere?

Grüße,
 Thomas