Nice. Or not so nice. Good to know.

barret rhoden <[email protected]> wrote on Sat, 28 Jan 2017 at 19:11:

> hi -
>
> it turns out that parlib's ctors, uthread_lib_init() and
> vcore_lib_init(), can be run multiple times, despite the
> init_once_racy() guard.
>
> any shared objects (.so but not .a, such as libelf.so) we make also
> have their own copy of those ctors.  (surprise)!  when linked with a
> binary that also has those ctors, both copies of the ctors run, and
> both use different data structures (so the static bool controlling
> whether it ran or not was at a different location in memory).
>
> it makes sense.  if they think they have a ctor, then they gotta run
> it.  there's no dynamic lookup saying "run this other function".
> the .so happens to have a function called uthread_lib_init().  it's
> just undesirable and unnecessary to have that ctor in that library.
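A minimal C sketch of why a per-object once-flag can't prevent this. It uses a simplified model in which `struct parlib_copy`, `try_init_once()` and `lib_init()` are hypothetical names, not parlib's actual code. Each object that embeds parlib (the final binary and each .so) is modeled as its own `struct parlib_copy`, so each copy has its own flag, just as each copy of the real ctor checks its own static bool.

```c
#include <stdbool.h>

/* Hypothetical model: one instance per object that embeds parlib
 * (the final binary, and every .so linked with --whole-archive). */
struct parlib_copy {
	bool initialized;   /* stands in for the per-object static bool */
	int init_runs;      /* how many times the init body actually ran */
};

/* Check-then-set once flag (not atomic, hence "racy").
 * Returns true if the caller should run the init body. */
static bool try_init_once(struct parlib_copy *c)
{
	if (c->initialized)
		return false;
	c->initialized = true;
	return true;
}

/* Stand-in for a ctor such as uthread_lib_init(); in the real code the
 * dynamic loader runs each object's ctors independently. */
static void lib_init(struct parlib_copy *c)
{
	if (!try_init_once(c))
		return;
	c->init_runs++;
}
```

Calling lib_init() twice on the binary's copy runs the body once. Calling it once on the binary's copy and once on the .so's copy runs it twice, because the two flags are at different addresses.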
>
> i found this because i had a bug in parlib initialization that i fixed,
> but then saw it again in a different program.  that program linked in
> elfutils (-lelf).  i saw the same problem with another -lelf binary.
> that, plus the fact it was dying very early in the process's lifetime,
> made me suspect constructors and the .so.
>
> overall, this is probably due to how we tell gcc to do a
> "--whole-archive -lparlib --no-whole-archive" for anything we build.
> we want all final programs to be linked with parlib, but not
> necessarily all .sos.
>
> i can get around the fact that init_once_racy() doesn't work (use the
> 'early SCP / VC-CTX-READY' flag in procdata).  but we'd still have to
> be careful to always rebuild any shared libraries whenever we change
> parts of parlib.
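A sketch of the workaround of checking a process-wide flag instead of a per-object static. This is a simplified model: `struct fake_procdata`, its `vc_ctx_ready` field and `lib_init()` are hypothetical and do not reflect the real procdata layout. Every copy of the init code reads the same flag, so only the first copy to run does the work. The sketch does not address the rebuild issue: the .so still carries its own copy of parlib's code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for per-process state that the binary and every
 * .so see at the same address (unlike a per-object static). */
struct fake_procdata {
	bool vc_ctx_ready;
};

/* Hypothetical per-object copy of the library's own state. */
struct lib_copy {
	int init_runs;
};

/* Each object's copy of the init code checks the shared flag. */
static void lib_init(struct fake_procdata *pd, struct lib_copy *c)
{
	if (pd->vc_ctx_ready)
		return;          /* another copy already ran the init body */
	c->init_runs++;
	pd->vc_ctx_ready = true;
}
```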
>
> it'd be nice to change our gcc hack so that it only applies to final
> objects.  we might just have to live with it, though.
>
> for those curious, here's the BT of the bug:
>
> #01 Addr 0x00004000002fcc3e is in libc-2.19.so at offset 0x00000000000d1c3e  __ros_syscall_errno@@GLIBC_2.2.5+0x6e
> #02 Addr 0x0000400000016ab2 is in libelf-0.164.so at offset 0x0000000000014ab2  sys_yield+0x22
> #03 Addr 0x00004000000186f7 is in libelf-0.164.so at offset 0x00000000000166f7  thread0_sched_entry+0x37
> #04 Addr 0x0000400000015a01 is in libelf-0.164.so at offset 0x0000000000013a01  uthread_vcore_entry+0xf1
> #05 Addr 0x0000400000005e34 is in libelf-0.164.so at offset 0x0000000000003e34  __get_dtls.part.0+0x0
> #06 Addr 0x0000400000015dbd is in libelf-0.164.so at offset 0x0000000000013dbd  uthread_yield+0x21d
> #07 Addr 0x00004000002fcc57 is in libc-2.19.so at offset 0x00000000000d1c57  __ros_syscall_errno@@GLIBC_2.2.5+0x87
> #08 Addr 0x00004000002fcf6c is in libc-2.19.so at offset 0x00000000000d1f6c  mmap@@GLIBC_2.2.5+0x5c
> #09 Addr 0x0000400000018deb is in libelf-0.164.so at offset 0x0000000000016deb  ucq_init+0x2b
> #10 Addr 0x000040000001a00d is in libelf-0.164.so at offset 0x000000000001800d  get_eventq+0x1d
> #11 Addr 0x000040000001b702 is in libelf-0.164.so at offset 0x0000000000019702  init_posix_signals+0x32
> #12 Addr 0x0000400000005f03 is in libelf-0.164.so at offset 0x0000000000003f03  uthread_lib_init+0xa3
> #13 Addr 0x000040000001df66 is in libelf-0.164.so at offset 0x000000000001bf66  __do_global_ctors_aux+0x26
> #14 Addr 0x0000400000005847 is in libelf-0.164.so at offset 0x0000000000003847  _init+0x1f
> #15 Addr 0x000000000010eb6c is in ld-2.19.so at offset 0x000000000000eb6c  _dl_init_internal+0x8c
> #16 Addr 0x0000000000100e9a is in ld-2.19.so at offset 0x0000000000000e9a  oom+0x6a
>
> a few things in parlib's initialization assumed some syscalls would
> never block.  under the new strace, many syscalls can block, including
> mmap.  (the kernel is aware of syscalls that can never block, and
> strace will just drop those under load).
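A rough C model of the strace behavior described above, assuming a bounded trace queue. `STRACE_QUEUE_CAP`, `strace_record()` and the `can_never_block` parameter are hypothetical names. The capacity of 2 matches the "only 1 or 2 entries" setting mentioned later in the mail. When the queue is full, a syscall that the kernel knows can never block loses its trace record. Any other syscall, mmap included, has to block until there is room.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capacity; the mail mentions a queue of only 1 or 2 entries. */
#define STRACE_QUEUE_CAP 2

enum strace_action {
	STRACE_ENQUEUED,   /* record fit in the queue */
	STRACE_DROPPED,    /* queue full, record thrown away */
	STRACE_BLOCK,      /* queue full, syscall waits for room */
};

/* Hypothetical queue; only its fill level matters here (draining by the
 * reader is not modeled). */
struct strace_queue {
	size_t len;
};

/* can_never_block stands in for the kernel's knowledge of syscalls that
 * can never block. */
static enum strace_action strace_record(struct strace_queue *q,
                                        bool can_never_block)
{
	if (q->len < STRACE_QUEUE_CAP) {
		q->len++;
		return STRACE_ENQUEUED;
	}
	if (can_never_block)
		return STRACE_DROPPED;
	return STRACE_BLOCK;   /* e.g. mmap, which can block */
}
```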
>
> what happened here is that mmap (#08) blocked in the middle of
> uthread_lib_init.  we had just set the 2LS ops, set blockon, and
> told the kernel we were ready for VC ctx.  but we hadn't installed any
> syscall event handlers (or INDIR handlers, which also matter).  so the
> kernel would try to send an event, but we hadn't set anything up to
> receive it.  we yielded and never woke up.
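The ordering problem can be modeled with a small C sketch, assuming a simplified init sequence. The `enum init_step` values and `init_sequence_hangs()` are hypothetical and are not parlib's API. The model only captures the window described above: the process is ready for VC ctx, but no syscall event handlers are installed yet. The `reordered` sequence is one possible ordering that closes the window. It is not necessarily the fix that went into parlib.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical steps, loosely following uthread_lib_init() as described. */
enum init_step {
	SET_2LS_OPS,
	SET_READY_FOR_VC_CTX,    /* tell the kernel we can take VC ctx */
	INSTALL_SYSC_HANDLERS,   /* syscall (and INDIR) event handlers */
	BLOCKING_SYSCALL,        /* e.g. an mmap that blocks under strace */
};

/* Returns true if a blocking syscall lands in the window where we are
 * ready for VC ctx but nothing is set up to receive the wakeup event. */
static bool init_sequence_hangs(const enum init_step *steps, size_t n)
{
	bool ready = false, handlers = false;

	for (size_t i = 0; i < n; i++) {
		switch (steps[i]) {
		case SET_2LS_OPS:
			break;
		case SET_READY_FOR_VC_CTX:
			ready = true;
			break;
		case INSTALL_SYSC_HANDLERS:
			handlers = true;
			break;
		case BLOCKING_SYSCALL:
			if (ready && !handlers)
				return true;   /* yield, event never arrives */
			break;
		}
	}
	return false;
}

/* The order from the backtrace: mmap blocks before handlers exist. */
static const enum init_step buggy_order[] = {
	SET_2LS_OPS, SET_READY_FOR_VC_CTX, BLOCKING_SYSCALL, INSTALL_SYSC_HANDLERS,
};

/* One possible reordering (an assumption, not necessarily parlib's fix). */
static const enum init_step reordered[] = {
	SET_2LS_OPS, INSTALL_SYSC_HANDLERS, SET_READY_FOR_VC_CTX, BLOCKING_SYSCALL,
};
```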
>
> strace saw it too:
>
> E Syscall  18 (        mmap):(0x0, 0x2000, 0x3, 0x8020, 0xffffffffffffffff, 0x0) ret: --- proc: 26 core: 0 vcore: 0 data: ''
> X Syscall  18 (        mmap):(0x0, 0x2000, 0x3, 0x8020, 0xffffffffffffffff, 0x0) ret: 0x40000057c000 proc: 26 core: 0 vcore: 0 data: ''
> E Syscall  13 (  proc_yield):(0x0, 0x0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 26 core: 0 vcore: 0 data: ''
> X Syscall  13 (  proc_yield):(0x0, 0x0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 26 core: 0 vcore: 0 data: ''
>
> There are actually a few bugs that have popped up due to having some
> syscalls block that had never blocked before.  (They are blocking
> because the strace queue is overflowing - i have it set to hold only 1
> or 2 entries).  I'll probably make a test that makes every syscall
> (that isn't blacklisted) block briefly.  I've already fixed most of the
> bugs i found, but the .so ctor surprise added a couple hours to it. =)
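A sketch of the per-syscall test hook proposed above, assuming a simple blacklist. `force_block_blacklist`, `should_force_block()` and the placeholder number 999 are hypothetical. The only real syscall number used in the test, 18, is mmap's number from the strace output earlier. The hook only decides whether to inject a brief block. How the block itself is injected is not shown.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical blacklist of syscall numbers the test must never force to
 * block; 999 is a placeholder, not a real Akaros syscall number. */
static const int force_block_blacklist[] = { 999 };

/* Returns true if the test should make this syscall block briefly. */
static bool should_force_block(int sysc_num)
{
	size_t n = sizeof(force_block_blacklist) / sizeof(force_block_blacklist[0]);

	for (size_t i = 0; i < n; i++) {
		if (force_block_blacklist[i] == sysc_num)
			return false;
	}
	return true;
}
```

With a hook like this, the mmap in uthread_lib_init would block on every run, not only when the strace queue happens to overflow.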
>
> barret
>
> --
> You received this message because you are subscribed to the Google Groups
> "Akaros" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
