Nice. Or not so nice. Good to know.

barret rhoden <[email protected]> wrote on Sat, Jan 28, 2017 at 19:11:
> hi -
>
> it turns out that parlib's ctors, uthread_lib_init() and
> vcore_lib_init(), can be run multiple times, despite the
> init_once_racy() thing.
>
> any shared objects (.so but not .a, such as libelf.so) we make also
> have their own copy of those ctors. (surprise)! when linked with a
> binary that also has those ctors, both copies of the ctors run, and
> both use different data structures (so the static bool controlling
> whether it ran or not was a different location in memory).
>
> it makes sense. if they think they have a ctor, then they gotta run
> it. there's no dynamic lookup saying "run this other function".
> the .so happens to have a function called uthread_lib_init(). it's
> just undesirable and unnecessary to have that ctor in that library.
>
> i found this because i had a bug in parlib initialization that i fixed,
> but then saw it again in a different program. that program linked in
> elfutils (-lelf). i saw the same problem with another -lelf binary.
> that, plus the fact it was dying very early in the process's lifetime,
> made me suspect constructors and the .so.
>
> overall, this is probably due to how we tell gcc to do a
> "--whole-archive -lparlib --no-whole-archive" for anything we build.
> we want all final programs to be linked with parlib, but not
> necessarily all .sos.
>
> i can get around the fact that init_once_racy() doesn't work (use the
> 'early SCP / VC-CTX-READY' flag in procdata). but we'd still have to
> be careful to always rebuild any shared libraries whenever we change
> parts of parlib.
>
> it'd be nice to change our gcc hack so that it only applies to final
> objects. we might just have to live with it, though.
>
> for those curious, here's the BT of the bug:
>
> #01 Addr 0x00004000002fcc3e is in libc-2.19.so at offset
>     0x00000000000d1c3e __ros_syscall_errno@@GLIBC_2.2.5+0x6e
> #02 Addr 0x0000400000016ab2 is in libelf-0.164.so at offset
>     0x0000000000014ab2 sys_yield+0x22
> #03 Addr 0x00004000000186f7 is in libelf-0.164.so at offset
>     0x00000000000166f7 thread0_sched_entry+0x37
> #04 Addr 0x0000400000015a01 is in libelf-0.164.so at offset
>     0x0000000000013a01 uthread_vcore_entry+0xf1
> #05 Addr 0x0000400000005e34 is in libelf-0.164.so at offset
>     0x0000000000003e34 __get_dtls.part.0+0x0
> #06 Addr 0x0000400000015dbd is in libelf-0.164.so at offset
>     0x0000000000013dbd uthread_yield+0x21d
> #07 Addr 0x00004000002fcc57 is in libc-2.19.so at offset
>     0x00000000000d1c57 __ros_syscall_errno@@GLIBC_2.2.5+0x87
> #08 Addr 0x00004000002fcf6c is in libc-2.19.so at offset
>     0x00000000000d1f6c mmap@@GLIBC_2.2.5+0x5c
> #09 Addr 0x0000400000018deb is in libelf-0.164.so at offset
>     0x0000000000016deb ucq_init+0x2b
> #10 Addr 0x000040000001a00d is in libelf-0.164.so at offset
>     0x000000000001800d get_eventq+0x1d
> #11 Addr 0x000040000001b702 is in libelf-0.164.so at offset
>     0x0000000000019702 init_posix_signals+0x32
> #12 Addr 0x0000400000005f03 is in libelf-0.164.so at offset
>     0x0000000000003f03 uthread_lib_init+0xa3
> #13 Addr 0x000040000001df66 is in libelf-0.164.so at offset
>     0x000000000001bf66 __do_global_ctors_aux+0x26
> #14 Addr 0x0000400000005847 is in libelf-0.164.so at offset
>     0x0000000000003847 _init+0x1f
> #15 Addr 0x000000000010eb6c is in ld-2.19.so at offset
>     0x000000000000eb6c _dl_init_internal+0x8c
> #16 Addr 0x0000000000100e9a is in ld-2.19.so at offset
>     0x0000000000000e9a oom+0x6a
>
> a few things in parlib's initialization assumed some syscalls would
> never block. under the new strace, many syscalls can block, including
> mmap. (the kernel is aware of syscalls that can never block, and
> strace will just drop those under load).
>
> what happened here is that mmap (#08) blocked in the middle of
> uthread_lib_init. we had just previously set the 2LS ops, set blockon,
> and told the kernel we were ready for VC ctx. but we hadn't installed
> any syscall event handlers (or INDIR handlers, which also matters). so
> the kernel would try to send an event, but we didn't set anything up to
> receive it. we yielded and never woke up.
>
> strace saw it too:
>
> E Syscall 18 ( mmap):(0x0, 0x2000, 0x3, 0x8020, 0xffffffffffffffff, 0x0) ret: --- proc: 26 core: 0 vcore: 0 data: ''
> X Syscall 18 ( mmap):(0x0, 0x2000, 0x3, 0x8020, 0xffffffffffffffff, 0x0) ret: 0x40000057c000 proc: 26 core: 0 vcore: 0 data: ''
> E Syscall 13 ( proc_yield):(0x0, 0x0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 26 core: 0 vcore: 0 data: ''
> X Syscall 13 ( proc_yield):(0x0, 0x0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 26 core: 0 vcore: 0 data: ''
>
> There's actually a few bugs that have popped up due to having some
> syscalls block that had never blocked before. (They are blocking
> because the strace queue is overflowing - i have it set to hold only 1
> or 2 entries). I'll probably make a test that makes every syscall
> (that isn't blacklisted) block briefly. I've fixed most of those bugs i
> found already, but the .so ctor surprise added a couple hours to it. =)
>
> barret
>
> --
> You received this message because you are subscribed to the Google Groups
> "Akaros" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
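
For anyone who wants to see the double-ctor effect without building a .so, here is a minimal single-file sketch in C. It is a simulation under stated assumptions, not parlib's code: the init_once_racy() macro below is a guess at the general shape (a function-local static bool), the *_in_binary / *_in_so function names are hypothetical stand-ins for the two copies of uthread_lib_init(), and the global vc_ctx_ready is a hypothetical stand-in for a single process-wide flag such as the one in procdata. It relies on the GCC/Clang constructor attribute.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed shape of a "run once" guard: each expansion of the macro gets
 * its OWN static flag, just as each linked-in copy of the code would. */
#define init_once_racy(retcmd)               \
    do {                                     \
        static bool initialized = false;     \
        if (initialized) {                   \
            retcmd;                          \
        }                                    \
        initialized = true;                  \
    } while (0)

/* Counters that exist only for this illustration. */
int private_flag_runs;  /* init bodies guarded by per-copy statics */
int shared_flag_runs;   /* init bodies guarded by one shared flag */

/* Hypothetical stand-in for one process-wide "already set up" flag. */
static bool vc_ctx_ready;

/* Copy of the ctor that ended up in the final binary. */
__attribute__((constructor)) void uthread_lib_init_in_binary(void)
{
    init_once_racy(return);
    private_flag_runs++;
}

/* Copy of the same ctor that ended up in the shared object. */
__attribute__((constructor)) void uthread_lib_init_in_so(void)
{
    init_once_racy(return);     /* a different static, so no protection */
    private_flag_runs++;
}

/* The same two copies, but guarded by the single shared flag. */
__attribute__((constructor)) void guarded_init_in_binary(void)
{
    if (vc_ctx_ready)
        return;
    vc_ctx_ready = true;
    shared_flag_runs++;
}

__attribute__((constructor)) void guarded_init_in_so(void)
{
    if (vc_ctx_ready)
        return;
    vc_ctx_ready = true;
    shared_flag_runs++;
}

/* Both per-copy ctors ran; only one of the shared-flag ctors did. */
void expect_double_init(void)
{
    assert(private_flag_runs == 2);
    assert(shared_flag_runs == 1);
}
```

Calling expect_double_init() from main() should pass: the per-copy static guards each fire once, so the init body runs twice, while the shared flag lets only the first of its two ctors through. In the real case the two copies live in different objects rather than under different names, but the guard fails for the same reason: each copy checks its own flag.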
