On 05/04/2018 12:57 PM, Ralf Ramsauer wrote:
>
>
> On 05/04/2018 12:38 PM, Jan Kiszka wrote:
>> On 2018-05-04 11:12, Ralf Ramsauer wrote:
>>> On 05/02/2018 06:18 PM, Jan Kiszka wrote:
>>>> On 2018-05-02 16:51, Ralf Ramsauer wrote:
>>>>>
>>>>>
>>>>> On 05/01/2018 06:54 PM, Jan Kiszka wrote:
>>>>>> On 2018-05-01 10:54, Ralf Ramsauer wrote:
>>>>>>> On 04/27/2018 08:21 PM, Jan Kiszka wrote:
>>>>>>>> On 2018-04-27 11:36, Ralf Ramsauer wrote:
>>>>>>>>> This won't drop symbols that are marked as used.
>>>>>>>>>
>>>>>>>>> The static, relocateable inmate library lib.a is created by ar. When
>>>>>>>>> linking executables, unreferenced symbols may be dropped, even if they
>>>>>>>>> are attributed as used.
>>>>>>>>>
>>>>>>>>> --whole-archive ensures that those symbols will be linked.
>>>>>>>>>
>>>>>>>>> Whereas on x86, we need --gc-sections.
>>>>>>>>
>>>>>>>> That's something I do not understand yet: With [1] we will build the
>>>>>>>> whole hypervisor, including x86, with --whole-archive, and that works
>>>>>>>> fine on x86 as well.
>>>>>>>
>>>>>>> That's what happens if I compile x86 inmates with --whole-archive
>>>>>>> instead of --gc-sections:
>>>>>>>
>>>>>>> LD /home/ralf/workspace/jailhouse/inmates/demos/x86/32-bit-demo-linked.o
>>>>>>> /home/ralf/workspace/jailhouse/inmates/lib/x86/lib32.a(ioapic-32.o): In function `ioapic_init':
>>>>>>> /home/ralf/workspace/jailhouse/inmates/lib/x86/ioapic.c:48: undefined reference to `map_range'
>>>>>>> /home/ralf/workspace/jailhouse/inmates/lib/x86/lib32.a(smp-32.o): In function `smp_start_cpu':
>>>>>>> /home/ralf/workspace/jailhouse/inmates/lib/x86/smp.c:59: undefined reference to `delay_us'
>>>>>>> /home/ralf/workspace/jailhouse/inmates/lib/x86/smp.c:61: undefined reference to `delay_us'
>>>>>>> /home/ralf/workspace/jailhouse/inmates/lib/x86/lib32.a(pci-32.o): In function `pci_find_device':
>>>>>>> /home/ralf/workspace/jailhouse/inmates/lib/x86/../pci.c:47: undefined reference to `pci_read_config'
>>>>>>> /home/ralf/workspace/jailhouse/inmates/lib/x86/../pci.c:51: undefined reference to `pci_read_config'
>>>>>>> /home/ralf/workspace/jailhouse/inmates/lib/x86/lib32.a(pci-32.o): In function `pci_find_cap':
>>>>>>> /home/ralf/workspace/jailhouse/inmates/lib/x86/../pci.c:61: undefined reference to `pci_read_config'
>>>>>>> /home/ralf/workspace/jailhouse/inmates/lib/x86/../pci.c:65: undefined reference to `pci_read_config'
>>>>>>> /home/ralf/workspace/jailhouse/inmates/lib/x86/../pci.c:68: undefined reference to `pci_read_config'
>>>>>>>
>>>>>>> Interestingly, this only happens to the 32-bit demo inmate.
>>>>>>>
>>>>>>
>>>>>> Because we keep everything, something might be missing now: The 32-bit
>>>>>> lib does not provide support for all features that its big 64-bit
>>>>>> brother has.
>>>>>>
>>>>>> I still think this approach is too much of a big hammer.
>>>>>> Try --print-gc-sections on your specific problem (uart section loss)
>>>>>> and play with --undefined as suggested by the ld man page. Not sure if
>>>>>> there is also some linker script statement that can do that trick, but
>>>>>> it might be worth checking.
>>>>>
>>>>> KEEP, together with --whole-archive does the trick:
>>>>>
>>>>
>>>> Why still --whole-archive? We should have a clear reason here.
>>>
>>> Doesn't work without --whole-archive, ld still drops it then. It's the
>>> combination of --whole-archive and KEEP() that makes it working for ARM
>>> on the one hand, and doesn't break the 32-bit x86 demo inmate on the other.
>>>
>>> I guess the reason is that KEEP() only keeps symbols that enter a 'final
>>> stage' of the linking process. And as those symbols are never referenced
>>> anywhere, they seem to be dropped before, like they never enter that
>>> stage of the process where they could be kept.
>>
>> Should be possible to confirm this: We have records of all stages (*.o,
>> *.a).
>>
>>>
>>> See [1], 4.6.4.4:
>>> When link-time garbage collection is in use (-gc-sections), it is
>>> often useful to mark sections that should not be eliminated. This is
>>> accomplished by surrounding an input section's wildcard entry with
>>> KEEP(), as in KEEP(*(.init)) or KEEP(SORT(*)(.ctors)).
>>
>> I'm wondering the following:
>>
>> - Is "--whole-archive --gc-sections" a reasonable combination (it
>> sounds contradictory), or does this just happen to work and will break
>> again with a different compiler version or some tuning elsewhere?

Looks like a dead end at the moment, but at least I understand what's
going on:

When linking, ld starts at the ENTRY point and recursively looks up
referenced symbols until there are no more undefined references. [1]

So if, e.g., gic_init() is found in gic.o, which is embedded in lib.a,
then ld will include _all_ symbols from gic.o and later garbage collect
the unreferenced ones.

In the above case, symbols in the uart drivers (e.g. uart-8250.o, which
is also inside lib.a) are referenced nowhere, so the whole object file
will _never_ be considered for linking, even if there is a KEEP() around
the section. KEEP() only respects symbols that the linker has seen, but
it has never seen any symbol of uart-8250.o.

I can confirm this behavior by testing: If I introduce a global
  int foobar = 42;
to uart-8250.c and reference foobar from uart-demo.c, then suddenly the
.uarts section from uart-8250.c will be linked, as the object file is
now considered for linking. Crud.

This [2] is a similar issue, and they propose to add --whole-archive.

So in our case, the --gc-sections --whole-archive combo does indeed make
some sense, but unfortunately it also pulls (e.g.) the gic_* functions
into the uart-demo, where they are actually not needed.

>>
>> - There is no size regression due to --whole-archive for the final
>> binaries, on all architectures?

And now I can also answer the question of how we get those size
regressions on ARM: --whole-archive considers all object files for
linking. While unused functions will be kicked out again through
--gc-sections, the linker can't garbage collect (for some reason) global
static functions. Anything referenced from there will of course be
included...

Hm. Maybe I can somehow cheat around that. Pretty sure the kernel has
some similar issues somewhere, as they make heavy use of --gc-sections.
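
To make the above easier to follow, here is a toy model of the behaviour
I described. It is only a sketch, not ld's real algorithm: it assumes an
archive member is loaded only if it defines a currently undefined
symbol, and that --gc-sections keeps whatever is reachable from the
entry point plus any loaded section matched by KEEP(). The entry symbol
inmate_main and the symbols uart_8250_ops and gic_unused are made up for
illustration. The model does not try to reproduce the ARM size
regression.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Section:
    name: str                                   # e.g. ".uarts"
    defines: set = field(default_factory=set)   # symbols defined here
    refs: set = field(default_factory=set)      # symbols referenced from here


@dataclass(eq=False)
class ObjectFile:
    name: str
    sections: list

    def defined(self):
        return set().union(*(s.defines for s in self.sections))


def link(objects, archive, entry, keep=(), whole_archive=False):
    """Return the sorted names of the sections that survive in this toy model."""
    loaded = list(objects)
    if whole_archive:
        # Assumption: --whole-archive loads every member unconditionally.
        loaded += archive
    else:
        # Assumption: a member is loaded only if it resolves an undefined symbol.
        changed = True
        while changed:
            changed = False
            defined = set().union(*(o.defined() for o in loaded))
            wanted = {entry} | {r for o in loaded for s in o.sections for r in s.refs}
            undefined = wanted - defined
            for member in archive:
                if member not in loaded and member.defined() & undefined:
                    loaded.append(member)
                    changed = True

    # Toy --gc-sections: mark from the entry point and from KEEP()'d sections.
    # KEEP() can only match sections of object files that were loaded at all.
    sections = [s for o in loaded for s in o.sections]
    by_symbol = {sym: s for s in sections for sym in s.defines}
    work = [by_symbol[entry]] + [s for s in sections if s.name in keep]
    live = set()
    while work:
        s = work.pop()
        if s in live:
            continue
        live.add(s)
        work.extend(by_symbol[r] for r in s.refs if r in by_symbol)
    return sorted(s.name for s in live)


# Hypothetical inputs mirroring the uart-demo case.
gic = ObjectFile("gic.o", [Section(".text.gic_init", {"gic_init"}),
                           Section(".text.gic_unused", {"gic_unused"})])
uart = ObjectFile("uart-8250.o", [Section(".uarts", {"uart_8250_ops"}),
                                  Section(".data.foobar", {"foobar"})])
lib = [gic, uart]
demo = ObjectFile("uart-demo.o",
                  [Section(".text.inmate_main", {"inmate_main"}, {"gic_init"})])
demo_foobar = ObjectFile("uart-demo.o",
                         [Section(".text.inmate_main", {"inmate_main"},
                                  {"gic_init", "foobar"})])

plain = link([demo], lib, "inmate_main", keep={".uarts"})
anchored = link([demo_foobar], lib, "inmate_main", keep={".uarts"})
whole = link([demo], lib, "inmate_main", keep={".uarts"}, whole_archive=True)
print(plain, anchored, whole, sep="\n")
```

In this model, .uarts is missing from the plain link because uart-8250.o
is never loaded, and it shows up as soon as either the foobar reference
or --whole-archive makes the linker look at that object file.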

  Ralf

[1] https://elinux.org/images/2/2d/ELC2010-gc-sections_Denys_Vlasenko.pdf
[2] https://stackoverflow.com/questions/19101088/why-does-lds-keep-does-not-keep-my-symbols

>
> No size regression on x86, but on arm. ld seems to behave different.
> Compared the objdumps on arm: inmates get in deed some really unused
> stuff in their binaries, like strcmp, ...
>
> Hmm. Will look for a better fix.
>
> Thanks
>   Ralf
>
>>
>>>
>>> BTW: I read that __attribute__((used)) never affects the linking
>>> process, it only forces the compiler to not optimize away unreferenced
>>> things during compilation.
>>
>> That makes sense.
>>
>> Jan
>>
>
