On Monday, 18 December 2017 11:55:42 PST René J. V. Bertin wrote:
> Thiago Macieira wrote:
> > It doesn't, because the debug information is not loaded in the first
> > place.
> > When using readelf, note how the "A" flag is missing for those sections.
>
> So it has to skip certain, possibly considerable parts of the file while
> loading it, rather than simply doing some efficient operation to copy the
> whole file into memory. That should affect load times somewhat, no?

No, that's not how ELF works.

First of all, the dynamic linker doesn't actually read the section table. It
reads the segment table, found in the ELF program headers (readelf -l):

$ readelf -l /lib/libm.so.6

Elf file type is DYN (Shared object file)
Entry point 0x6200
There are 7 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x00000000 0x00000000 0xf9264 0xf9264 R E 0x1000
  LOAD           0x0f9eb4 0x000faeb4 0x000faeb4 0x003cc 0x003d4 RW  0x1000
  DYNAMIC        0x0f9ebc 0x000faebc 0x000faebc 0x00118 0x00118 RW  0x4
  NOTE           0x000114 0x00000114 0x00000114 0x00044 0x00044 R   0x4
  GNU_EH_FRAME   0x0dda54 0x000dda54 0x000dda54 0x016bc 0x016bc R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10
  GNU_RELRO      0x0f9eb4 0x000faeb4 0x000faeb4 0x0014c 0x0014c R   0x1

 Section to Segment mapping:
  Segment Sections...
   00     .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr
          .gnu.version .gnu.version_d .gnu.version_r .rel.dyn .rel.plt .init
          .plt .plt.got .text .fini .rodata .eh_frame_hdr .eh_frame .hash
   01     .init_array .fini_array .dynamic .got .got.plt .data .bss
   02     .dynamic
   03     .note.gnu.build-id .note.ABI-tag
   04     .eh_frame_hdr
   05
   06     .init_array .fini_array .dynamic .got

(I've pasted libm only for column width; try it on a Qt library with debugging
information yourself.)

Note the LOAD entries. That's what matters to the dynamic linker and what it
will load. Note also how the debug sections are not in the first or second
entries of the section-to-segment mapping list. That means the debugging
sections are beyond the load regions and won't be present in memory.

Second, the binary file is loaded via mmap(), which means the actual file
contents aren't faulted into memory unless needed or unless there's an
madvise() system call to tell the kernel to load them. So even if the debug
sections were included in the LOAD regions, they wouldn't occupy core memory or
affect the load time, unless something actually tried to access them.

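If you want to check this without eyeballing the readelf output, here is a
minimal Python sketch that reads the program headers directly and reports the
file ranges covered by LOAD segments. The function names load_segments() and
is_loaded() are made up for illustration, and the sketch assumes a well-formed
ELF32 or ELF64 file; it does no validation beyond the magic number. The offset
and size you pass to is_loaded() would come from the section table
(readelf -S).

```python
import struct

PT_LOAD = 1  # program header type for loadable segments


def load_segments(data: bytes):
    """Return (file_offset, file_size) for every PT_LOAD program header."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2                      # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"    # EI_DATA: 1 = little-endian

    # Locate the program header table from the ELF header.
    if is64:
        (e_phoff,) = struct.unpack_from(endian + "Q", data, 0x20)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 0x36)
    else:
        (e_phoff,) = struct.unpack_from(endian + "I", data, 0x1C)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 0x2A)

    segments = []
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        (p_type,) = struct.unpack_from(endian + "I", data, base)
        if p_type != PT_LOAD:
            continue
        # p_offset and p_filesz sit at different positions in ELF32 and ELF64.
        if is64:
            (p_offset,) = struct.unpack_from(endian + "Q", data, base + 8)
            (p_filesz,) = struct.unpack_from(endian + "Q", data, base + 32)
        else:
            (p_offset,) = struct.unpack_from(endian + "I", data, base + 4)
            (p_filesz,) = struct.unpack_from(endian + "I", data, base + 16)
        segments.append((p_offset, p_filesz))
    return segments


def is_loaded(segments, offset, size):
    """True if the file range [offset, offset + size) lies inside a LOAD segment."""
    return any(start <= offset and offset + size <= start + length
               for start, length in segments)
```

Usage would be something like load_segments(open(path, "rb").read()), then
is_loaded(segments, offset, size) with the offset and size of a .debug_*
section. For a library with debug information I would expect that to return
False, matching the missing "A" flag.
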
> > One more reason to use GCC. It only builds once, even under LTO, unless
> > you specifically ask for the fat LTO objects.
>
> Yet even with GCC the build times and memory requirements are larger with
> LTO than without. How can it not do certain things twice?

The build time has nothing to do with doing things twice. It has to do with
the amount of work.

Even with LTO, the compiler must start and process each translation unit. The
difference between LTO and a normal build is that in the former, it needs to
do less work since it doesn't actually run the optimiser. It just needs to
dump some intermediary information.

The difference is with the linker. In a regular build, even with -Wl,-O1, the
linker does very little and its job is to basically concatenate sections of
each input file. In an LTO build, the linker calls the compiler again and that
will need to reload all the intermediary information and perform the
optimisation, now with a much larger dataset.

In my experience, a thin LTO build is actually faster (and produces better
code) than an equivalent non-LTO build, but that doesn't apply to all cases.

Regular, optimised (-O3 -g1) build of qmake:
  Time to build: 268,00s user 11,28s system 368% cpu 1:15,87 total
  Total object sizes (kB): 69596
  Binary size (after stripping):
     text    data     bss     dec     hex filename
  3008485    2080    6361 3016926  2e08de ../bin/qmake

Simple LTO build (-O3 -g1 -flto -fno-fat-lto-objects, linking* -flto=4):
  Time: 208,01s user 10,36s system 365% cpu 59,731 total
  Total object sizes: 32476
  Binary:
     text    data     bss     dec     hex filename
  2427597    1972    6217 2435786  252aca ../bin/qmake

Fat LTO build (-O3 -g1 -flto -ffat-lto-objects, linking* -flto=4):
  Time: 371,19s user 13,49s system 369% cpu 1:44,11 total
  Sizes: 101928
  Binary:
     text    data     bss     dec     hex filename
  2427597    1972    6217 2435786  252aca ../bin/qmake

*: Don't forget to pass -O3 -g1 to the linker too, otherwise the LTO step
won't optimise!

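To put those three runs side by side, here is a small Python sketch that
computes each build's change relative to the regular build. The numbers are
copied from the measurements above, with decimal commas turned into points
and the wall-clock totals converted to seconds (1:15,87 becomes 75.87). The
"thin LTO" label stands for the simple LTO build, and the helper name
relative_change() is only illustrative.

```python
# Wall-clock seconds, total object size in kB, stripped text size in bytes.
builds = {
    "regular":  {"wall_s": 75.87,  "objects_kb": 69596,  "text_bytes": 3008485},
    "thin LTO": {"wall_s": 59.731, "objects_kb": 32476,  "text_bytes": 2427597},
    "fat LTO":  {"wall_s": 104.11, "objects_kb": 101928, "text_bytes": 2427597},
}


def relative_change(value, baseline):
    """Percentage change of value against baseline (negative means smaller)."""
    return (value - baseline) / baseline * 100.0


baseline = builds["regular"]
for name, build in builds.items():
    # Compare every metric of this build against the regular build.
    changes = {key: relative_change(build[key], baseline[key]) for key in build}
    line = "  ".join(f"{key} {pct:+6.1f}%" for key, pct in changes.items())
    print(f"{name:9s} {line}")
```

The two LTO builds produce the same binary, so the text size change is
identical for both; only the build time and the object sizes differ.
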
-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center