On Mon, Oct 13, 2025 at 8:08 PM Arrigo Marchiori <[email protected]> wrote:
> Hello All,
>
> On Mon, Oct 13, 2025 at 09:03:27AM +0200, Arrigo Marchiori wrote:
>
> > Hello Damjan, All,
> >
> > On Mon, Oct 13, 2025 at 03:01:33AM +0000, Damjan Jovanovic wrote:
> >
> > > On Sun, Oct 12, 2025 at 8:05 PM Arrigo Marchiori <[email protected]> wrote:
> > >
> > > > Hello All,
> > > >
> > > > I looked into the way expat is built and used in AOO.
> > > >
> > > > TL;DR: expat functions are partially taken from our expat, and
> > > > partially from system-level expat.
> > >
> > > [...]
> > > >
> > > > Why is the system-level libexpat.so calling internal functions of our
> > > > own statically linked expat? How can we avoid it?
> > >
> > > Try running AOO as:
> > > LD_PRELOAD=/path/to/system/libexpat.so soffice.bin
> > >
> > > Does that crash?
> >
> > I will try later from home.
>
> I tried on my openSUSE Leap 15.6 system at home, and I confirm it
> crashes. Function XML_ParserCreate_MM is resolved to the wrong .so
> library.
>
> But: if I set LD_PRELOAD=/path/to/system/libexpat.so as you suggested,
> then AOO starts and runs successfully.
>

Then I know what the problem is: the ELF binary format's abysmal dynamic
linking process.

In both Windows's PE binary format and MacOS's Mach-O binary format, at
link time, the binary being linked will store for each symbol, which
library that symbol is in. At run time, the symbol is only searched for in
the library that it was found in at link time. Whichever libraries load in
whichever order, symbols are only looked up where they should be.

In *nix's ELF binary format, each binary contains a list of symbols, and a
list of libraries, with no relationship between them. At run time, missing
symbols are looked up in all libraries loaded for that entire process (not
just your child libraries), in the order the libraries were loaded.

For a small application with few dependencies, this isn't too much of a
problem, as symbols aren't often duplicated between libraries.
But for a large application, especially one loading unpredictable
libraries at run time (like we do with UNO), it is quite possible for
symbols to wrongly match an unintended library, such as a different
version of the same library that was loaded by someone else.

That's exactly what's happening in your case: the internal expat is
statically linked into libhelplinker.so but its symbols are exported, and
libhelplinker.so is loaded before fontconfig and system expat, so
fontconfig's expat symbols are resolved against libhelplinker.so instead
of the system expat. When you LD_PRELOAD the system expat, it loads first,
and supersedes libhelplinker.so's expat symbols.

If it's not possible to avoid loading both our expat and system expat,
then one of the following should fix the problem:

1. Link expat statically but hide its symbols, presumably by using a
   linker map file (libhelplinker.so already uses -fvisibility=hidden,
   but I don't think that's enough; expat would need to use that too).
2. Use custom ELF symbol versioning on our expat, so it can co-exist in
   memory with the system expat, but our binaries will only use our expat
   and other binaries will only use the system expat. This again requires
   a linker map file.
3. Platform-specific hacks, like run-time dynamic linking with the
   RTLD_DEEPBIND flag to dlopen() (Linux and FreeBSD), or linking with
   -Bdirect on Solaris.

I believe this is one of the reasons Linux has failed on the desktop: with
ELF's disastrous symbol search algorithm, it is impossible to make
reliable software that works everywhere. Some version of some library on
some distribution will pull in some incompatible symbol -> crash. Only
monolithic bloated container solutions like Snap and Flatpak avoid that
problem, by controlling the entire filesystem, so no unexpected library
can come in.

Regards
Damjan
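
P.S. To make option 3 a bit more concrete, here is a minimal sketch of
opening a library with RTLD_DEEPBIND, so that the library prefers its own
symbols (and those of its dependencies) over same-named symbols already
loaded in the process. The helper name load_deepbound is hypothetical and
nothing here is taken from the AOO sources. I am assuming glibc, where
RTLD_DEEPBIND is only declared when _GNU_SOURCE is defined. On a platform
without the flag the sketch falls back to a plain dlopen(), which does NOT
give the deep-binding behaviour.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* assumption: glibc, which hides RTLD_DEEPBIND otherwise */
#endif
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Fallback for platforms without the flag: plain dlopen(), no deep binding. */
#ifndef RTLD_DEEPBIND
#define RTLD_DEEPBIND 0
#endif

/*
 * Hypothetical helper (not from the AOO sources): open a shared library so
 * that its symbol lookups search the library's own scope before the global
 * scope of the process. Returns the dlopen() handle, or NULL on failure.
 */
static void *load_deepbound(const char *path)
{
    assert(path != NULL);

    /* RTLD_LOCAL keeps the library's symbols out of the global scope;
     * RTLD_NOW resolves everything up front so failures show up here. */
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL | RTLD_DEEPBIND);
    if (handle == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlopen(%s) failed: %s\n",
                path, err != NULL ? err : "unknown error");
    }
    return handle;
}
```

The caller would then fetch entry points with dlsym() on the returned
handle. Which of our libraries would have to be opened this way is not
something I have checked, and on older glibc versions the program also
needs to be linked with -ldl.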
