On Mon, Oct 13, 2025 at 8:08 PM Arrigo Marchiori <[email protected]> wrote:

> Hello All,
>
> On Mon, Oct 13, 2025 at 09:03:27AM +0200, Arrigo Marchiori wrote:
>
> > Hello Damjan, All,
> >
> > On Mon, Oct 13, 2025 at 03:01:33AM +0000, Damjan Jovanovic wrote:
> >
> > > On Sun, Oct 12, 2025 at 8:05 PM Arrigo Marchiori <[email protected]> wrote:
> > >
> > > > Hello All,
> > > >
> > > > I looked into the way expat is built and used in AOO.
> > > >
> > > > TL;DR: expat functions are partially taken from our expat, and
> > > > partially from system-level expat.
> > > >
> > [...]
> > > >
> > > > Why is the system-level libexpat.so calling internal functions of our
> > > > own statically linked expat? How can we avoid it?
> > >
> > > Try running AOO as:
> > > LD_PRELOAD=/path/to/system/libexpat.so soffice.bin
> > >
> > > Does that crash?
> >
> > I will try later from home.
>
> I tried on my openSUSE Leap 15.6 system at home, and I confirm it
> crashes.  Function XML_ParserCreate_MM is resolved to the wrong .so
> library.
>
> But: if I set LD_PRELOAD=/path/to/system/libexpat.so as you suggested,
> then AOO starts and runs successfully.
>
>
Then I know what the problem is: the ELF binary format's abysmal dynamic
linking process.

In both Windows's PE binary format and macOS's Mach-O binary format, the
linker records, for each imported symbol, which library provides it. At
run time, each symbol is searched for only in the library it was found in
at link time. Whichever libraries load, in whichever order, symbols are
only looked up where they should be.

In *nix's ELF binary format, each binary contains a list of symbols, and a
list of libraries, with no relationship between them. At run time, missing
symbols are looked up in all libraries loaded for that entire process (not
just your child libraries), in the order the libraries were loaded.

For a small application with few dependencies, this isn't too much of a
problem, as symbols aren't often duplicated between libraries. But for a
large application, especially one loading unpredictable libraries at run
time (like we do with UNO), it is quite possible for symbols to wrongly
match an unintended library, such as a different version of the same
library that was loaded by someone else.

That's exactly what's happening in your case: the internal expat is
statically linked into libhelplinker.so but its symbols are exported,
and libhelplinker.so is loaded before fontconfig and system expat, so
fontconfig's expat symbols are resolved against libhelplinker.so instead of
expat. When you LD_PRELOAD the system expat, it loads first, and
supersedes libhelplinker.so's expat symbols.
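
To check which loaded object actually wins the lookup for a given symbol,
something like the following could be called from inside the process. This
is only a sketch, assuming glibc (or another libc that provides dladdr()
and RTLD_DEFAULT); object_providing() is a made-up helper name, not
something that exists in AOO. Note that RTLD_DEFAULT searches the global
scope only, so it may not see libraries that were loaded with RTLD_LOCAL.

```c
#define _GNU_SOURCE   /* glibc: exposes RTLD_DEFAULT and dladdr() */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper, not part of AOO: returns the path of the loaded
 * object whose definition of `symbol` is found first in the global
 * search order, or NULL if the symbol cannot be resolved. */
const char *object_providing(const char *symbol)
{
    Dl_info info;
    void *addr;

    assert(symbol != NULL);

    /* Search the global scope in the default load order. */
    addr = dlsym(RTLD_DEFAULT, symbol);
    if (addr == NULL)
        return NULL;

    /* Map the resolved address back to the object that contains it. */
    if (dladdr(addr, &info) == 0)
        return NULL;

    return info.dli_fname;
}
```

Calling object_providing("XML_ParserCreate_MM") in the failing setup would
presumably report libhelplinker.so rather than the system libexpat.so, and
the other way round when the system expat is preloaded. On older glibc
versions this needs linking with -ldl.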

If it's not possible to avoid loading both our expat and system expat, then
one of the following should fix the problem:
1. Link expat statically but hide its symbols, presumably by using a linker
map file (libhelplinker.so already uses -fvisibility=hidden, but I don't
think that's enough; expat itself would need to be built with it too).
2. Use custom ELF symbol versioning on our expat, so it can co-exist in
memory with the system expat, but our binaries will only use our expat and
other binaries will only use the system expat. This again requires a linker
map file.
3. Platform-specific hacks, like run-time dynamic linking with the
RTLD_DEEPBIND flag to dlopen() (Linux and FreeBSD), or linking with
-Bdirect on Solaris.
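
As a rough illustration of option 3, this is how a library could be loaded
with RTLD_DEEPBIND where the platform offers it. It is a sketch only:
open_preferring_own_symbols() is a made-up name, and the flag is simply
skipped where it is not defined, so there the call behaves like a plain
dlopen().

```c
#define _GNU_SOURCE   /* glibc: exposes RTLD_DEEPBIND */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper, not part of AOO: loads `path` so that the library
 * looks up symbols in itself and its own dependencies before falling
 * back to the global scope. Returns NULL on failure, like dlopen(). */
void *open_preferring_own_symbols(const char *path)
{
    int flags = RTLD_NOW | RTLD_LOCAL;

    assert(path != NULL);

#ifdef RTLD_DEEPBIND
    /* Only available on some platforms; without it this is a plain dlopen(). */
    flags |= RTLD_DEEPBIND;
#endif

    return dlopen(path, flags);
}
```

This only helps for libraries that we load ourselves with dlopen(); it does
nothing for libraries pulled in as ordinary link-time dependencies.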

I believe this is one of the reasons Linux has failed on the desktop: with
ELF's disastrous symbol search algorithm, it is impossible to make
reliable software that works everywhere. Some version of some library on
some distribution will pull in some incompatible symbol -> crash. Only
monolithic bloated container solutions like Snap and Flatpak avoid that
problem, by controlling the entire filesystem, so no unexpected library can
come in.

Regards
Damjan
