Package: tech-ctte
Severity: normal
X-Debbugs-Cc: debian-d...@lists.debian.org, debian-gtk-gnome@lists.debian.org, 
vor...@debian.org

I'm requesting advice from the tech-ctte (or anyone else with relevant
knowledge, e.g. the dpkg team or the drivers of the time64 transition)
on how to resolve glib2.0 bug #1065022. This is time-sensitive,
because it is an RC bug (temporarily breaking many applications across
this transition) and will hold up the time64 transition.

Background
==========

glib2.0 has two similar patterns where files that are managed by dpkg
are summarized in a non-dpkg-managed file, maintained by triggers and the
library postinst/postrm.

The first of these patterns is GSettings schemas. There is a tool that
loads GSettings schemas in XML format from /usr/share/glib-2.0/schemas
and aggregates them into a single binary blob in a more efficient format,
/usr/share/glib-2.0/schemas/gschemas.compiled. For performance reasons,
applications only load gschemas.compiled: there is no support for loading
the more authorable but less efficient XML files directly.

The second of these patterns is GIO modules, a plugin architecture which
loads .so files from /usr/lib/${DEB_HOST_MULTIARCH}/gio/modules and
summarizes their functionality in
/usr/lib/${DEB_HOST_MULTIARCH}/gio/modules/giomodule.cache.
Applications that want to load plugins parse giomodule.cache, and only
dlopen the plugins that provide the desired functionality (for example
an application that doesn't do any networking will not load plugins that
only implement gio-proxy-resolver).

This is implemented with dpkg file-based triggers: when a package adds
or removes GSettings schemas or GIO modules, it triggers processing by
the libglib2.0-0{,t64} postinst. The implementation has been approximately
the same shape for 10 years, and has worked well until now.

Because dpkg doesn't have an equivalent of RPM %ghost files, the two
generated summary files need to be deleted by the library's postrm.
As of bookworm (and still true in trixie), the implementation is:

- for giomodule.cache (per-architecture), the file is simply deleted by
  postrm remove

- for gschemas.compiled (shared by all architectures), if the multiarch
  refcount of the library reaches 0, then the file is deleted during the
  next postrm purge
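
The behaviour described in the two items above can be sketched roughly
as follows. This is an illustration under stated assumptions, not the
script actually shipped in bookworm: the function name
glib_postrm_sketch, its multiarch and root parameters, and the
"remaining" count (standing in for however the real script obtains the
multiarch refcount) are all hypothetical, chosen so that the logic can
be exercised in a scratch directory.

```shell
#!/bin/sh
# Sketch only: models the postrm behaviour described above.
# It is NOT the shipped script; all names and parameters are hypothetical.
#   $1 = dpkg action ("remove" or "purge")
#   $2 = multiarch tuple, e.g. x86_64-linux-gnu
#   $3 = number of instances of the library still installed (assumed input)
#   $4 = filesystem root to operate on (empty means the real /)
glib_postrm_sketch() {
    action=$1
    multiarch=$2
    remaining=$3
    root=${4:-}

    case "$action" in
        remove)
            # Per-architecture summary file: deleted unconditionally.
            rm -f "$root/usr/lib/$multiarch/gio/modules/giomodule.cache"
            ;;
        purge)
            # Shared summary file: deleted only once no instance is left.
            if [ "$remaining" -eq 0 ]; then
                rm -f "$root/usr/share/glib-2.0/schemas/gschemas.compiled"
            fi
            ;;
    esac
}
```

Note that nothing in this logic checks whether some other package
(such as a renamed successor of the library) still needs either file.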

The bug
=======

When we transition from libglib2.0-0 to libglib2.0-0t64, this involves
the removal of libglib2.0-0. In the postrm of libglib2.0-0, removing
libglib2.0-0:amd64 deletes
/usr/lib/x86_64-linux-gnu/gio/modules/giomodule.cache, and so on for
all the other architectures.

The result is that until the postinst of libglib2.0-0t64:amd64 is run,
amd64 applications will be unable to load GIO plugins, causing
functionality loss (for example, inability to use https, because the
TLS plugin is not loaded).

Similarly, either during or after the transition from libglib2.0-0
to libglib2.0-0t64, users will want to purge libglib2.0-0. In
the postrm of libglib2.0-0, if there are no multiarch
instances of libglib2.0-0 remaining, purging the package deletes
/usr/share/glib-2.0/schemas/gschemas.compiled. The result is that
applications that want to load GSettings schemas will not find their
required schemas, which is normally treated as a programming error
(incorrect installation) that causes a crash with an assertion failure.

The workaround is: after removal or purging of libglib2.0-0, reinstall
either libglib2.0-0t64 or any package that will trigger libglib2.0-0t64.
On multiarch systems, this must be done for the architecture that matches
the instance of libglib2.0-0 that was removed.

During upgrade, I am unsure what ordering guarantees we have about
the postrm of libglib2.0-0 running before or after the postinst of
libglib2.0-0t64 - perhaps we avoid the giomodule.cache bug in practice,
because the postrm runs before the postinst? But purge can happen at any
later time, so we certainly cannot guarantee that libglib2.0-0t64.postinst
will run after purging libglib2.0-0.

I apologise for not having foreseen this.

Non-solutions
=============

I am not interested in solutions that would require the use of a time
machine to change the postrm that was shipped in bookworm: bookworm was
already released, and now we are stuck with it. *After* the time-sensitive
part of this issue has been solved, I plan to look into making the postrm
robust against future transitions similar to this one by adding some way
for the new package to take over responsibility for giomodule.cache and
gschemas.compiled, but for this particular transition it's too late: the
first time at which we could rely on that functionality is trixie -> forky.

I am also not interested in solutions that require design changes in GLib,
for example adding a fallback slow-path that ignores the absence of the
summary files and loads the individual GSettings schemas and GIO modules
directly. This is because upstream would not accept such a change, and it
would introduce significant delta into Debian, which we would potentially
never be able to remove (because the removed libglib2.0-0 can be purged
at any later date). I consider the deletion of these summary files to
be a packaging problem, which we should be able to solve in packaging.

Possible solution: delete libglib2.0-0.postrm in libglib2.0-0t64.preinst
========================================================================

libglib2.0-0t64 could gain a preinst that deletes
/var/lib/dpkg/info/libglib2.0-0:${DEB_HOST_ARCH}.postrm. This is a clear
Policy violation, but perhaps between closely cooperating packages
(glib2.0 and, er, glib2.0) it would be the least-bad answer to this?

There is nothing else in the postrm other than the two problematic file
deletions (I'll have to check bookworm, but this is certainly true for
trixie) so I think there would not be any harmful side-effect of this,
other than the Policy violation.
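
As a minimal sketch of what such a preinst might do (assuming nothing
beyond the path given above), the deletion could look like the
following. The helper name drop_old_glib_postrm and its arch and root
parameters are hypothetical; the root parameter exists only so the
sketch can be tried in a scratch directory instead of the real dpkg
database.

```shell
#!/bin/sh
# Sketch only: hypothetical helper for the preinst idea described above.
#   $1 = Debian architecture of the instance being installed, e.g. amd64
#   $2 = filesystem root to operate on (empty means the real /)
drop_old_glib_postrm() {
    arch=$1
    root=${2:-}
    # With its postrm gone, removing or purging libglib2.0-0 for this
    # architecture no longer runs the two problematic file deletions.
    rm -f "$root/var/lib/dpkg/info/libglib2.0-0:$arch.postrm"
}
```

In a real preinst the architecture would presumably be substituted at
build time, or taken from the DPKG_MAINTSCRIPT_ARCH variable that dpkg
sets for maintainer scripts; which of the two is preferable is left
open here. Only the postrm for the matching architecture is touched,
so other multiarch instances of libglib2.0-0 keep theirs until their
own libglib2.0-0t64 counterpart is installed.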

Possible solution: revert t64 rename for glib2.0
================================================

According to Ubuntu's ABI analysis, glib2.0 only has a small number of
symbols that refer to time_t:

- g_bookmark_file_get_added
- g_bookmark_file_get_app_info
- g_bookmark_file_get_modified
- g_bookmark_file_get_visited
- g_bookmark_file_set_added
- g_bookmark_file_set_app_info
- g_bookmark_file_set_modified
- g_bookmark_file_set_visited
- g_date_set_time_t

This seems like maybe a manageable amount to handle with versioned Breaks?

g_date_set_time_t() is the only one that is not already deprecated, and
according to codesearch.debian.net, the only one that is widely used.
Its use of time_t is as an input parameter (not via a pointer), so callers
will certainly all use the correct ABI after a simple rebuild (binNMU).

The GBookmarkFile stuff is primarily used by the GtkBookmarkManager
in gtk+2.0 and gtk+3.0, plus ardour (which has a reimplementation of
GtkBookmarkManager for whatever reason) and a bunch of language bindings.

So perhaps we could transition back from libglib2.0-0t64 to libglib2.0-0,
and give it a versioned Breaks on older versions of the dependent
packages, NMU'ing the dependent packages if necessary to ensure that
we know a version number that is guaranteed to be on the "new" side of
the line?

After the time-critical part of this transition has happened, one of
the things on my extensive to-do list is looking into whether we can
change these functions upstream to give them inline wrappers, so that all
newly-compiled C/C++ code will call the wrapper and not the underlying
symbol (for example an inline version of g_bookmark_file_get_added()
would be implemented in terms of g_bookmark_file_get_added_date_time(),
which returns a non-time_t-sensitive object). However, this certainly
won't happen upstream until the GLib 2.82 cycle, which will be too late
for Ubuntu 24.04; and my to-do list is very long, so I would strongly
prefer the Debian project not to be blocking on me, personally, having
time to do this.

Possible solution: other ideas?
===============================

Perhaps someone in the technical committee or another relevant team has
a better plan?

I would like advice on how the GNOME team should proceed: one of the
possible solutions I've outlined, or some different thing.

If the solution that is chosen is a Policy violation (like deleting
the problematic postrm) then I would also like to have clarity that the
Policy violation is tolerable as a less-bad solution, and therefore will
not itself be treated as an RC bug in trixie.

If the solution that is chosen involves mass NMUs with only trivial
changes (to force the existence of a version number that is on the time64
side of the transition, so we can use it in Breaks), then I would like
permission for the GNOME team to carry out those NMUs on a 0-day basis.

Thanks,
    smcv
