Re: GNU Coding Standards, automake, and the recent xz-utils backdoor

Jacob Bachmeyer Sun, 31 Mar 2024 22:34:38 -0700

Tomas Volf wrote:

On 2024-03-31 14:50:47 -0400, Eric Gallager wrote:

With a reproducible build system, multiple maintainers can "make dist"
and compare the output to cross-check for erroneous / malicious dist
environments.  Multiple signatures should be harder to compromise,
assuming each is independent and generally trustworthy.

This can only work if a package /has/ multiple active maintainers.

Well, other people besides the maintainers can also run `make dist`
and `make distcheck`. My idea was to get end-users in the habit of
running `make distcheck` themselves before installing stuff. And if
that's too much to ask of end users, I'd also point out that there are
multiple kinds of maintainer: besides the upstream maintainer, there
are also usually separate distro maintainers. Even if there's only 1
upstream maintainer, as was the case here, I still think that it would
be good to get distro maintainers in the habit of including `make
distcheck` as part of their own release process, before they accept
updates from upstream.


What would be helpful is if `make dist' would guarantee to produce the same
tarball (bit-to-bit) each time it is run, assuming the tooling is the same
version.  Currently I believe that is not the case (at least due to timestamps).

A "tardiff" tool that ignores timestamps would be a solution to that
problem, but not to this backdoor.
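A rough sketch of what such a tool could look like, assuming a tar that
autodetects compression (GNU tar or bsdtar) and treating "tardiff" as a
made-up name: unpack both tarballs and compare member names and
contents only, so timestamps, ownership, and member order are ignored.
As noted, this catches a divergent dist environment, not a backdoor that
is already in the release.

```shell
#!/bin/sh
# tardiff (hypothetical name): compare two tarballs by member names and
# file contents only.  Timestamps, ownership, and member order are ignored
# because only the unpacked trees are compared.
tardiff () {
    work=$(mktemp -d) || return 2
    mkdir "$work/a" "$work/b" || return 2
    tar -xf "$1" -C "$work/a" && tar -xf "$2" -C "$work/b" || {
        rm -rf "$work"
        return 2
    }
    diff -r "$work/a" "$work/b"   # exit 0 if identical, 1 if they differ
    status=$?
    rm -rf "$work"
    return $status
}

# Usage (file names are placeholders):
#   tardiff mine.tar.gz theirs.tar.gz && echo "same contents"
```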

Combined with GNU Guix that would allow simple way to verify that `make dist'
was used, and the resulting artifact not tampered with, even without any central
signing.

The Guix "challenge" operation would not have detected this backdoor
because *it* *was* *in* *the* *upstream* *release*.  The build service
works from that release tarball and you build from that same release
tarball.  GNU Guix ensures an equivalent build environment and your
results *will* match---either the backdoor was not inserted or it was
inserted in both builds.



The flow of the attack as I understand it was:

 (0)  (speculation on motivation)  The attacker wanted a "Golden Key"
      to SSH and started looking for ways to backdoor sshd.
 (1)  The attacker starts a sockpuppet campaign and manages to get
      one of his sockpuppets appointed co-maintainer of xz-utils.
 (2)  [2023-06-27]  The sockpuppet merges a pull request believed to
      be from another sockpuppet in commit
      ee44863ae88e377a5df10db007ba9bfadde3d314.
 (3)  [2024-02-15]  The sockpuppet "updates m4/.gitignore" to add
      build-to-host.m4 to the list in commit
      4323bc3e0c1e1d2037d5e670a3bf6633e8a3031e.
 (4)  [2024-02-23]  The sockpuppet adds 5 files to the xz-utils
      testsuite in commit cf44e4b7f5dfdbf8c78aef377c10f71e274f63c0.
 (5)  [2024-03-08]  To cover tracks, the sockpuppet finally adds a
      test using bad-3-corrupt_lzma2.xz in commit
      a3a29bbd5d86183fc7eae8f0182dace374e778d8.
 (6)  [2024-03-08]  The sockpuppet revises two of those files with a
      lame excuse in commit a3a29bbd5d86183fc7eae8f0182dace374e778d8.

The quick analysis of the Git history supporting steps 2 - 6 above has
turned up another interesting detail:  no version of configure.ac
actually committed ever used the gl_BUILD_TO_HOST macro.  An analysis
found on pastebin noted that build-to-host.m4 is a dependency of
gettext.m4.  Following up finds commit
3adaddd73c8edcceaed059e859bd5262df65fc5a of 2023-02-18 in the GNU
gettext repository introduced the use of gl_BUILD_TO_HOST, apparently as
part of moving some existing path translation logic to gnulib and
generalizing it for use elsewhere.  This commit is innocent (it is
*extremely* unlikely that Bruno Haible was involved in the backdoor
campaign) and also explains why the backdoor was checking for "dnl
Convert it to C string syntax." in m4/gettext.m4:  that comment was
removed in the same commit that switched to using gl_BUILD_TO_HOST.  The
change to gettext also occurred about a year before the sockpuppet began
to take advantage of it.

It almost "feels like" the attacker was waiting for an opportunity to
make plausible changes to autoconf macros and finally got one when
updating the m4/ files for the 5.6.0 release.  Could someone with the
release tarballs confirm that m4/gettext.m4 was updated between
v5.5.2beta and v5.6.0?  I doubt the entire backdoor was developed in the
week between those two commits.  In fact, the timing around introducing
ifuncs suggests to me that the binary blob was at least well into
development by mid-2023.
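For anyone who does have the tarballs, the check could be sketched like
this.  The helper name same_member is made up, and the tarball and
top-level directory names in the usage comment are assumptions that have
not been checked against the actual releases.

```shell
#!/bin/sh
# same_member (hypothetical helper): succeed if one member of tarball $1
# is byte-identical to one member of tarball $3.
#   $1, $3 : tarballs          $2, $4 : member paths inside each tarball
same_member () {
    tmp=$(mktemp -d) || return 2
    tar -xOf "$1" "$2" > "$tmp/a" &&
        tar -xOf "$3" "$4" > "$tmp/b" &&
        cmp -s "$tmp/a" "$tmp/b"
    status=$?
    rm -rf "$tmp"
    return $status
}

# Usage (file and directory names are assumed, adjust to what you have):
#   same_member xz-5.5.2beta.tar.gz xz-5.5.2beta/m4/gettext.m4 \
#               xz-5.6.0.tar.gz     xz-5.6.0/m4/gettext.m4 \
#       && echo "gettext.m4 unchanged" || echo "gettext.m4 differs"
```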

The commit message at step 2 claims that using ifuncs with
-fsanitize=address causes segfaults.  If this is true generally, the
glibc team should probably reconsider whether the abuse potential is
worth the benefit of the feature and possibly investigate how the
feature was introduced to glibc.  If this was an excuse, it provided a
clever way to prevent oss-fuzz from finding the backdoor, as disabling
ifuncs provides a conveniently hidden flag to disable the backdoor.

While double-checking the above, I stumbled across another very
suspicious commit in the repository:  commit
e446ab7a18abfde18f8d1cf02a914df72b1370e3 by Jia Tan on 2024-02-12
creating a separate "safe" range decoder mode because the next commit
removes some bounds checks.

Lastly on this topic, some of the blame for this needs to fall on the
systemd maintainers and their "katamari" architecture.  There is no good
reason for notifications of daemon startup to pull in liblzma, but using
libsystemd for that purpose does exactly that, and ended up getting
xz-utils targeted as a means of getting to sshd without the OpenSSH
maintainers noticing.

I have also done a bit more work and replicated extracting the backdoors
from the repository.  Here are scripts to extract them:


8<------ unpack-1.sh
#!/bin/sh

# Unpack first stage backdoor script from xz-backdoored.
# To guard against other trickery, use 7-zip for decompression.

set -x

tr "\t \-_" " \t_\-" \
 < xz-backdoored/tests/files/bad-3-corrupt_lzma2.xz \
 > backdoor-1.xz
7z l -slt backdoor-1.xz
7z e -y   backdoor-1.xz

# EOF
8<------
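A small illustration of the tr step in unpack-1.sh, on made-up input
rather than the real test file:  the translation swaps tab with space
and '-' with '_', so applying it a second time undoes it.  The function
name swap is only for this example.

```shell
#!/bin/sh
# The same translation unpack-1.sh applies: tab <-> space, '-' <-> '_'.
swap () {
    tr "\t \-_" " \t_\-"
}

# Toy input, not taken from the backdoor files.
printf 'a-b_c d\n' | swap          # prints a_b-c<TAB>d
printf 'a-b_c d\n' | swap | swap   # prints a-b_c d (round trip)
```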

8<------ unpack-2a.sh
#!/bin/sh

# Unpack second stage backdoor script from xz-backdoored.
# To guard against other trickery, use 7-zip for decompression where possible.
# Adapted from original backdoor-1 script.

set -x

extract () (
 set +x # very noisy
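 # Explicit list: {1..16} is a bash extension that a plain /bin/sh
 # (dash, for example) leaves unexpanded, running the loop only once.
 for cycle in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16; do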
   (head -c +1024 >/dev/null) && head -c +2048
 done
 (head -c +1024 >/dev/null) && head -c +939
)

7z e -so xz-backdoored/tests/files/good-large_compressed.lzma \
   | extract \
   | tail -c +31233 \
   | tr "\114-\321\322-\377\35-\47\14-\34\0-\13\50-\113" "\0-\377" \
   | tee backdoor-2a.lzma1 \
   | xz -F raw --lzma1 -dc \
   > backdoor-2a

# EOF
8<------
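The extract function above alternately discards and keeps fixed-size
chunks of its input:  16 rounds of "drop 1024, keep 2048", then one last
"drop 1024, keep 939", for 16*2048 + 939 = 33707 bytes kept.  Here is the
same pattern at toy scale on made-up data.  This relies on head -c
reading no more than the requested count from a pipe, which GNU head
does;  other implementations may read ahead and break the trick.

```shell
#!/bin/sh
# Same skip/keep pattern as extract(), scaled down:
# drop 3 bytes, keep 4 bytes, two rounds.  Assumes GNU head, which
# consumes exactly the requested byte count from a pipe.
toy_extract () {
    for cycle in 1 2; do
        (head -c 3 >/dev/null) && head -c 4
    done
}

# Toy input, not taken from the backdoor files.
printf 'xxxKEEPyyyTHIS' | toy_extract   # prints KEEPTHIS (with GNU head)
```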

8<------ unpack-2b.sh
#!/bin/sh

# Unpack binary backdoor module from xz-backdoored.
# To guard against other trickery, use 7-zip for decompression where possible.
# Adapted from original backdoor scripts.

set -x

extract () (
 set +x # very noisy
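 # Explicit list: {1..16} is a bash extension that a plain /bin/sh
 # (dash, for example) leaves unexpanded, running the loop only once.
 for cycle in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16; do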
   (head -c +1024 >/dev/null) && head -c +2048
 done
 (head -c +1024 >/dev/null) && head -c +939
)

7z e -so xz-backdoored/tests/files/good-large_compressed.lzma \
 | extract \
 | LC_ALL=C sed "s/\(.\)/\1\n/g" \
 | LC_ALL=C awk '
BEGIN {
   FS="\n"
   RS="\n"
   ORS=""

   m=256

   for (i=0;i<m;i++) {
        t[sprintf("x%c",i)]=i
        c[i]=((i*7)+5)%m
   }
   i=0; j=0

   for (l=0;l<8192;l++) {
        i=(i+1)%m; a=c[i]
        j=(j+a)%m; c[i]=c[j]; c[j]=a
   }
}

{
   v=t["x" (NF<1?RS:$1)]

   i=(i+1)%m; a=c[i]
   j=(j+a)%m; b=c[j]

   c[i]=b; c[j]=a

   k=c[(a+b)%m]

   printf "%c",(v+k)%m
}' \
 | tee backdoor-2b.xz \
 | xz -dc --single-stream \
 | ((head -c +0 >/dev/null 2>&1) && head -c +88664) \
 > backdoor-2b.o

# EOF
8<------

The above scripts assume that the backdoored code has been unpacked into
xz-backdoored in the current directory.  As you can see by reading them,
they fetch the backdoor code from the tests/files/bad-3-corrupt_lzma2.xz
and tests/files/good-large_compressed.lzma files.

The unpack-1 and unpack-2a scripts yield the shell script backdoor code,
while the unpack-2b script yields the binary object that was hidden in
the repository.


All backdoor-1 does is unpack and execute backdoor-2a.

Emacs makes short work (make the region span the file; C-M-\) of
indenting the backdoor-2a code, which makes the control flow clear and
helps to explain how this worked.  The backdoor-2a script is run twice,
once as part of configure (possibly config.status actually), which
modifies src/liblzma/Makefile to unpack and run the backdoor-1 script
again using am__dist_setup, am__test_dir, and am__strip_prefix variables
to hide the commands that unpack and run backdoor-1, and once during
make to actually unpack the backdoor.  My previous conclusion that an
unreasonably observant user might notice make not building the two
affected objects is wrong---make builds them first and the backdoor
script rebuilds them to include and call the hidden binary object.
Also, it seems that config.status somehow gets deleted during the build,
since backdoor-2a uses an if/elif/fi sequence to determine whether to
alter src/liblzma/Makefile or apply the hidden object.  This is odd but
does not impede recovering the backdoor artifacts.



-- Jacob
