Hi all,
the xz backdoor (CVE-2024-3094) luckily did not target Gentoo, but it could
easily have done so. One step in this sophisticated attack involved injecting
concealed code into the build process by means of some homebrew steganography.
I asked myself how many high-entropy files I can find in distfiles. All these
gif|png|jpg|jpeg|wav|der|xz|gz|p12 files might actually be low entropy, but
checking this would require a more sophisticated approach. As a naive first
pass, I just checked how well bzip2 is able to compress the files.
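To make the naive measure concrete, here is a minimal sketch for a single file. It prints how many bits bzip2 needs per input byte; a value close to 8 means bzip2 found practically nothing to squeeze out. The function name bits_per_byte is made up for illustration, and the sketch assumes bzip2, wc and awk are installed.

```shell
#!/bin/bash
# Hypothetical helper (not part of any existing tool): print how many bits
# bzip2 needs per input byte of one file. Values near 8 mean "practically
# incompressible"; tiny files can exceed 8 because of bzip2's fixed overhead.
bits_per_byte() {
    local file=$1 in_bytes out_bytes
    in_bytes=$(wc -c < "${file}") || return 1
    if [ "${in_bytes}" -eq 0 ]; then
        # Nothing to compress; avoid dividing by zero.
        echo "0.000"
        return 0
    fi
    out_bytes=$(bzip2 -c < "${file}" | wc -c)
    awk -v i="${in_bytes}" -v o="${out_bytes}" \
        'BEGIN { printf "%.3f\n", o * 8 / i }'
}
```

Called as bits_per_byte /path/to/file, this gives roughly the same "bits/byte" figure that bzip2 -v prints. It says nothing about whether a high value is legitimate (a PNG is expected to score high), only that the file deserves a look.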
But I also found some really unnecessary and, IMHO, high-risk stuff in
distfiles. tpm-tools, for example, ships the /.git/ subdirectory with all its
blobs. Python has some audio test files.
In an ideal world, upstream would instead include some low-entropy generators
for this stuff. Gentoo should address the problem even if upstream is not
responsive.
I wonder if we should have some functionality in eclasses to
a) let src_unpack() filter/drop distfile content, controlled by an
ebuild variable (to deal with /.git/, for example)
b) let src_unpack() warn on high-entropy content (except for files whitelisted
in the ebuild)
This would at least make it easy to identify high-risk stuff that warrants more
scrutiny.
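A rough sketch of what a) and b) could look like as plain bash functions follows. None of these names exist in Portage or in any eclass: distfile_drop, distfile_entropy_warn, the pattern arguments and the 4096-byte cut-off are all assumptions made for illustration. A real eclass would presumably read the patterns from ebuild variables and use ewarn instead of echo.

```shell
#!/bin/bash
# Sketch only: none of these names exist in Portage or in any eclass.

# (a) Drop unwanted content from an unpacked tree. Each pattern is a glob
# matched against paths relative to the tree root, e.g. ".git".
# Assumes the root path itself contains no glob characters.
distfile_drop() {
    local root=$1 pattern
    shift
    for pattern in "$@"; do
        # -prune keeps find from descending into a directory it hands to rm.
        find "${root}" -path "${root}/${pattern}" -prune -exec rm -rf -- {} +
    done
}

# (b) Warn about files that bzip2 can barely compress, unless whitelisted.
# threshold is given in bits per byte (8 = incompressible).
distfile_entropy_warn() {
    local root=$1 threshold=$2 file rel pattern skip in_bytes out_bytes bits
    # Arbitrary cut-off (an assumption): tiny files always look
    # incompressible because of bzip2's fixed overhead, so skip them.
    local min_bytes=4096
    shift 2
    while IFS= read -r -d '' file; do
        rel=${file#"${root}"/}
        skip=
        for pattern in "$@"; do
            # The unquoted right-hand side is matched as a glob pattern.
            if [[ ${rel} == ${pattern} ]]; then skip=1; break; fi
        done
        [[ -n ${skip} ]] && continue
        in_bytes=$(wc -c < "${file}")
        out_bytes=$(bzip2 -c < "${file}" | wc -c)
        # awk prints the bits/byte figure and exits 0 only above threshold.
        if bits=$(awk -v i="${in_bytes}" -v o="${out_bytes}" -v t="${threshold}" \
            'BEGIN { b = o * 8 / i; printf "%.2f", b; exit !(b > t) }'); then
            echo "WARNING: high entropy (${bits} bits/byte): ${rel}" >&2
        fi
    done < <(find "${root}" -type f -size +"$((min_bytes - 1))"c -print0)
}
```

For the tpm-tools case, something like distfile_drop "${S}" ".git" followed by distfile_entropy_warn "${S}" 7.5 "tests/*.wav" would remove the git blobs and then flag everything incompressible that is not explicitly whitelisted. The threshold of 7.5 is a guess, not a tested value.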
Greets,
Andreas
BTW, this is my naive test script; pipe its output through "sort -r -k3":
#!/bin/bash
# Naive entropy check: unpack every distfile and let bzip2 report how well
# each contained file compresses.
DISTDIR=/var/cache/distfiles
TMPDIR=$(mktemp -d /tmp/distfiles-entropy.XXXXXX) || exit 1
trap 'rm -rf "${TMPDIR}"' EXIT
cd "${TMPDIR}" || exit 1
# -maxdepth 1: the paths below assume the files sit directly in DISTDIR
find "${DISTDIR}" -maxdepth 1 -type f -printf '%f\n' |
while IFS= read -r DISTFILE
do
    SRC=${DISTDIR}/${DISTFILE}
    DEST=${TMPDIR}/${DISTFILE}
    mkdir -- "${DEST}"
    case ${DISTFILE} in
        *.tar.gz|*.tgz)  gzip -dc "${SRC}" | tar -C "${DEST}" -xf - ;;
        *.tar.xz|*.txz)  xzcat "${SRC}" | tar -C "${DEST}" -xf - ;;
        *.tar.bz2|*.tbz) bzcat "${SRC}" | tar -C "${DEST}" -xf - ;;
        *.gz)            gzip -dc "${SRC}" > "${DEST}/file" ;;
        *)               cat "${SRC}" > "${DEST}/file" ;;
    esac
    # bzip2 -v reports its statistics on stderr: keep them, drop the data
    find "./${DISTFILE}" -type f -print0 |
        xargs -0 -r bzip2 -cv 2>&1 >/dev/null
    rm -rf -- "${DEST}"
done
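For ranking the results, a small wrapper like the one below may be handy. It relies on bzip2 -v printing the bits/byte figure as the third whitespace-separated field of each line, which only holds for file names without whitespace. The name rank_by_entropy is made up. Unlike a plain "sort -r -k3" it compares numerically, so that values of 10 bits/byte and more (tiny files) end up where they belong.

```shell
#!/bin/bash
# Hypothetical wrapper: read the script's output on stdin and list the files
# with the highest bits/byte first. Lines without a bits/byte figure
# (e.g. bzip2's "no data compressed." for empty files) are dropped.
# Assumes file names contain no whitespace.
rank_by_entropy() {
    grep 'bits/byte' | sort -k3,3 -rn
}
```

With the script saved under some name of your choice, its output can be piped straight into rank_by_entropy, optionally followed by head to see only the worst offenders.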