On Fri, Dec 05, 2025 at 04:12:13PM -0500, Dave Cantrell wrote:
> On my workstation I made a copy of /usr/share/man and removed all of the
> symlinks in that tree. There are 39163 man pages in that directory. I made
> two copies. The first to uncompress the pages and the second to compress
> them all with zstd. Here's the storage results I gathered from 'du -s -h':
>
>     default/         182M
>     uncompressed/    294M
>     zstd/            182M

I get:

    uncompressed/    390M
    default/         187M
    zstd/            182M

I used 'zstd -19' to match the max gzip compression we're using.
It seems clear that the change in size is negligible.

We should also measure the time required for compression and
decompression. I'd posit that decompression is actually more important,
because that happens on user systems and snappiness makes users happy.
zstd is clearly better here, roughly three times faster over the whole set:

$ time zcat man/*.gz >/dev/null
zcat man/*.gz > /dev/null  1.65s user 0.11s system 99% cpu 1.769 total
zcat man/*.gz > /dev/null  1.63s user 0.11s system 99% cpu 1.748 total
zcat man/*.gz > /dev/null  1.67s user 0.12s system 99% cpu 1.792 total

$ time zstdcat man-zstd/*.zst >/dev/null
zstdcat man-zstd/*.zst > /dev/null  0.42s user 0.15s system 97% cpu 0.580 total
zstdcat man-zstd/*.zst > /dev/null  0.39s user 0.15s system 99% cpu 0.545 total
zstdcat man-zstd/*.zst > /dev/null  0.39s user 0.15s system 99% cpu 0.543 total

Nevertheless, unless somebody is searching over man pages, the
decompression time of a single page is going to be hard to see.

For compression:

$ time parallel gzip --best -q -k ::: *
parallel gzip --best -q -k ::: *  57.36s user 120.09s system 197% cpu 1:29.99 total

$ time parallel zstd -q -19 ::: *
parallel zstd -q -19 ::: *  235.64s user 166.12s system 378% cpu 1:46.09 total

Gzip comes out ahead a little bit here (1:30 vs. 1:46 wall-clock).
(Though in both cases, the CPU doesn't seem to be saturated: 197% and
378%. Since the IO is negligible, I'd expect the CPUs to be all running
at 100%. So maybe some tweaking in how the compression is invoked could
bring this down.) But since this happens during package build time, any
package which has enough man pages for this to be noticeable is probably
already taking hours to build, so this is not going to matter.
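
One possible reason for the unsaturated CPUs is the cost of starting a
separate compressor process for every small file. That is an assumption,
not something measured above. A minimal sketch that hands the files to the
compressor in batches through xargs instead of parallel; batch_compress is
a hypothetical helper, and the batch size of 64 is arbitrary:

```shell
#!/bin/sh
# Sketch: run a compressor over a directory tree in batches, so that each
# compressor process handles many files instead of one.
# Usage: batch_compress DIR COMMAND [ARGS...]
batch_compress() {
    dir=$1
    shift
    # Number of parallel jobs: CPU count if we can find it, otherwise 1.
    njobs=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    # Skip files that already carry a compressed suffix.
    # -print0 / -0 keep unusual file names intact; -r skips an empty run.
    find "$dir" -type f ! -name '*.gz' ! -name '*.zst' -print0 |
        xargs -0 -r -n 64 -P "$njobs" "$@"
}
```

With the commands from the timings above this would be invoked as
'batch_compress man gzip --best -q -k' or 'batch_compress man-zstd zstd -q -19'.
Whether it actually reduces the wall-clock time would need to be measured.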

Zbyszek

(*) One gotcha: there's a bunch of files which have .gz suffixes but are
not compressed:

$ file *.gz
alt-java.1.gz:    Java source, ASCII text
jar.1.gz:         troff or preprocessor input, ASCII text
jarsigner.1.gz:   troff or preprocessor input, ASCII text
java.1.gz:        troff or preprocessor input, ASCII text
javac.1.gz:       troff or preprocessor input, ASCII text
javadoc.1.gz:     troff or preprocessor input, ASCII text
javap.1.gz:       troff or preprocessor input, ASCII text
jcmd.1.gz:        troff or preprocessor input, ASCII text
jconsole.1.gz:    troff or preprocessor input, ASCII text
jdb.1.gz:         troff or preprocessor input, ASCII text
jdeps.1.gz:       troff or preprocessor input, ASCII text
jinfo.1.gz:       troff or preprocessor input, ASCII text
jmap.1.gz:        troff or preprocessor input, ASCII text
jpackage.1.gz:    troff or preprocessor input, ASCII text
jps.1.gz:         troff or preprocessor input, ASCII text
jrunscript.1.gz:  JavaScript source, ASCII text
jstack.1.gz:      troff or preprocessor input, ASCII text
jstat.1.gz:       troff or preprocessor input, ASCII text
jstatd.1.gz:      troff or preprocessor input, ASCII text
jwebserver.1.gz:  troff or preprocessor input, ASCII text
keytool.1.gz:     troff or preprocessor input, ASCII text
rmiregistry.1.gz: troff or preprocessor input, ASCII text
serialver.1.gz:   troff or preprocessor input, ASCII text

I excluded those from the size checks.
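
Files that carry a .gz suffix without being gzip data can also be picked
out mechanically. A minimal sketch using 'gzip -t'; list_fake_gz is a
hypothetical helper name, and note that 'gzip -t' also fails on truncated
or corrupt gzip files, so those would be listed too:

```shell
#!/bin/sh
# Sketch: print every *.gz file in a directory that gzip cannot read.
# Usage: list_fake_gz DIR
list_fake_gz() {
    for f in "$1"/*.gz; do
        # Skip the unexpanded pattern when nothing matches.
        [ -f "$f" ] || continue
        # gzip -t exits non-zero for anything that is not valid gzip data.
        if ! gzip -t -q "$f" 2>/dev/null; then
            printf '%s\n' "$f"
        fi
    done
}
```

Running 'list_fake_gz man' prints one path per line, which can then be
left out of a size comparison.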
