On Fri, Dec 05, 2025 at 04:12:13PM -0500, Dave Cantrell wrote:
> On my workstation I made a copy of /usr/share/man and removed all of the 
> symlinks in that tree.  There are 39163 man pages in that directory.  I made 
> two copies.  The first to uncompress the pages and the second to compress 
> them all with zstd.  Here's the storage results I gathered from 'du -s -h':
> 
> default/             182M
> uncompressed/        294M
> zstd/                182M

I get (*):
uncompressed/     390M  
default/          187M
zstd/             182M

I used 'zstd -19' to match the maximum gzip compression we're using.

It seems clear that the change in size is negligible: 182M vs 187M, a
difference of under 3%.

We should also measure the time required for compression and
decompression. I'd posit that decompression is actually more important,
because it happens on user systems, and snappiness makes users happy.

zstd is clearly better here:

$ time zcat man/*.gz >/dev/null
zcat man/*.gz >/dev/null           1.65s user 0.11s system 99% cpu 1.769 total
zcat man/*.gz >/dev/null           1.63s user 0.11s system 99% cpu 1.748 total
zcat man/*.gz >/dev/null           1.67s user 0.12s system 99% cpu 1.792 total
$ time zstdcat man-zstd/*.zst >/dev/null
zstdcat man-zstd/*.zst >/dev/null  0.42s user 0.15s system 97% cpu 0.580 total
zstdcat man-zstd/*.zst >/dev/null  0.39s user 0.15s system 99% cpu 0.545 total
zstdcat man-zstd/*.zst >/dev/null  0.39s user 0.15s system 99% cpu 0.543 total

Nevertheless, unless somebody is searching over all man pages (e.g. with
'man -K'), the decompression time of a single page is going to be hard to see.
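To put a rough number on it: each run above decompresses every page in a single process, so dividing the wall time by the page count gives an amortized per-page cost. This is a sketch that assumes the tree holds about the 39163 pages Dave counted (my set may differ somewhat). The helper name per_page_us is made up for illustration.

```shell
# Hypothetical helper: amortized microseconds per page, given the total
# wall time in seconds (as reported by 'time') and the number of pages.
per_page_us() {
    awk -v t="$1" -v n="$2" 'BEGIN { printf "%.0f\n", t / n * 1e6 }'
}

per_page_us 1.769 39163   # zcat run from above
per_page_us 0.545 39163   # zstdcat run from above
```

That comes out to roughly 45 µs per page for gzip and 14 µs for zstd. Both are far below anything a person would notice when opening one page.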

For compression:
$ time parallel gzip --best -q -k ::: *
parallel gzip --best -q -k ::: *  57.36s user 120.09s system 197% cpu 1:29.99 total
$ time parallel zstd -q -19 ::: *
parallel zstd -q -19 ::: *       235.64s user 166.12s system 378% cpu 1:46.09 total

Gzip comes out ahead here: a little in wall-clock time (1:30 vs 1:46), and
by about 4x in user CPU time (57s vs 236s). (Though in both cases, the CPU
doesn't seem to be saturated. Since the IO is negligible, I'd expect all the
CPUs to be running at 100%, so maybe some tweaking in how the compression is
invoked could bring this down.) But since this happens at package build time,
any package with enough man pages for this to be noticeable is probably
already taking hours to build, so this is not going to matter.

Zbyszek


(*) One gotcha:
there are a bunch of files that have .gz suffixes but are not compressed:

$ file *.gz
alt-java.1.gz:    Java source, ASCII text
jar.1.gz:         troff or preprocessor input, ASCII text
jarsigner.1.gz:   troff or preprocessor input, ASCII text
java.1.gz:        troff or preprocessor input, ASCII text
javac.1.gz:       troff or preprocessor input, ASCII text
javadoc.1.gz:     troff or preprocessor input, ASCII text
javap.1.gz:       troff or preprocessor input, ASCII text
jcmd.1.gz:        troff or preprocessor input, ASCII text
jconsole.1.gz:    troff or preprocessor input, ASCII text
jdb.1.gz:         troff or preprocessor input, ASCII text
jdeps.1.gz:       troff or preprocessor input, ASCII text
jinfo.1.gz:       troff or preprocessor input, ASCII text
jmap.1.gz:        troff or preprocessor input, ASCII text
jpackage.1.gz:    troff or preprocessor input, ASCII text
jps.1.gz:         troff or preprocessor input, ASCII text
jrunscript.1.gz:  JavaScript source, ASCII text
jstack.1.gz:      troff or preprocessor input, ASCII text
jstat.1.gz:       troff or preprocessor input, ASCII text
jstatd.1.gz:      troff or preprocessor input, ASCII text
jwebserver.1.gz:  troff or preprocessor input, ASCII text
keytool.1.gz:     troff or preprocessor input, ASCII text
rmiregistry.1.gz: troff or preprocessor input, ASCII text
serialver.1.gz:   troff or preprocessor input, ASCII text

I excluded those from the size checks.
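For anyone repeating this, a small check can find such files without reading the 'file' output by eye. This is a sketch: the function name find_fake_gz is made up. It relies on 'gzip -t', which only tests the input and exits non-zero for anything that is not valid gzip data, so genuinely corrupted .gz files would be listed too.

```shell
# Hypothetical helper: print every *.gz file in the given directory
# (default: current directory) that gzip cannot decode.
find_fake_gz() {
    for f in "${1:-.}"/*.gz; do
        [ -e "$f" ] || continue           # glob matched nothing
        gzip -t -- "$f" 2>/dev/null || printf '%s\n' "$f"
    done
}
```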
-- 
_______________________________________________
devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/[email protected]
Do not reply to spam, report it: 
https://pagure.io/fedora-infrastructure/new_issue
