On Thu, Dec 11, 2025 at 6:48 AM Zbigniew Jędrzejewski-Szmek
<[email protected]> wrote:
>
> On Fri, Dec 05, 2025 at 04:12:13PM -0500, Dave Cantrell wrote:
> > On my workstation I made a copy of /usr/share/man and removed all of the 
> > symlinks in that tree.  There are 39163 man pages in that directory.  I 
> > made two copies: the first to uncompress the pages and the second to 
> > compress them all with zstd.  Here are the storage results I gathered from 
> > 'du -s -h':
> >
> > default/             182M
> > uncompressed/        294M
> > zstd/                182M
>
> I get:
> uncompressed/     390M
> default/          187M
> zstd/             182M
>
> I used 'zstd -19' to match the max gzip compression we're using.
>
> It seems clear that the change in size is negligible.
>
> We should also measure the time required for compression and
> decompression. I'd posit that decompression is actually more important,
> because that happens on user systems and snappiness makes users happy.
>
> zstd is clearly better here:
>
> $ time zcat man/*.gz >/dev/null
> zcat man/*.gz >/dev/null           1.65s user 0.11s system 99% cpu 1.769 total
> zcat man/*.gz >/dev/null           1.63s user 0.11s system 99% cpu 1.748 total
> zcat man/*.gz >/dev/null           1.67s user 0.12s system 99% cpu 1.792 total
> $ time zstdcat man-zstd/*.zst >/dev/null
> zstdcat man-zstd/*.zst >/dev/null  0.42s user 0.15s system 97% cpu 0.580 total
> zstdcat man-zstd/*.zst >/dev/null  0.39s user 0.15s system 99% cpu 0.545 total
> zstdcat man-zstd/*.zst >/dev/null  0.39s user 0.15s system 99% cpu 0.543 total
>
> Nevertheless, unless somebody is searching over man pages, the decompression
> time of a single page is going to be hard to see.
>

We do have graphical tools that do this sort of thing (for example,
KDE's help center indexes and searches man pages), and some console
shells (e.g. fish) or shell extensions (e.g. oh-my-zsh) implement
similar functionality. So making this faster for them *would* be
valuable.
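
To make that concrete, here is a rough sketch of the access pattern
such a tool hits: open every page in a tree, decompress it, and look
for a keyword. The names page_text and search_pages and the plain
fixed-string grep are my own placeholders, not how KDE's help center or
fish actually implement it.

```shell
# Hypothetical helper: print the decompressed text of one page,
# choosing the decompressor from the file suffix.
page_text() {
    case $1 in
        *.gz)  gzip -dc -- "$1" ;;
        *.zst) zstd -dcq -- "$1" ;;
    esac
}

# Hypothetical helper: list the compressed pages under a directory
# whose text contains a fixed string.
# Usage: search_pages KEYWORD DIR
search_pages() {
    find "$2" -type f \( -name '*.gz' -o -name '*.zst' \) |
    while IFS= read -r page; do
        # grep reads the whole stream, so every page is fully decompressed.
        if page_text "$page" 2>/dev/null | grep -F -- "$1" >/dev/null; then
            printf '%s\n' "$page"
        fi
    done
}
```

Something like "search_pages zstd /usr/share/man/man1" pays the
decompression cost once per page, which is where the difference in the
quoted zcat/zstdcat timings would add up.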

> For compression:
> $ time parallel gzip --best -q -k ::: *
> parallel gzip --best -q -k ::: *  57.36s user 120.09s system 197% cpu 1:29.99 total
> $ time parallel zstd -q -19 ::: *
> parallel zstd -q -19 ::: *       235.64s user 166.12s system 378% cpu 1:46.09 total
>
> Gzip comes out ahead a little bit here. (Though in both cases, the CPU doesn't
> seem to be saturated. Since the IO is negligible, I'd expect the CPUs to be
> all running at 100%. So maybe some tweaking in how the compression is invoked
> could bring this down.)  But since this happens during package build time,
> any package which has enough man pages for this to be noticeable is probably
> already taking hours to build, so this is not going to matter.
>

We'd probably want to tell zstd to compress using all available CPU
cores, which it doesn't do by default.

This can be done with "zstdmt" or "zstd -T0" (or
"zstd -T<number-of-threads>"). That may improve compression performance.
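
One caveat, as far as I know: zstd's threads split a single input into
chunks, so for files as small as man pages most of the gain probably
has to come from running several zstd processes side by side, which is
what the quoted test did with parallel. Below is a minimal sketch of
the same idea using only find and xargs. The function name
compress_pages and the batch size of 32 are my own assumptions, not an
existing macro or build script.

```shell
# Hypothetical helper: compress every regular file under a directory
# with "zstd -19", running up to one zstd process per online CPU.
# Usage: compress_pages DIR
compress_pages() {
    # Number of online CPUs; fall back to 1 if getconf cannot tell us.
    jobs=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    # Hand files to zstd in batches of 32 so process startup is amortized.
    # --rm deletes each source file once it has been compressed.
    find "$1" -type f ! -name '*.zst' -print0 |
        xargs -0 -r -n 32 -P "$jobs" zstd -q -19 --rm --
}
```

Whether this actually beats "gzip --best" on wall-clock time would
need measuring on a real package; I have not done that.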


-- 
真実はいつも一つ!/ Always, there's only one truth!
