On Thu, Dec 11, 2025 at 07:04:39AM -0500, Neal Gompa wrote:
> On Thu, Dec 11, 2025 at 6:48 AM Zbigniew Jędrzejewski-Szmek
> <[email protected]> wrote:
> > $ time zcat man/*.gz >/dev/null
> > zcat man/*.gz >/dev/null           1.65s user 0.11s system 99% cpu 1.769 total
> > zcat man/*.gz >/dev/null           1.63s user 0.11s system 99% cpu 1.748 total
> > zcat man/*.gz >/dev/null           1.67s user 0.12s system 99% cpu 1.792 total
> > $ time zstdcat man-zstd/*.zst >/dev/null
> > zstdcat man-zstd/*.zst >/dev/null  0.42s user 0.15s system 97% cpu 0.580 total
> > zstdcat man-zstd/*.zst >/dev/null  0.39s user 0.15s system 99% cpu 0.545 total
> > zstdcat man-zstd/*.zst >/dev/null  0.39s user 0.15s system 99% cpu 0.543 total
> >
> > Nevertheless, unless somebody is searching over man pages, the decompression
> > time of a single page is going to be hard to see.
> >
> 
> We do have graphical tools that do this sort of thing (like KDE's help
> center searches and indexes man pages), and some console shells (e.g.
> fish) or shell extensions (oh-my-zsh) implement similar functionality.
> So making this faster for them *would* be valuable.

That's a good point.

> > For compression:
> > $ time parallel gzip --best -q -k ::: *
> > parallel gzip --best -q -k ::: *  57.36s user 120.09s system 197% cpu 1:29.99 total
> > $ time parallel zstd -q -19 ::: *
> > parallel zstd -q -19 ::: *       235.64s user 166.12s system 378% cpu 1:46.09 total
> >
> > Gzip comes out ahead a little bit here. (Though in both cases, the CPU
> > doesn't seem to be saturated. Since the IO is negligible, I'd expect the
> > CPUs to be all running at 100%. So maybe some tweaking in how the
> > compression is invoked could bring this down.)  But since this happens
> > during package build time, any package which has enough man pages for
> > this to be noticeable is probably already taking hours to build, so this
> > is not going to matter.
> >
> 
> We'd probably want to tell zstd to compress using all available CPU
> cores, which it doesn't do by default.
> 
> This can be done with "zstdmt" or "zstd -T0" (or "zstd
> -T<number-of-threads>"). That may improve compression performance.

I was using 'parallel' to do many compressions in parallel. I expected
that to work better… But indeed, 'zstd -T' works quite well.

$ time zstd -T8 -q -19 *
zstd -T8 -q -19 *  82.32s user 1.64s system 99% cpu 1:24.51 total
$ time zstd -T4 -q -19 *
zstd -T4 -q -19 *  93.67s user 1.57s system 99% cpu 1:35.77 total

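But in this mode too, there are many periods where only a single
thread is running (both runs above report 99% cpu). I expect that this
happens when we're compressing all those tiny man pages with links and
such: as far as I understand, 'zstd -T' splits the work within one input
file, so a tiny file keeps only one thread busy. So I'm pretty sure that
this could be brought down significantly with efficient parallelization.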
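
One way this might look is to hand each worker a batch of files instead
of one file per process, and to run one worker per core. This is only a
sketch and I have not timed it: compress_batched is a made-up helper
name, the batch size of 64 is an arbitrary guess, and it assumes GNU
find, GNU xargs and nproc are available.

```shell
# Hypothetical helper: compress every regular file under a directory,
# handing files to the compressor in batches, one worker per core.
# Usage: compress_batched DIR COMMAND [ARGS...]
compress_batched() {
    dir=$1
    shift
    # -type f skips symlinks; -print0/-0 keeps odd file names intact.
    # -n 64 is an assumed batch size, -P runs one worker per core,
    # -r does nothing if no files are found (GNU xargs).
    find "$dir" -type f -print0 |
        xargs -0 -r -n 64 -P "$(nproc)" "$@"
}

# Example (not run here):
#   compress_batched man-zstd zstd -q -19
```

Batching should cut down on process startup for the tiny pages, while
-P keeps all cores busy on the bigger ones. Whether it actually beats
'zstd -T8' here would need measuring.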

Zbyszek