On Sat, Sep 16, 2023 at 10:31:20AM +0530, Hideki Yamane wrote:
> Today I want to propose you to change default compression format in .deb,
> {data,control}.tar."xz" to ."zst".
> According to https://www.speedtest.net/global-index, broadband bandwidth
> in Nicaragua becomes almost 10x
>
> - 2012: 1.7Mbps
> - 2023: 17.4Mbps
>
> 10x faster than past: it means that file size is not so much problem for us

That's broadband; a lot of folks have nothing but crappy 5G.

I just happen to have a package converted to multiple formats on disk,
because I tested/benchmarked format 0.939 vs 2.0 [1]:

          -h       bytes
  tar   5.5G  5839735844
  gz    897M   939926960
  xz    375M   392874208
  zst   774M   811105258

For this particular package, zst gives a file over twice as big as xz.
You can pick a stronger compression level, but at that point we're just
moving along the speed/ratio tradeoff curve.

> ## More CPUs
>
> 2012: ThinkPad L530 has Core i5-3320M (2 cores, 4 threads)
> 2023: ThinkPad L15 has Core i5-1335U (10 cores, 12 threads)
>
> https://www.cpubenchmark.net/compare/817vs5294/Intel-i5-3320M-vs-Intel-i5-1335U
> - i5-3320M: single 1614, multicore 2654
> - i5-1335U: single 3650, multicore 18076 points
>
> And, xz cannot use multi core CPUs for decompression but zstd can.
> It means that if we use xz, we just get 2x CPU power, but change to zst,
> we can get more than 10x CPU power than 2012.

As someone with a 64-way amd64 desktop, and a purchased-but-not-yet-delivered
64-core riscv64 box on the way, I understand the sentiment -- but what about
parallelizing by unpacking multiple packages at the same time instead? That's
safer and doesn't cost compression ratio [2]. I've prototyped this, and even
with current dpkg internals it shouldn't be hard to do (even if dpkg runs
keep switching between unpacking and configuring too often).

> It reduces a lot of time for package installation.

There are a lot, lot, lot of other places in dpkg that could use a speedup,
and they don't come with such a tradeoff. Especially the fsync abuse: dpkg
writes out all of its status every. single. step., fully. flushing. it. to.
persistent. storage. even. if. it's. a. dingy. SD. card.
It does the same for every file it unpacks; to a semi-ignorant onlooker it
seems as if it uses some sort of range coder just so it can fsync between
fractional bits.

Even though there's no good generic way to ensure consistency of the
extracted payload (POSIX lacks such an API; on btrfs you can use snapshots),
the dpkg state at least could win a lot by no longer assuming that the
limitations of ext2 apply to other filesystems. On ext2 a crash may do
unbounded damage to the filesystem, so flat text files with an fsync between
every operation improve recoverability -- but any filesystem newer than that
provides stronger guarantees. There are so many techniques that would avoid
full state rewrites...

> ## More storage bandwidth
>
> SSD + PCIe 3/4/5 is enough, not be a blocker for decompression, now.

Makes me wish Optane NVDIMMs hadn't gotten cancelled... :/

On the other hand, we could switch the compression for _some_ packages.
There's stuff that gets unpacked by buildds over and over. Compilers and
library headers are not used much by end users on dingy connections (and we
hackers tend to spend more on computing hardware than regular people do), so
what about switching the stuff that's 1. not in build-essential but 2. in a
set shared by many Build-Depends?

Meow!

[1]. https://lists.debian.org/debian-dpkg/2023/09/msg00014.html
[2]. Parallel compression, and especially parallel decompression, works by
     flushing and dropping the old compressor state at every block boundary,
     which costs compression ratio.

-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Bestest pickup line:
⢿⡄⠘⠷⠚⠋⠀ "Cutie, your name must be Suicide, cuz I think of you every day."
⠈⠳⣄⠀⠀⠀⠀