Yeah: Here's one of the articles I read about using filesystem tricks to make a "read-write" squashfs: https://www.baeldung.com/linux/squashfs-filesystem-mount . TL;DR: An "overlay" filesystem combines a writable filesystem on top with read-only filesystem(s) underneath, so writes happen to the top one and reads fall back to the bottom one(s).
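For anyone who hasn't tried it, a minimal sketch of that kind of mount looks roughly like this (the image and directory names are made up for illustration, not taken from any actual setup):

    # Loop-mount a read-only squashfs image as the lower layer
    mount -t squashfs -o loop,ro archive.sqfs /mnt/lower

    # Overlay a writable upper layer on top of it; overlayfs needs an
    # upperdir and a workdir on the same writable filesystem
    mkdir -p /mnt/upper /mnt/work /mnt/merged
    mount -t overlay overlay \
        -o lowerdir=/mnt/lower,upperdir=/mnt/upper,workdir=/mnt/work \
        /mnt/merged

    # New and changed files land in /mnt/upper; everything else is read
    # through from the squashfs image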
/mnt/reports-all - Combined overlay
/mnt/reports-new - Writable btrfs
/mnt/reports-2025-W32 - SquashFS for a single week
/mnt/reports-2025-01 - SquashFS for a single month
/mnt/reports-2024 - SquashFS for a whole year

Once a week passes, I can make a new weekly /mnt/reports-2025-W33 with all the reports in /mnt/reports-new. Once I have enough of those, I can use them to make a new monthly /mnt/reports-2025-07, and so on for the yearly archive. My hope is that each of these archives is not too unwieldy to transport between systems for backups, replication, etc. And I imagine the more layers there are, the more time (and CPU) it takes to find a report, so there may be some adjustments to the exact layout of the whole thing.
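A rough sketch of what that weekly roll-up could look like with squashfs-tools; the image filenames, the zstd compressor, and the upper/work subdirectories are assumptions rather than the final process (zstd support needs a recent squashfs-tools, per the Debian 12 note in the results below):

    # Freeze the current week's reports into a new read-only image
    mksquashfs /mnt/reports-new reports-2025-W33.sqfs -comp zstd

    # Mount it alongside the existing archives
    mkdir -p /mnt/reports-2025-W33
    mount -t squashfs -o loop,ro reports-2025-W33.sqfs /mnt/reports-2025-W33

    # Rebuild the combined overlay with the new image added as a lower
    # layer (leftmost lowerdir wins on conflicts) and an emptied
    # /mnt/reports-new back on top
    mount -t overlay overlay \
        -o lowerdir=/mnt/reports-2025-W33:/mnt/reports-2025-W32:/mnt/reports-2025-01:/mnt/reports-2024,upperdir=/mnt/reports-new/upper,workdir=/mnt/reports-new/work \
        /mnt/reports-all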
Doug Bell
d...@preaction.me

> On May 8, 2025, at 6:15 PM, Scott Baker <sc...@perturb.org> wrote:
>
> I don't know a ton about SquashFS, but some reading on Wikipedia says it's a read-only filesystem. How would CPT use SquashFS? Storing report data?
>
> -- Scottchiefbaker
>
> On 5/8/25 9:04 AM, Doug Bell wrote:
>> Yeah, a looong long time ago I was hoping Zstd compression + dictionaries would solve the problem. I think I had designed some overly complex systems for doing it, though, and never got around to setting it up.
>>
>> I did some tests w/ squashfs and got some good results as well. This option appeals to me for its transparency: the Zstd + dictionary approach means special tools for looking at the data, but squashfs would work w/ a standard CLI toolkit. Those results are below.
>>
>> I'm collecting (heh) up a design spec for this <https://github.com/orgs/cpan-testers/discussions/24> in the CPAN Testers Discussions under a new Proposal category. And then once we isolate this problem, the rest of the problems seem almost trivial ;)
>>
>> # The count of all reports
>> cpantesters@cpantesters4:~$ find reports-dir/_meta/timestamp -type f | xargs cat | wc -l
>> 44987
>>
>> # The total size on-disk (I'm assuming w/ extra tail blocks)
>> cpantesters@cpantesters4:~$ du -sh reports-dir/
>> 614M    reports-dir/
>>
>> # LZ4 squashfs
>> Exportable Squashfs 4.0 filesystem, lz4 compressed, data block size 131072
>>   compressed data, compressed metadata, compressed fragments, compressed xattrs
>>   duplicates are removed
>> Filesystem size 127003.50 Kbytes (124.03 Mbytes)
>>   30.41% of uncompressed filesystem size (417572.73 Kbytes)
>> Inode table size 889805 bytes (868.95 Kbytes)
>>   39.78% of uncompressed inode table size (2237004 bytes)
>> Directory table size 823072 bytes (803.78 Kbytes)
>>   36.71% of uncompressed directory table size (2242366 bytes)
>>
>> # XZ squashfs (best compression)
>> Exportable Squashfs 4.0 filesystem, xz compressed, data block size 131072
>>   compressed data, compressed metadata, compressed fragments, compressed xattrs
>>   duplicates are removed
>> Filesystem size 92831.63 Kbytes (90.66 Mbytes)
>>   22.23% of uncompressed filesystem size (417572.73 Kbytes)
>> Inode table size 479692 bytes (468.45 Kbytes)
>>   21.44% of uncompressed inode table size (2237004 bytes)
>> Directory table size 493816 bytes (482.24 Kbytes)
>>   22.02% of uncompressed directory table size (2242366 bytes)
>>
>> # LZO squashfs
>> Exportable Squashfs 4.0 filesystem, lzo compressed, data block size 131072
>>   compressed data, compressed metadata, compressed fragments, compressed xattrs
>>   duplicates are removed
>> Filesystem size 119522.29 Kbytes (116.72 Mbytes)
>>   28.62% of uncompressed filesystem size (417572.73 Kbytes)
>> Inode table size 827963 bytes (808.56 Kbytes)
>>   37.01% of uncompressed inode table size (2237004 bytes)
>> Directory table size 743654 bytes (726.22 Kbytes)
>>   33.16% of uncompressed directory table size (2242366 bytes)
>>
>> # Gzip squashfs
>> Exportable Squashfs 4.0 filesystem, gzip compressed, data block size 131072
>>   compressed data, compressed metadata, compressed fragments, compressed xattrs
>>   duplicates are removed
>> Filesystem size 111798.37 Kbytes (109.18 Mbytes)
>>   26.77% of uncompressed filesystem size (417572.73 Kbytes)
>> Inode table size 627493 bytes (612.79 Kbytes)
>>   28.05% of uncompressed inode table size (2237004 bytes)
>> Directory table size 581621 bytes (567.99 Kbytes)
>>   25.94% of uncompressed directory table size (2242366 bytes)
>>
>> # Zstd squashfs (needed to move to a Debian 12 box to get this)
>> Exportable Squashfs 4.0 filesystem, zstd compressed
>> Filesystem size 100603.81 Kbytes (98.25 Mbytes)
>>   24.09% of uncompressed filesystem size (417572.73 Kbytes)
>> Inode table size 537209 bytes (524.62 Kbytes)
>>   24.01% of uncompressed inode table size (2237004 bytes)
>> Directory table size 490852 bytes (479.35 Kbytes)
>>   21.89% of uncompressed directory table size (2242366 bytes)
>>
>> Doug Bell
>> d...@preaction.me
>>
>>> On May 5, 2025, at 6:23 PM, Scott Baker <sc...@perturb.org> wrote:
>>>
>>> CPAN Testers:
>>>
>>> As part of my research into Magpie we came up against a disk space hurdle. Currently CPT is ingesting ~25,000 tests per day. After capturing a sampling of about 40,000 tests, I was able to determine that the average test is 9,129 bytes of text. If we store uncompressed text, that's 223MB per day (81GB per year). Clearly that's not very sustainable, so we need to look at compression.
>>>
>>> gzip -9   = 3198 bytes
>>> zstd -12  = 3124 bytes
>>> brotli -9 = 2699 bytes
>>>
>>> Brotli is the clear winner for compressing smallish chunks of text. Not surprising, as that was one of the primary goals when it was designed. Compressing with Brotli gets us down to 66MB per day (24GB per year), which is more reasonable for sure.
>>>
>>> Doing some research I came across Zstandard dictionaries. Zstandard dictionaries fit our use case perfectly: compressing many small but very similar (JSON, XML, etc.) files. I dumped the last 50,000 text test results from CPT and created a custom 128KB dictionary file. Using that CPT-tuned dictionary, I was able to get the average size on disk of a test result down to 1087 bytes (27MB per day or 10GB per year).
>>>
>>> As we move forward with reworking the DB side of CPT, we should definitely consider Zstandard dictionaries. They are well tested, relatively easy to use, and well supported by Perl <https://metacpan.org/pod/Compress::Stream::Zstd::CompressionDictionary> and other tools.
>>>
>>> High-speed, database-grade cloud storage is not cheap. Whatever we can do to decrease the amount of raw storage we need, the better. Lower storage usage means faster replication and quicker backups. Have you ever tried backing up 1TB of data in the cloud? Spoiler alert: it's not easy.
>>>
>>> -- Scottchiefbaker
>>>
>>> P.S. For bonus points, what if we re-worked what we store? Do we need to store "Thank you for uploading your work to CPAN..."? Do we need to store the opening boilerplate paragraph?
>>>
>>>> From: metabase:user:314402c4-2aae-11df-837a-5e0a49663a4f
>>>> Subject: NA Random-Simple-0.24 5.10.1 FreeBSD
>>>> Date: 2025-03-31T17:20:02Z
>>>>
>>>> This distribution has been tested as part of the CPAN Testers
>>>> project, supporting the Perl programming language. See
>>>> http://wiki.cpantesters.org/ for more information or email
>>>> questions to cpan-testers-discuss@perl.org
>>>
>>> P.P.S. Raw numbers for reference:
>>>
>>>> perlmagpie> SELECT avg(octet_length(txt_zstd)), count(guid), grade FROM test_results INNER JOIN test USING (GUID) GROUP BY grade ORDER BY 1 asc LIMIT 30;
>>>> +-----------------------+-------+---------+
>>>> | avg                   | count | grade   |
>>>> |-----------------------+-------+---------|
>>>> | 837.1807610993657505  | 1892  | NA      |
>>>> | 862.9752690411719781  | 72015 | PASS    |
>>>> | 1286.9555979297194225 | 3671  | UNKNOWN |
>>>> | 1515.2728811352688452 | 15362 | FAIL    |
>>>> +-----------------------+-------+---------+
>>>> SELECT 4
>>>> Time: 0.223s
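For reference, the dictionary workflow Scott describes maps onto the zstd CLI roughly like this; the sample paths, dictionary name, and report filename are illustrative, not the exact commands used:

    # Train a 128KB dictionary on a sample of past report texts
    zstd --train sample-reports/*.txt --maxdict=131072 -o cpt-reports.dict

    # Compress each report with the shared dictionary
    zstd -D cpt-reports.dict report.txt -o report.txt.zst

    # Decompress with the same dictionary (it has to be kept alongside the data)
    zstd -D cpt-reports.dict -d report.txt.zst -o report.txt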