Today we discovered a few more things and discussed them on IRC. Here’s a summary.
/var/cache sits on the same storage as /gnu. We mounted the 5TB ext4 file system hosted by the SAN at /mnt_test and started copying /var/cache to /mnt_test/var/cache. The transfer was considerably faster (not *great*, but reasonably fast) than the earlier copy of /gnu/store/trash to the same target. This confirms our suspicion that the problem lies not with the storage array but with the fact that /gnu/store/trash (and /gnu/store itself) is an extremely large, flat directory, whereas /var/cache is not.

The plan now: finish copying /var/cache to the SAN, then remount so that substitutes are served from there. That takes some pressure off the current file system, which will then only hold /gnu.

We are also considering dumping the file system completely (i.e. reinstalling the server), thereby emptying /gnu while keeping the stash of built substitutes in /var/cache (served from the faster SAN). We could take that opportunity to reformat /gnu with btrfs, which performs quite a bit more poorly than ext4 but would be immune to fragmentation. It's not clear that fragmentation even matters here, though; the problem could be caused exclusively by these incredibly large, flat /gnu/store, /gnu/store/.links, and /gnu/store/trash directories. XFS would be another candidate for this file system, as it performs well when presented with unreasonably large directories.

It may be a good idea to come up with realistic test scenarios that we could run at scale against each of these three file systems. Any ideas?

-- Ricardo
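As a starting point for such test scenarios, here is a rough benchmark sketch for the flat-directory workload. Everything in it is a placeholder assumption, not a fixed plan: TESTDIR would point at a mount of the file system under test (ext4, btrfs, or XFS), and N would be raised toward the millions of entries that /gnu/store actually holds.

```shell
#!/bin/bash
# Hypothetical flat-directory benchmark sketch.  TESTDIR and N are
# placeholders: point TESTDIR at a mount of the file system under test
# and scale N up to something closer to a real /gnu/store.
set -e

TESTDIR=${1:-/tmp/flatdir-bench}
N=${2:-5000}

mkdir -p "$TESTDIR"

# 1. Creation: N entries with fixed-width hash-like names, mimicking
#    store items landing in one flat directory.
for ((i = 0; i < N; i++)); do
    : > "$TESTDIR/$(printf '%032x' "$i")-test-entry"
done

# 2. Enumeration: a full readdir pass, as a GC scan would do.
#    (ls -f avoids sorting, so this measures readdir, not qsort.)
time ls -f "$TESTDIR" > /dev/null

# 3. Lookup: stat a few entries by name, as serving substitutes does.
time for i in 1 $((N / 2)) $((N - 1)); do
    stat "$TESTDIR/$(printf '%032x' "$i")-test-entry" > /dev/null
done

# 4. Deletion: remove everything, i.e. the /gnu/store/trash workload.
time rm -rf "$TESTDIR"
```

Running the same script against each candidate file system, with N stepped up by orders of magnitude, would show where each one's directory implementation starts to fall over.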
