Today we discovered a few more things and discussed them on IRC. Here’s a summary.
/var/cache sits on the same storage as /gnu. We mounted the 5TB ext4 file system hosted by the SAN at /mnt_test and started copying /var/cache to /mnt_test/var/cache. The transfer was considerably faster (not *great*, but reasonably fast) than the earlier copy of /gnu/store/trash to the same target. This confirms our suspicion that the problem lies not with the storage array but with the fact that /gnu/store/trash (and /gnu/store itself) is an extremely large, flat directory, whereas /var/cache is not.

The plan now: finish copying /var/cache to the SAN, then remount so that substitutes are served from there. That takes some pressure off the current file system, which will then only hold /gnu.

We are also considering dumping the file system completely (i.e. reinstalling the server), thereby emptying /gnu while keeping the stash of built substitutes in /var/cache (served from the faster SAN). We could take that opportunity to reformat /gnu with btrfs, which performs quite a bit more poorly than ext4 but would be immune to fragmentation. It's not clear that fragmentation even matters here, though; the problem could be caused exclusively by these incredibly large, flat /gnu/store, /gnu/store/.links, and /gnu/store/trash directories. XFS would be another candidate for this file system, as it performs well when presented with unreasonably large directories.

It may be a good idea to come up with realistic test scenarios that we could run at scale against each of these three file systems. Any ideas?

-- Ricardo
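As a starting point for such test scenarios, here is a rough benchmark sketch for the flat-directory workload. Everything in it is a placeholder assumption, not a fixed plan: TESTDIR would point at a mount of the file system under test (ext4, btrfs, or XFS), and N would be raised toward the millions of entries that /gnu/store actually holds.

```shell
#!/bin/bash
# Hypothetical flat-directory benchmark sketch.  TESTDIR and N are
# placeholders: point TESTDIR at a mount of the file system under test
# and scale N up to something closer to a real /gnu/store.
set -e

TESTDIR=${1:-/tmp/flatdir-bench}
N=${2:-5000}

mkdir -p "$TESTDIR"

# 1. Creation: N entries with fixed-width hash-like names, mimicking
#    store items landing in one flat directory.
for ((i = 0; i < N; i++)); do
    : > "$TESTDIR/$(printf '%032x' "$i")-test-entry"
done

# 2. Enumeration: a full readdir pass, as a GC scan would do.
#    (ls -f avoids sorting, so this measures readdir, not qsort.)
time ls -f "$TESTDIR" > /dev/null

# 3. Lookup: stat a few entries by name, as serving substitutes does.
time for i in 1 $((N / 2)) $((N - 1)); do
    stat "$TESTDIR/$(printf '%032x' "$i")-test-entry" > /dev/null
done

# 4. Deletion: remove everything, i.e. the /gnu/store/trash workload.
time rm -rf "$TESTDIR"
```

Running the same script against each candidate file system, with N stepped up by orders of magnitude, would show where each one's directory implementation starts to fall over.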
