Hello,
As Christopher Baines reported on IRC, calling ‘fdatasync’ on each
cached narinfo under /var/guix/substitute/cache can be unreasonably
expensive on spinning disks.
On bayfront, when tracing ‘guix substitute’ processes, we observe
‘fdatasync’ occasionally taking more than a second, and typically taking
more than 0.2s:
--8<---------------cut here---------------start------------->8---
mkdir("/var", 0777) = -1 EEXIST (File exists) <0.000048>
mkdir("/var/guix", 0777) = -1 EEXIST (File exists) <0.000042>
mkdir("/var/guix/substitute", 0777) = -1 EEXIST (File exists) <0.000040>
mkdir("/var/guix/substitute/cache", 0777) = -1 EEXIST (File exists) <0.000042>
mkdir("/var/guix/substitute/cache/kzwjeblndsbkjzmjailrt4bnhguil7tqjmewzcyw22hgajbhfy3q",
0777) = -1 EEXIST (File exists
) <0.000041>
clock_gettime(CLOCK_MONOTONIC, {tv_sec=153237, tv_nsec=595042938}) = 0
<0.000037>
openat(AT_FDCWD,
"/var/guix/substitute/cache/kzwjeblndsbkjzmjailrt4bnhguil7tqjmewzcyw22hgajbhfy3q/nfmn2nkfjz5kzklw165aj
6ilwphw9zrz.3kAGMM", O_RDWR|O_CREAT|O_EXCL, 0600) = 15 <0.000123>
fstat(15, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0 <0.000037>
lseek(15, 0, SEEK_CUR) = 0 <0.000038>
write(15, "(narinfo (version 2) (cache-uri "..., 1066) = 1066 <0.000071>
fdatasync(15) = 0 <1.860604>
--8<---------------cut here---------------end--------------->8---
A longer sample:
--8<---------------cut here---------------start------------->8---
ludo@bayfront ~$ grep ^fdatasync log.strace
fdatasync(15) = 0 <1.860604>
fdatasync(15) = 0 <1.101868>
fdatasync(15) = 0 <0.464357>
fdatasync(15) = 0 <0.363363>
fdatasync(15) = 0 <0.307759>
fdatasync(15) = 0 <0.619979>
fdatasync(15) = 0 <0.269880>
fdatasync(15) = 0 <0.274093>
fdatasync(15) = 0 <0.247207>
fdatasync(15) = 0 <0.367507>
fdatasync(15) = 0 <0.212277>
fdatasync(15) = 0 <0.234929>
fdatasync(15) = 0 <0.330697>
fdatasync(15) = 0 <0.193925>
fdatasync(15) = 0 <0.251554>
fdatasync(15) = 0 <0.379571>
--8<---------------cut here---------------end--------------->8---
That call comes from ‘with-atomic-file-output’, commit
1752a17a1e6f7138892eeeb4806cd04ccb3ca1b0, which refers to this
discussion:
https://lists.gnu.org/archive/html/guix-devel/2016-06/msg00456.html
It probably makes sense in some uses of ‘with-atomic-file-output’, but
nor for a cache as is the case here.
Ludo’.