Hello all,

since following the releng/13.0 branch, I have been experiencing stalled disk I/O quite often (about once per minute) while building packages with poudriere.

When this happens, the CPU goes almost idle and several processes show up in `top` in state "zfs te" (procstat shows "zfs tear" for them). For up to several seconds, no disk I/O completes (even starting a new process is impossible), then it recovers.

Only twice have I seen the system go into a deadlock instead, printing messages similar to this to the serial console:

    swap_pager: indefinite wait buffer ...

I have seen this behavior since -RC3 (I have followed releng/13.0 up to -RELEASE by now). Before that, I had the vnlru-related problem that was fixed with faa41af1fed350327cc542cb240ca2c6e1e8ba0c.

Some details:

* CPU: Intel(R) Xeon(R) CPU E3-1240L v5 @ 2.10GHz
* RAM: 64GB (ECC)
* Four HDDs (Seagate NAS models), 4TB each
* Swap 16GB, striped over the 4 disks
* Pool: 12TB raid-z on GELI-encrypted partitions. NOT upgraded yet, so I have a way back to 12.2.
* Two bhyve VMs running with 1GB and 8GB RAM, both wired
* Several jails running services like samba, an MTA, nginx...
* Several NFS shares mounted by other machines
* Poudriere running on idprio 22 with 8 parallel build jobs

Reducing the number of parallel jobs in poudriere also reduces the frequency of the problem, but it doesn't seem to go away completely. Also, I have the impression that running into these stalls is more likely when a lot of compilation jobs can be satisfied from ccache.

Thanks for any ideas and insight (e.g. what this "zfs tear" status means).

Best regards,
Felix Palmen

-- 
 Dipl.-Inform. Felix Palmen <fe...@palmen-it.de>     ,.//..........
 {web}  http://palmen-it.de  {jabber} [see email]   ,//palmen-it.de
 {pgp public key}  http://palmen-it.de/pub.txt      //   """""""""""
 {pgp fingerprint} A891 3D55 5F2E 3A74 3965  B997 3EF2 8B0A BC02 DA2A