Matthias Petermann <m...@petermann-it.de> writes:

> Hello,
>
> On 01.07.22 12:48, Brad Spencer wrote:
>> "J. Hannken-Illjes" <hann...@mailbox.org> writes:
>>
>>>> On 1. Jul 2022, at 07:55, Matthias Petermann <m...@petermann-it.de> wrote:
>>>>
>>>> Good day,
>>>>
>>>> For some time now I have noticed that on several of my systems with
>>>> NetBSD/amd64 9.99.97/98, after longer usage, the kernel process
>>>> pgdaemon completely claims a CPU core for itself, i.e. it constantly
>>>> consumes 100%.  The affected systems do not have a shortage of RAM,
>>>> and the problem does not disappear even when all workloads are
>>>> stopped and thus no RAM is actually being used by application
>>>> processes.
>>>>
>>>> I noticed this especially in connection with accesses to the ZFS set
>>>> up on the respective machines - for example after a checkout from
>>>> the local CVS relic hosted on ZFS.
>>>>
>>>> Is there already a known problem, or what information would have to
>>>> be collected to get to the bottom of this?
>>>>
>>>> I currently have such a case online, so I would be happy to pull
>>>> diagnostic information this evening/afternoon.  At the moment all
>>>> the information I have is from top.
>>>>
>>>> Normal view:
>>>>
>>>> ```
>>>>   PID USERNAME PRI NICE   SIZE   RES STATE    TIME   WCPU    CPU COMMAND
>>>>     0 root     126    0     0K   34M CPU/0  102:45   100%   100% [system]
>>>> ```
>>>>
>>>> Thread view:
>>>>
>>>> ```
>>>>   PID   LID USERNAME PRI STATE    TIME   WCPU    CPU NAME     COMMAND
>>>>     0   173 root     126 CPU/1   96:57 98.93% 98.93% pgdaemon [system]
>>>> ```
>>>
>>> Looks a lot like kern/55707: ZFS seems to trigger a lot of xcalls
>>>
>>> Last action proposed was to back out the patch ...
>>>
>>> --
>>> J. Hannken-Illjes - hann...@mailbox.org
>>
>> Probably only a slightly related data point, but yes: if you have a
>> system / VM / Xen PV that does not have a whole lot of RAM and you
>> don't back out that patch, your system will become unusable in very
>> short order if you do much at all with ZFS (tested with a recent
>> -current building pkgsrc packages on a Xen PVHVM).  The patch does
>> fix a real bug, as NetBSD doesn't have the define that it uses, but
>> the effect of running that code will be felt if you use ZFS at all on
>> a "low"-RAM system.  I personally suspect that the ZFS ARC or some
>> pool is allowed to consume nearly all available "something" (pools,
>> RAM, etc.) without limit, but I have no specific proof (or there is a
>> leak somewhere).  I mostly run 9.x ZFS right now (which may have
>> other problems) and have been setting maxvnodes way down for some
>> time.  If I don't do that, the Xen PV will hang itself up after a
>> couple of 'build.sh release' runs when the source and build artifacts
>> are on ZFS filesets.
>
> Thanks for describing this use case.  Apart from the fact that I don't
> currently use Xen on the affected machine, it performs a similar
> workload.  I use it as a pbulk builder with distfiles, build
> artifacts, and a CVS / Git mirror stored on ZFS.  The builders
> themselves are located in chroot sandboxes on FFS.  Anyway, I can
> trigger the observations by doing a NetBSD src checkout from the
> ZFS-backed CVS to the FFS partition.
>
> The maxvnodes trick at first led to pgdaemon behaving normally again,
> but the system froze shortly afterwards with no further evidence.
>
> I am not sure if this thread is the right one for pointing this out,
> but I experienced further issues with NetBSD-current and ZFS when I
> tried to perform a recursive "zfs send" of a particular snapshot of my
> data sets.  After it initially works, I see the system freeze after a
> couple of seconds with no chance to recover (I could not even enter
> the kernel debugger).
> I will come back and need to prepare a dedicated test VM for my
> cases.
>
> Kind regards
> Matthias
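For reference, the knobs involved look roughly like this.  This is a
sketch only: the value 50000 is an example, not a recommendation
(neither message gives a concrete number), and the commands are echoed
rather than executed so the snippet is safe to paste anywhere.  Check
the local top(1) and vmstat(1) manpages for the exact flags and
event-counter names on your kernel version.

```shell
# Sketch only: observing pgdaemon and the maxvnodes workaround
# discussed above.  The value 50000 is an example, not a
# recommendation.  Commands are echoed, not executed.
workaround="sysctl -w kern.maxvnodes=50000"

echo "Inspect ceiling:   sysctl kern.maxvnodes"
echo "Lower it (root):   $workaround"
echo "Watch pgdaemon:    top -t                    (per-thread display)"
echo "Watch xcalls:      vmstat -e | grep -i xcall (cf. kern/55707)"
```

To persist the setting across reboots, the same `kern.maxvnodes=...`
line can go in /etc/sysctl.conf.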
I saw something like that with a "zfs send ..." and "zfs receive ..."
locking up, just one time.  I do that sort of thing fairly often to
move filesets between one system and another, and it has worked fine
for me, except in that one case: the destination was a NetBSD-current
system with a ZFS fileset set to use compression.  The source is a
FreeBSD system with a ZFS fileset created in such a manner that NetBSD
is happy with it, and it is also set to use compression.  No amount of
messing around would let 'zfs send <foo> | ssh destination "zfs receive
<foo>"' complete without locking up the destination.  When I changed
the destination fileset to not use compression, I was able to perform
the zfs send / receive pipeline without any problems.  The destination
is a pretty recent -current Xen PVHVM guest, and the source is FreeBSD
12.1 (running minio to back up my Elasticsearch cluster).


-- 
Brad Spencer - b...@anduin.eldar.org - KC8VKS - http://anduin.eldar.org
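The failing pipeline and the compression workaround described above can
be sketched as follows.  The names are hypothetical stand-ins
(tank/backup, snap1, and the host "destination" are made up; the thread
does not give the real ones), and the commands are echoed so the sketch
runs without a ZFS pool:

```shell
# Hypothetical names throughout; the thread does not give the real
# pool, snapshot, or host names.  Sketch, not a tested recipe.
fs="tank/backup"; snap="snap1"; dest="destination"

# Workaround reported above: make sure the receiving fileset does
# not use compression before sending into it.
echo "ssh $dest zfs get compression $fs"
echo "ssh $dest zfs set compression=off $fs"

# The send/receive pipeline that locked up a compressed destination:
echo "zfs send $fs@$snap | ssh $dest \"zfs receive $fs\""
```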