On Thursday, July 04, 2019, at 5:03 AM, Matthew Ahrens wrote:

> Your use case makes sense to me.

Finally, you are the first! :)
> That said, I was able to create 100 pools on a system with 7GB RAM

I can as well. The difference seems to be that these pools have data on them. And they are not new: they were created 2-3 years ago and have been heavily used since then. Apart from that, they weren't overloaded; the usage went from 0 to the current ~60% (so never above the 80% fill mark). The usage pattern is write-once files (they are never modified, just read or deleted).

One of the pools:

# zpool get all disk2
NAME   PROPERTY                       VALUE                 SOURCE
disk2  size                           3.62T                 -
disk2  capacity                       60%                   -
disk2  altroot                        -                     default
disk2  health                         ONLINE                -
disk2  guid                           16818933878072747776  default
disk2  version                        -                     default
disk2  bootfs                         -                     default
disk2  delegation                     on                    default
disk2  autoreplace                    off                   default
disk2  cachefile                      -                     default
disk2  failmode                       wait                  default
disk2  listsnapshots                  off                   default
disk2  autoexpand                     off                   default
disk2  dedupditto                     0                     default
disk2  dedupratio                     1.00x                 -
disk2  free                           1.42T                 -
disk2  allocated                      2.20T                 -
disk2  readonly                       off                   -
disk2  comment                        -                     default
disk2  expandsize                     -                     -
disk2  freeing                        0                     default
disk2  fragmentation                  72%                   -
disk2  leaked                         0                     default
disk2  bootsize                       -                     default
disk2  checkpoint                     -                     -
disk2  feature@async_destroy          enabled               local
disk2  feature@empty_bpobj            enabled               local
disk2  feature@lz4_compress           active                local
disk2  feature@multi_vdev_crash_dump  enabled               local
disk2  feature@spacemap_histogram     active                local
disk2  feature@enabled_txg            active                local
disk2  feature@hole_birth             active                local
disk2  feature@extensible_dataset     active                local
disk2  feature@embedded_data          active                local
disk2  feature@bookmarks              enabled               local
disk2  feature@filesystem_limits      enabled               local
disk2  feature@large_blocks           active                local
disk2  feature@sha512                 enabled               local
disk2  feature@skein                  enabled               local
disk2  feature@device_removal         enabled               local
disk2  feature@obsolete_counts        enabled               local
disk2  feature@zpool_checkpoint       enabled               local

Oh, and I'm not sure whether it's relevant: these filesystems were created with recordsize=1M (they initially held larger files), but that was set back to the default 128k a year ago. I'm aware that this doesn't make the existing large blocks disappear; the files were not rewritten.

> Is it possible that the memory usage is proportional to number of filesystems (or snapshots, or
> something else inside the filesystem) rather than number of pools?

I don't think so, because I don't have any filesystems (well, except the default one which is created with the pool) or snapshots inside the zpool/zfs.

> How many filesystems/snapshots do you have in each pool?

Exactly one zfs and no snapshots.

> How's the memory usage if you have all the filesystems/snapshots in one big
> pool (striped over all your disks)?

Well, that's what I can't tell, for two reasons:

1. I don't currently have a machine to which I could copy one machine's full data (I had such machines, but had to put them into production to alleviate the problems caused by this effect).
2. Even if I had one, it would take a very long time to rewrite a hundred TiB of data.

BTW, I think it would work just fine. We have similarly sized (mirror and raidz) pools (in the range of 50-100 TiB) with the same kind of HW, exactly the same OS, and similar ZFS settings and use case, and we have never experienced anything like this. Well, maybe because we didn't hit a limit there, I don't know.

> However, there are some aspects of limiting memory usage which are per-pool, most prominently
> zfs_dirty_data_max, which is typically 4GB per pool.

Yeah, I already tried to experiment with these tunables, but:

> You would hit this when under heavy write workloads (to each pool). You'd probably want to
> decrease this, or find a way to implement it globally rather than per pool. An idle system
> wouldn't have any dirty data, so this wouldn't come into play.

You're right, dirty_data_max would cause problems during operation, not during import... I've tried lowering (to 1/10th of the default value) everything that seemed to be memory related and could not significantly decrease the kmem usage after import.
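For concreteness, a minimal sketch of what that experiment looks like on FreeBSD (the vfs.zfs.* names are the FreeBSD sysctl spellings of the upstream zfs_* parameters and may differ between releases; the value below is only the "1/10th of the default" illustration, not a recommendation):

# Per-pool dirty data cap (the zfs_dirty_data_max mentioned above) and its hard ceiling:
sysctl vfs.zfs.dirty_data_max vfs.zfs.dirty_data_max_max

# Lower the per-pool cap, e.g. to roughly 1/10th of the typical 4 GiB default
# (if your release exposes this read-only, set it from /boot/loader.conf instead):
sysctl vfs.zfs.dirty_data_max=429496729

# ARC cap; on older releases this is boot-time only via /boot/loader.conf
# (vfs.zfs.arc_max="..."), on newer ones it can also be changed at runtime:
sysctl vfs.zfs.arc_max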
> The next step would be to figure out where the memory is being used. I'm not familiar with all
> the tools on FreeBSD, but can you tell if the usage is in the ARC vs the kernel heap? If the
> latter, breaking it down by kmem cache would help.

Please see my answer to Richard:
https://openzfs.topicbox.com/groups/developer/T10533b84f9e1cfc5-Mcbe0070f704b59907a38125d/using-many-zpools

I'm sure that it's not the ARC, for these reasons (a sketch of the relevant FreeBSD commands is at the end of this mail):

1. On FreeBSD, top has a separate line for the ARC, and it remained low during the import (well, nothing really reads from the pools at that time, so this must be the case).
2. I've limited the ARC size with the kernel tunable.
3. Limiting the ARC size nearly instantaneously decreased the ARC/kmem usage.

BTW, I have new (and, I very much hope, valuable) details to share!

During the high-kmem-usage case, importing all of the pools took ages. I haven't measured it, but well over an hour! During the import, the disk under the given pool worked hard (and it seems zpool import -a itself is sequential, because only one disk worked at a time). I'm not quite sure whether it wrote a lot or not, but it read a lot for sure, randomly. With each import, the kmem usage grew by 1-1.5 GiB. No scrubs were running.

Now guess what happened today! All machines were rebooted (they had been dying for days and had been restarted several times), and this time they came up quickly, the pools were mounted fast, and after mounting all of them only around 7-10 GiB of kmem was in use, which is totally acceptable!

Previously I thought the blocks written with the 1M recordsize caused the problem (or were part of it), so I started to rewrite the files with 128k. When I started the rewrite process (with the high kmem problem in effect), any machine could die in no more than a minute, because the ARC (and other stuff) grew and ate all the remaining kernel memory. Now the rewrite has been running for half an hour and everything is fine.

We experienced this last year as well; back then it also lasted for some days and then disappeared on its own. The machines then worked fine for around a year, and now the same problem came back and has disappeared (for now) again.

What does ZFS do on those imports that eats all the kmem (and keeps it allocated after the import), and that it does not do later? Does anybody know of anything at import time which could cause this? I would also do a zpool export and import now to see whether that makes a difference, but right now all machines work fine. What the...?!
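For reference, a minimal sketch of how the ARC-vs-kernel-heap split can be checked on FreeBSD (assuming a stock 11.x/12.x system; the exact sysctl and UMA zone names may vary between releases):

# ARC size as ZFS itself reports it (this is the number top shows on its ARC line):
sysctl kstat.zfs.misc.arcstats.size

# Overall kernel memory map usage -- the "kmem" that keeps growing here:
sysctl vm.kmem_map_size vm.kmem_map_free

# Per-UMA-zone breakdown; the ZFS-related zones (e.g. zio_*, dnode_t,
# dmu_buf_impl_t, arc_buf_hdr_t) are the ones worth watching during an import:
vmstat -z

# Per-malloc-type breakdown; on the FreeBSD port much of the OpenSolaris
# compatibility layer shows up under the "solaris" type:
vmstat -m

Comparing these numbers before and after importing a single pool should show which zone or malloc type the 1-1.5 GiB per import ends up in.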
------------------------------------------
openzfs: openzfs-developer
Permalink: https://openzfs.topicbox.com/groups/developer/T10533b84f9e1cfc5-Mba507e84ad3760f01f5ddf2e
Delivery options: https://openzfs.topicbox.com/groups/developer/subscription