On a system with more than 80K ZFS filesystems, we've seen cases where lwp_create() starts to fail by returning EAGAIN. The problem is that the ZIL creates a taskq for every dataset, so each of those 80K filesystems gets its own taskq.

Each of these taskqs creates a kernel thread, and each thread allocates 24KB from segkp. With enough of these 24KB allocations, we eventually exhaust the memory region set aside for them. Currently, segkpsize is set to 2GB, which means we can only support roughly 80K filesystems: 2GB / 24KB is about 87K threads, and the ZIL taskq threads are not the only consumers of this region. lwp_create() fails because LWP creation also allocates 24KB from this same region of memory. Once the ZIL taskqs have exhausted it, there is no memory left for lwp_create() to succeed.

The solution taken by this patch is to make the taskq used by the ZIL a per-pool taskq instead of a per-dataset taskq.

Additionally, to make it easier to observe when zil_clean() fails to dispatch to the taskq, the taskq internals have been slightly reworked to track and expose the "nomem" kstat field for non-dynamic taskqs; previously this field only existed for dynamic taskqs. This should make it relatively easy to determine whether zil_clean() is failing to dispatch, which could impact the performance of spa_sync().

Upstream bugs: DLPX-53044

You can view, comment on, or merge this pull request online at:

  https://github.com/openzfs/openzfs/pull/439

-- Commit Summary --

  * 8558 lwp_create() returns EAGAIN on system with more than 80K ZFS filesystems

-- File Changes --

    M usr/src/uts/common/fs/zfs/dsl_pool.c (37)
    M usr/src/uts/common/fs/zfs/sys/dsl_pool.h (2)
    M usr/src/uts/common/fs/zfs/sys/zil_impl.h (1)
    M usr/src/uts/common/fs/zfs/zil.c (12)
    M usr/src/uts/common/os/taskq.c (20)
    M usr/src/uts/common/sys/taskq_impl.h (3)

-- Patch Links --

https://github.com/openzfs/openzfs/pull/439.patch
https://github.com/openzfs/openzfs/pull/439.diff
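The segkp arithmetic in the description can be checked with a rough user-space model. This is a sketch under stated assumptions, not code from the patch: it assumes each ZIL taskq owns exactly one kernel thread, every thread costs a flat 24KB, and the budget is a fixed 2GB. SEGKP_SIZE, THREAD_SEGKP and all helper functions are hypothetical names rather than kernel symbols, and the per-pool thread count is a made-up parameter, not the value dsl_pool.c actually uses.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rough model of the segkp budget. Assumptions (not from the patch):
 * each ZIL taskq owns exactly one kernel thread, every thread costs a
 * flat 24KB of segkp, and the budget is a fixed 2GB.
 * All names here are hypothetical, not kernel symbols.
 */
#define	SEGKP_SIZE	(2ULL * 1024 * 1024 * 1024)	/* 2GB budget */
#define	THREAD_SEGKP	(24ULL * 1024)			/* 24KB per thread */

/* Upper bound on threads that fit in the budget, ignoring other users. */
static uint64_t
segkp_max_threads(uint64_t segkp_size, uint64_t per_thread)
{
	assert(per_thread > 0);
	return (segkp_size / per_thread);
}

/* Old scheme: one single-threaded ZIL taskq per dataset. */
static uint64_t
zil_threads_per_dataset(uint64_t ndatasets)
{
	return (ndatasets);
}

/*
 * New scheme: one ZIL taskq per pool. nthreads_per_pool is a made-up
 * parameter; the result does not depend on the dataset count at all.
 */
static uint64_t
zil_threads_per_pool(uint64_t npools, uint64_t nthreads_per_pool)
{
	return (npools * nthreads_per_pool);
}

/* Nonzero if the ZIL threads plus other_threads still fit in segkp. */
static int
zil_fits_in_segkp(uint64_t zil_threads, uint64_t other_threads)
{
	return (zil_threads + other_threads <=
	    segkp_max_threads(SEGKP_SIZE, THREAD_SEGKP));
}
```

With these constants, segkp_max_threads(SEGKP_SIZE, THREAD_SEGKP) evaluates to 87381, so in the per-dataset scheme 80K datasets plus about 10K other threads already overflow the budget. In the per-pool scheme, zil_threads_per_pool() grows only with the number of pools, which is why the dataset count stops mattering.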

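The "nomem" accounting for a non-dynamic taskq, and why failed zil_clean() dispatches matter for spa_sync(), can be sketched in user space as a fixed-capacity queue. Its no-sleep dispatch either takes a free entry or fails and bumps a counter. Everything here is hypothetical: model_taskq_t, MODEL_MAXALLOC, model_dispatch_nosleep() and the other helpers are not the taskq.c interfaces. Running the cleanup inline when dispatch fails is an assumption about what a zil_clean()-style caller would do, chosen because it shows how dispatch failures turn into extra work on the syncing thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of a non-dynamic taskq with a fixed number of
 * preallocated task entries. None of these names come from taskq.c.
 */
typedef void (*task_func_t)(void *);

#define	MODEL_MAXALLOC	4

typedef struct model_taskq {
	task_func_t	mtq_func[MODEL_MAXALLOC];
	void		*mtq_arg[MODEL_MAXALLOC];
	size_t		mtq_nalloc;	/* entries currently queued */
	uint64_t	mtq_nomem;	/* failed no-sleep dispatches ("nomem") */
} model_taskq_t;

/* No-sleep dispatch: returns 0 and bumps nomem when no entry is free. */
static int
model_dispatch_nosleep(model_taskq_t *tq, task_func_t func, void *arg)
{
	assert(tq->mtq_nalloc <= MODEL_MAXALLOC);
	if (tq->mtq_nalloc == MODEL_MAXALLOC) {
		tq->mtq_nomem++;
		return (0);
	}
	tq->mtq_func[tq->mtq_nalloc] = func;
	tq->mtq_arg[tq->mtq_nalloc] = arg;
	tq->mtq_nalloc++;
	return (1);
}

/* Run every queued entry, standing in for the taskq's worker thread(s). */
static void
model_drain(model_taskq_t *tq)
{
	for (size_t i = 0; i < tq->mtq_nalloc; i++)
		tq->mtq_func[i](tq->mtq_arg[i]);
	tq->mtq_nalloc = 0;
}

/* Hypothetical stand-in for the per-txg itx cleanup work. */
static void
clean_itxs(void *arg)
{
	int *cleaned = arg;

	(*cleaned)++;
}

/*
 * zil_clean()-style caller: try to hand the work to the taskq and, if
 * that fails, do it inline in the caller (assumed fallback).
 */
static void
model_zil_clean(model_taskq_t *tq, int *cleaned)
{
	if (model_dispatch_nosleep(tq, clean_itxs, cleaned) == 0)
		clean_itxs(cleaned);
}
```

Calling model_zil_clean() six times against an empty queue queues four entries and records two nomem events. Those two cleanups run inline in the caller, and the remaining four run when model_drain() is called. A nonzero nomem count is the signal the reworked kstat is meant to surface.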