avg pointed out the rate limiting code in vm_pageout_scan() during the
discussion of PR 187594.  While it certainly can contribute to the problems discussed 
in that PR, a bigger problem is that it can allow the OOM killer to be 
triggered even though there is plenty of reclaimable memory available in the 
system.  Any load that can consume enough pages within the polling interval to 
hit the v_free_min threshold (e.g. multiple 'dd if=/dev/zero of=/file/on/zfs') 
can make this happen.

The product I’m working on does not have swap configured and treats any OOM 
trigger as fatal, so it is very obvious when this happens. :-)

I’ve tried several things to mitigate the problem.  The first was to ignore 
rate limiting for pass 2.  However, even though ZFS is guaranteed to receive 
some feedback prior to OOM being declared, my testing showed that a trivial 
load (a couple dd operations) could still consume enough of the reclaimed space 
to leave the system below its target at the end of pass 2.  After removing the 
rate limiting entirely, I’ve so far been unable to kill the system via a
ZFS-induced load.

I understand the motivation behind the rate limiting, but the current 
implementation seems too simplistic to be safe.  The documentation for the 
Solaris slab allocator provides good motivation for their approach of using a 
“sliding average” to rein in temporary bursts of usage without unduly harming 
efficient service for the recorded steady-state memory demand.  Regardless of 
the approach taken, I believe that the OOM killer must be a last resort and 
shouldn’t be called when there are caches that can be culled.
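
To make the sliding-average idea concrete, here is a minimal sketch in plain
C.  It is not the Solaris implementation and not existing FreeBSD code: the
struct, the function names, the 1/4 sample weight and the fixed-point scaling
are all assumptions chosen for illustration.  It smooths the number of pages
consumed per polling interval, so that a single burst only moves the estimate
part of the way, while sustained demand pulls it up over a few intervals.

```c
#include <stdint.h>

#define FP_SHIFT    8   /* fractional bits of the fixed-point average (assumed) */
#define DECAY_SHIFT 2   /* each new sample gets weight 1/4 (assumed) */

/* Hypothetical state: smoothed pages consumed per polling interval. */
struct demand_avg {
    uint64_t avg_fp;    /* average in fixed point, FP_SHIFT fractional bits */
};

/* Fold one interval's consumption into the average: avg += (sample - avg) / 4. */
static void
demand_avg_update(struct demand_avg *d, uint64_t pages_consumed)
{
    uint64_t sample = pages_consumed << FP_SHIFT;

    /* Two branches so the unsigned subtraction never underflows. */
    if (sample >= d->avg_fp)
        d->avg_fp += (sample - d->avg_fp) >> DECAY_SHIFT;
    else
        d->avg_fp -= (d->avg_fp - sample) >> DECAY_SHIFT;
}

/* Smoothed demand, in whole pages. */
static uint64_t
demand_avg_pages(const struct demand_avg *d)
{
    return (d->avg_fp >> FP_SHIFT);
}

/*
 * Hypothetical policy: ask caches to shrink when the free page count would
 * not cover the minimum plus one interval of expected demand.
 */
static int
should_shrink_caches(const struct demand_avg *d, uint64_t free_pages,
    uint64_t free_min)
{
    return (free_pages < free_min + demand_avg_pages(d));
}
```

With these assumed constants, one interval that consumes 400 pages raises the
estimate from 0 to only 100 pages, and a second such interval raises it to 175,
so the headroom grows with sustained load rather than with a single spike.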

One other thing I’ve noticed in my testing with ZFS is that it needs feedback 
and a little time to react to memory pressure.  Calling its lowmem handler 
just once isn’t enough for it to limit in-flight writes so it can avoid reuse 
of pages that it just freed up.  But, it doesn’t take too long to react (> 1sec 
in the profiling I’ve done).  Is there a way in vm_pageout_scan() that we can 
better record that progress is being made (pages were freed in the pass, even 
if some/all of them were consumed again) and allow more passes before the OOM 
killer is invoked in this case?
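
Roughly what I have in mind, as a standalone sketch rather than a patch
against vm_pageout_scan(): every name and both limits below are hypothetical.
A pass that frees pages resets a "stalled" counter even if the target was
missed, and a separate cap on consecutive short passes keeps the system from
looping forever when the freed pages are always consumed again.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_STALLED_PASSES 3    /* passes freeing nothing before OOM (assumed) */
#define MAX_SHORT_PASSES   16   /* passes missing the target before OOM (assumed) */

/* Hypothetical per-scan bookkeeping carried from one pass to the next. */
struct scan_progress {
    unsigned stalled_passes;    /* consecutive passes that freed no pages */
    unsigned short_passes;      /* consecutive passes that missed the target */
};

/*
 * Record the outcome of one pass.  Returns true only when the OOM path
 * should be considered: either nothing was freed for several passes in a
 * row, or the target was missed too many times despite progress.
 */
static bool
scan_pass_done(struct scan_progress *sp, uint64_t pages_freed,
    bool target_met)
{
    if (target_met) {
        /* Back above the target: forget all earlier shortfalls. */
        sp->stalled_passes = 0;
        sp->short_passes = 0;
        return (false);
    }

    sp->short_passes++;
    if (pages_freed > 0)
        sp->stalled_passes = 0;     /* progress was made this pass */
    else
        sp->stalled_passes++;

    return (sp->stalled_passes >= MAX_STALLED_PASSES ||
        sp->short_passes >= MAX_SHORT_PASSES);
}
```

The second limit is the design choice that matters here: without it, a load
that re-consumes everything that is freed would postpone OOM indefinitely.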


freebsd-current@freebsd.org mailing list