On Wed, Feb 8, 2012 at 4:28 PM, Jeremy Chadwick <[email protected]> wrote:
> On Thu, Feb 09, 2012 at 01:11:36AM +0100, Miroslav Lachman wrote:
...
>> ARC Size:
>>         Current Size:             1769 MB (arcsize)
>>         Target Size (Adaptive):   512 MB (c)
>>         Min Size (Hard Limit):    512 MB (zfs_arc_min)
>>         Max Size (Hard Limit):    3584 MB (zfs_arc_max)
>>
>> The target size is going down to the min size and after a few more
>> days, the system is so slow that I must reboot the machine. Then it
>> is running fine for about 107 days and then it all repeats again.
>>
>> You can see more on MRTG graphs
>> http://freebsd.quip.cz/ext/2012/2012-02-08-kiwi-mrtg-12-15/
>> You can see links to other useful information on top of the page
>> (arc_summary, top, dmesg, fs usage, loader.conf)
>>
>> There you can see nightly backups (higher CPU load started at
>> 01:13), otherwise the machine is idle.
>>
>> It corresponds with ARC target size lowering in the last 5 days
>> http://freebsd.quip.cz/ext/2012/2012-02-08-kiwi-mrtg-12-15/local_zfs_arcstats_size.html
>>
>> And with ARC metadata cache overflowing the limit in the last 5 days
>> http://freebsd.quip.cz/ext/2012/2012-02-08-kiwi-mrtg-12-15/local_zfs_vfs_meta.html
>>
>> I don't know what's going on and I don't know if it is something
>> known / fixed in newer releases. We are running a few more ZFS
>> systems on 8.2 without this issue. But those systems are in
>> different roles.
>
> This sounds like the... damn, what is it called... some kind of internal
> "counter" or "ticks" thing within the ZFS code that was discovered to
> only begin happening after a certain period of time (which correlated to
> some number of days, possibly 107). I'm sorry that I can't be more
> specific, but it's been discussed heavily on the lists in the past, and
> fixes for all of that were committed to RELENG_8. I wish I could
> remember the name of the function or macro or variable name it pertained
> to, something like LTHAW or TLOCK or something like that. I would say
> "I don't know why I can't remember", but I do know why I can't remember:
> because I gave up trying to track all of these problems.
>
> Does someone else remember this issue? CC'ing Martin who might remember
> for certain.

It's LBOLT. :-)

And there was more than one related integer overflow. One of them
manifested itself as the L2ARC feeding thread hogging CPU time after
about a month of uptime. Another one caused an issue with ARC reclaim
after 107 days. See more details in this thread:

http://lists.freebsd.org/pipermail/freebsd-fs/2011-May/011584.html

--Artem
_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[email protected]"
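
As a rough sanity check, the two uptimes mentioned above ("about a month" and 107 days) are what plain overflow arithmetic would give under some assumptions that are not stated in this message: a tick rate of hz = 1000, a signed 32-bit tick counter, and a signed 64-bit intermediate of the form nanoseconds * hz. The function names below are made up for illustration and do not correspond to anything in the ZFS code; see the linked thread for what the real expressions looked like.

```python
# Back-of-the-envelope overflow arithmetic, not a model of the ZFS code.
# Assumptions (hypothetical, chosen for illustration):
#   - hz = 1000 ticks per second
#   - a signed 32-bit tick counter
#   - a signed 64-bit product of the form (nanoseconds * hz)

HZ = 1000                      # assumed tick rate
NANOSEC = 1_000_000_000        # nanoseconds per second
SECONDS_PER_DAY = 86_400

INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1


def days_until_tick_counter_wraps(hz=HZ):
    """Uptime at which a signed 32-bit count of ticks exceeds INT32_MAX."""
    return INT32_MAX / hz / SECONDS_PER_DAY


def days_until_ns_times_hz_wraps(hz=HZ):
    """Uptime at which (uptime_in_ns * hz) exceeds a signed 64-bit integer."""
    max_uptime_ns = INT64_MAX // hz
    return max_uptime_ns / NANOSEC / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"32-bit tick counter wraps after {days_until_tick_counter_wraps():.2f} days")
    print(f"64-bit ns*hz product wraps after {days_until_ns_times_hz_wraps():.2f} days")
```

With hz = 1000 this gives roughly 24.86 and 106.75 days. A different hz would shift both figures, so treat the match as suggestive rather than as proof of where the overflows were.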
