On 01/25/2013 10:29 AM, Marin Atanasov Nikolov wrote:
Hello again :)

Here's my update on these spontaneous reboots, less than a week after I
updated to stable/9.

For the first two days the system ran fine with no reboots, so I thought
this update had actually fixed it, but I was wrong.

Not really a solution but you can take a look at sysutils/zfs-stats


The reboots are still happening and there is still no clear evidence of the
root cause. What I've done so far:

* Ran disks tests -- looking good
* Ran memtest -- looking good
* Replaced power cables
* Ran UPS tests -- looking good
* Checked for any bad capacitors -- none found
* Removed all ZFS snapshots

There is also one more machine connected to the same UPS, so if it were a
UPS issue I'd expect the other machine to reboot too, but that's not the
case.

Having excluded the hardware side of the problem, I started looking again
into the software side, this time at ZFS in particular.

I'm running FreeBSD 9.1-STABLE #1 r245686 on an Intel i5 with 8 GB of memory.

A quick look at top(1) showed lots of memory used by the ARC and my
available free memory dropping fast. I've made a screenshot, which you can
see at the link below:

* http://users.unix-heaven.org/~dnaeon/top-zfs-arc.jpg

So I went to the FreeBSD Wiki and started reading the ZFS Tuning Guide [1],
but honestly, by the end I still wasn't sure which parameters I needed to
increase or decrease, and to what values.

Here's some info about my current parameters.

     % sysctl vm.kmem_size_max
     vm.kmem_size_max: 329853485875

     % sysctl vm.kmem_size
     vm.kmem_size: 8279539712

     % sysctl vfs.zfs.arc_max
     vfs.zfs.arc_max: 7205797888

     % sysctl kern.maxvnodes
     kern.maxvnodes: 206227

There's a script in the ZFSTuningGuide which calculates kernel memory
utilization; for my system it reports the values below:

     TEXT=22402749, 21.3649 MB
     DATA=4896264192, 4669.44 MB
     TOTAL=4918666941, 4690.81 MB
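As a quick sanity check of my own (not part of the wiki script), the
byte-to-MB figures above are just the byte counts divided by 1048576:

```python
# Convert the byte totals reported by the ZFSTuningGuide script to MB
# (1 MB = 1048576 bytes, as the script uses). The TEXT/DATA byte counts
# below are copied from the output above.
def to_mb(nbytes):
    return round(nbytes / 1048576, 2)

text_bytes = 22402749
data_bytes = 4896264192

print(to_mb(text_bytes))                # 21.36
print(to_mb(data_bytes))                # 4669.44
print(to_mb(text_bytes + data_bytes))   # 4690.81
```

which matches the MB figures the script prints, within rounding.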

While looking into ZFS tuning I also stumbled upon this thread in the
FreeBSD Forums [2], where the OP describes behaviour similar to what I'm
already experiencing, so I'm now quite worried that ZFS is the reason for
these crashes.

Before making any changes to the kernel parameters (vm.kmem_size,
vm.kmem_size_max, kern.maxvnodes, vfs.zfs.arc_max) I'd like to hear
feedback from people who have already done such tuning on their ZFS
systems.

Could you please share what the optimal values for these parameters are on
a system with 8 GB of memory? Is there a way to calculate these values, or
is it just a matter of testing and seeing what fits best?
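For concreteness, the kind of change I'm considering would go in
/boot/loader.conf; a hypothetical example (the value is illustrative, just
following the often-cited rule of thumb of capping the ARC at roughly half
of RAM on an 8 GB machine):

```
# /boot/loader.conf -- illustrative value only, not a recommendation
vfs.zfs.arc_max="4G"    # cap the ARC at half of the 8 GB of RAM
```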

Thanks and regards,
Marin

[1]: https://wiki.freebsd.org/ZFSTuningGuide
[2]: http://forums.freebsd.org/showthread.php?t=9143


On Sun, Jan 20, 2013 at 3:44 PM, Marin Atanasov Nikolov <[email protected]> wrote:



On Sat, Jan 19, 2013 at 10:19 PM, John <[email protected]> wrote:

At 03:00am I can see that periodic(8) runs, but I don't see what could have
taken so much of the free memory. I'm also running this system on ZFS and
have daily rotating ZFS snapshots created - currently the number of ZFS
snapshots is > 1000, and I'm not sure if that could be causing this. Here's
a list of the periodic(8) daily scripts that run at 03:00am.

% ls -1 /etc/periodic/daily
800.scrub-zfs

% ls -1 /usr/local/etc/periodic/daily
402.zfSnap
403.zfSnap_delete
On a couple of my zfs machines, I've found running a scrub along with other
high file system users to be a problem. I therefore run scrub from cron and
schedule it so it doesn't overlap with periodic.
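Something along these lines (the pool name is just an example):

```
# /etc/periodic.conf -- disable the periodic scrub
daily_scrub_zfs_enable="NO"

# /etc/crontab -- scrub Sunday at 04:30, well clear of the 03:00 periodic run
30  4  *  *  0  root  /sbin/zpool scrub tank
```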

I also found on a machine with an i3 and 4G ram that overlapping scrubs and
snapshot destroys would cause the machine to grind to the point of being
non-responsive. This was not a problem when the machine was new, but became
one as the pool got larger (dedup is off and the pool is at 45% capacity).

I use my own zfs management script; it prevents snapshot destroys from
overlapping scrubs, and with a lockfile it prevents a new destroy from
being initiated while an old one is still running.

zfSnap has its -S switch to prevent actions during a scrub, which you
should use if you haven't already.


Hi John,

Thanks for the hints. It's been a long time since I set up zfSnap, so I've
just checked the configuration: I am using the "-s -S" flags, so there
should be no overlapping.
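For reference, my zfSnap cron entry looks something like this (the TTL and
pool name here are just examples):

```
# root crontab -- daily snapshots; -s skips during a resilver, -S during a scrub
0  3  *  *  *  root  /usr/local/sbin/zfSnap -s -S -a 7d -r tank
```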

Meanwhile I updated to 9.1-RELEASE, but then I hit an issue when trying to
reboot the system (which appears to be discussed a lot in a separate
thread).

Then I updated to stable/9, so at least the reboot issue is now solved.
Since updating to stable/9 I've been monitoring the system's memory usage
and so far it's been pretty stable, so I'll keep an eye on whether the
update to stable/9 has actually fixed this strange issue.

Thanks again,
Marin


Since making these changes, a machine that would have to be rebooted
several times a week has now been up 61 days.

John Theus
TheUs Group



--
Marin Atanasov Nikolov

dnaeon AT gmail DOT com
http://www.unix-heaven.org/





_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[email protected]"
