Hello Tomas,

Wednesday, January 3, 2007, 10:32:39 AM, you wrote:

TÖ> Hello.

TÖ> Having some hangs on a snv53 machine which is quite probably ZFS+NFS
TÖ> related, since that's all the machine does ;)

TÖ> The machine is a 2x750MHz Blade1000 with 2GB ram, using a SysKonnect
TÖ> 9821 GigE card (with their 8.19.1.3 skge driver) and two HP branded MPT
TÖ> SCSI cards. Normal load is pretty much "read all you can" with misc
TÖ> tarballs and isos since it's an NFS backend to our caching http/ftp
TÖ> cluster delivering free software to the world.

TÖ> What happens is that the machine just stops responding.. it can respond
TÖ> to ping for a while (while userland is non-responsive, including
TÖ> console) but after a while, that stops too..

TÖ> Produced a panic to get a dump and tried ::memstat;
TÖ> unterweser:/scratch/070103# mdb unix.0 vmcore.0
TÖ> Loading modules: [ unix krtld genunix specfs dtrace ufs scsi_vhci pcisch
TÖ> ssd fcp fctl qlc md ip hook neti sctp arp usba s1394 nca lofs zfs random
TÖ> sd nfs ptm cpc ]
TÖ> > ::memstat
TÖ> Page Summary                Pages                MB  %Tot
TÖ> ------------     ----------------  ----------------  ----
TÖ> Kernel                     250919              1960   98%
TÖ> Anon                          888                 6    0%
TÖ> Exec and libs                 247                 1    0%
TÖ> Page cache                     38                 0    0%
TÖ> Free (cachelist)              405                 3    0%
TÖ> Free (freelist)              4370                34    2%

TÖ> Total                      256867              2006
TÖ> Physical                   253028              1976

TÖ> That doesn't seem too healthy to me.. probably something kernely eating
TÖ> up everything and the machine is just swapping to death or something..

TÖ> A dump from live kernel with mdb -k after 1.5h uptime;
TÖ> Page Summary                Pages                MB  %Tot
TÖ> ------------     ----------------  ----------------  ----
TÖ> Kernel                     212310              1658   83%
TÖ> Anon                        11307                88    4%
TÖ> Exec and libs                2418                18    1%
TÖ> Page cache                  18400               143    7%
TÖ> Free (cachelist)             4383                34    2%
TÖ> Free (freelist)              8049                62    3%
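
As a sanity check on the two ::memstat summaries, here is a rough Python
sketch that recomputes the MB and %Tot columns from the page counts and
shows how much the kernel grew between them. It assumes 8 KB pages (which
is what the MB column implies: 250919 pages is about 1960 MB). The names
PAGE_SIZE, summarize and kernel_growth_mb are made up for illustration and
are not part of mdb.

```python
# Assumption: 8 KB pages, consistent with the MB column in ::memstat above.
PAGE_SIZE = 8192
MB = 2**20

# Page counts copied from the two ::memstat summaries quoted above.
panic_dump = {
    "Kernel": 250919,
    "Anon": 888,
    "Exec and libs": 247,
    "Page cache": 38,
    "Free (cachelist)": 405,
    "Free (freelist)": 4370,
}
live_90min = {
    "Kernel": 212310,
    "Anon": 11307,
    "Exec and libs": 2418,
    "Page cache": 18400,
    "Free (cachelist)": 4383,
    "Free (freelist)": 8049,
}


def summarize(pages):
    """Return {category: (whole MB, rounded percent of total pages)}."""
    total = sum(pages.values())
    return {
        name: (count * PAGE_SIZE // MB, round(100 * count / total))
        for name, count in pages.items()
    }


def kernel_growth_mb(earlier, later):
    """Kernel memory growth between two snapshots, in whole MB."""
    return (later["Kernel"] - earlier["Kernel"]) * PAGE_SIZE // MB


if __name__ == "__main__":
    for label, snap in (("live, 1.5h uptime", live_90min),
                        ("panic dump", panic_dump)):
        mb, pct = summarize(snap)["Kernel"]
        print(f"{label}: kernel {mb} MB ({pct}% of total)")
    print("kernel growth:", kernel_growth_mb(live_90min, panic_dump), "MB")
```

With the numbers above, the kernel share goes from 83% after 1.5h to 98%
in the panic dump, roughly 300 MB of growth, while the free lists in the
dump add up to less than 40 MB.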


TÖ> The tweaks I have are:
TÖ> set ncsize = 500000
TÖ> set nfs:nrnode = 50
TÖ> set zfs:zil_disable=1
TÖ> set zfs:zfs_vdev_cache_bshift=14
TÖ> set zfs:zfs_vdev_cache_size=0

Comment out the ncsize line in /etc/system (put a '*' at the start of the
line) and reboot - that should help.
I saw similar behavior on ZFS+NFS after raising ncsize above its default.
Mark suggested commenting it out, and it has just worked since then.

I was also told that increasing ncsize isn't really needed with ZFS (if
at all). However, I haven't done any testing, and I don't know the
technical details of how ZFS uses the DNLC well enough to be sure.

-- 
Best regards,
 Robert                            mailto:[EMAIL PROTECTED]
                                       http://milek.blogspot.com

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
