[OpenZFS Developer] Abnormal cache stats and system behaviour

Pavlo Tue, 15 Oct 2013 08:23:37 -0700

Hello. 

I decided to ask this question here, since the FreeBSD community has stayed 
quiet about our issue.


This is our story... 

Formerly we were running 10.0-CURRENT (FreeBSD 10.0-CURRENT #3: Mon Jan 21 
14:48:41 EET 2013): 

        vfs.zfs.version.ioctl                   3 
        vfs.zfs.version.acl                     1 
        vfs.zfs.version.spa                     5000 
        vfs.zfs.version.zpl                     5 


Recently we switched to 10.0-ALPHA6 (FreeBSD 10.0-ALPHA6 #2 r256309): 
        vfs.zfs.version.ioctl                   3 
        vfs.zfs.version.acl                     1 
        vfs.zfs.version.spa                     5000 
        vfs.zfs.version.zpl                     5 

We have also upgraded our hardware, but I truly believe this has nothing to do 
with the abnormality we are facing. 

The issue manifests itself as abnormally high system (CPU) load while the HDDs 
are relatively idle. 
We're running a file storage service, so previously the bottleneck was always 
the HDDs. 

After a reboot the system works perfectly, but then the load slowly grows until 
there is almost no IO while the load average is 30+, and all processes hang in 
read()/write()/open()/close()/unlink()/rename(). This mostly happens after 
24 hours or so. 

I suspect the time it takes is related to how much RAM is installed. 
I saw a similar issue reported on a FreeBSD mailing list. 
They have 512G RAM, and their time until the system starts behaving abnormally 
is ~96 hours. 
We have 128G RAM (previously we had 128G RAM as well). 

So my question is: what do you make of these strange stats? 

[server]# vmstat -z
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
........ skipped....
NCLNODE:                528,      0,       0,       0,       0,   0,   0
space_seg_cache:         64,      0,  289198,  299554,25932081,25932081,   0
zio_cache:              944,      0,   37512,   50124,1638254119,1638254119,   0
zio_link_cache:          48,      0,   50955,   38104,1306418638,1306418638,   0
sa_cache:                80,      0,   63694,      56,  198643,198643,   0
dnode_t:                864,      0,  128813,       3,  184863,184863,   0
dmu_buf_impl_t:         224,      0, 1610024,  314631,157119686,157119686,   0
arc_buf_hdr_t:          216,      0,82949975,   56107,156352659,156352659,   0
arc_buf_t:               72,      0, 1586866,  314374,158076670,158076670,   0
zil_lwb_cache:          192,      0,    6354,    7526, 2486242,2486242,   0
zfs_znode_cache:        368,      0,   63694,      16,  198643,198643,   0
...... skipped ...... 

On the previous setup the whole FAIL column consisted only of 0s. 
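
For anyone who wants to check their own machine for the same symptom, here is 
a minimal sketch in Python that parses `vmstat -z` output and lists the zones 
with a non-zero FAIL count. It is only an illustration: the function names 
(parse_vmstat_z, failing_zones) are made up for this example, and it assumes 
the seven-column layout shown above (SIZE, LIMIT, USED, FREE, REQ, FAIL, SLEEP). 

```python
def parse_vmstat_z(text):
    """Parse `vmstat -z` style output into a list of dicts, one per zone.

    Assumes each data line looks like "name: size, limit, used, free, req, fail, sleep".
    Header, blank, and "skipped" lines are ignored.
    """
    zones = []
    for line in text.splitlines():
        # Split on the last colon so zone names containing spaces survive.
        name, sep, rest = line.rpartition(":")
        if not sep:
            continue  # no colon: header or filler line
        fields = [f.strip() for f in rest.split(",")]
        if len(fields) != 7 or not all(f.isdigit() for f in fields):
            continue  # not a data row in the assumed layout
        size, limit, used, free, req, fail, sleep = map(int, fields)
        zones.append({
            "name": name.strip(),
            "size": size, "limit": limit, "used": used, "free": free,
            "req": req, "fail": fail, "sleep": sleep,
        })
    return zones


def failing_zones(zones):
    """Return only the zones whose FAIL counter is non-zero."""
    return [z for z in zones if z["fail"] > 0]


# Small demo on two rows copied from the output above.
sample = """ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
NCLNODE:                528,      0,       0,       0,       0,   0,   0
dnode_t:                864,      0,  128813,       3,  184863,184863,   0
"""

for zone in failing_zones(parse_vmstat_z(sample)):
    print(zone["name"], "REQ:", zone["req"], "FAIL:", zone["fail"])
```

To run it against a live system, the output of `vmstat -z` could be saved to a 
file and its contents passed to parse_vmstat_z() in place of the sample string. 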

Thank you!

_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer
