Re: [zfs-discuss] Snapshots impact on performance
Same problem here (snv_60). Robert, did you find any solutions?

gino, check this:
http://www.opensolaris.org/jive/thread.jspa?threadID=34423&tstart=0

First check the spa_sync function time (remember to change POOL_NAME!):

dtrace -q \
 -n fbt::spa_sync:entry'/(char *)(((spa_t *)arg0)->spa_name) == POOL_NAME/{ self->t = timestamp; }' \
 -n fbt::spa_sync:return'/self->t/{ @m = max((timestamp - self->t) / 1000000); self->t = 0; }' \
 -n tick-10m'{ printa("spa_sync max time [ms]: %@d\n", @m); exit(0); }'

If you have long spa_sync times, check whether the problem is finding new blocks in the space map, with this script:

#!/usr/sbin/dtrace -s

fbt::space_map_alloc:entry
{
        self->s = arg1;
}

fbt::space_map_alloc:return
/arg1 != -1/
{
        self->s = 0;
}

fbt::space_map_alloc:return
/self->s && (arg1 == -1)/
{
        @s = quantize(self->s);
        self->s = 0;
}

tick-10s
{
        printa(@s);
}

Then change the record size:

zfs set recordsize=XX POOL_NAME

Make sure that all filesystems inherit the recordsize:

# zfs get -r recordsize POOL_NAME

The other thing is the space map size. Check the map size:

echo '::spa' | mdb -k | grep 'f[0-9]*-[0-9]*' \
 | while read pool_ptr state pool_name
do
        echo "${pool_ptr}::walk metaslab | ::print -d struct metaslab ms_smo.smo_objsize" \
         | mdb -k \
         | nawk '{ sub("^0t", "", $3); sum += $3 } END { print sum }'
done

The value you get is the space map size on disk. In memory the space map will take about 4 * size_on_disk. Sometimes during snapshot removal the kernel has to load all space maps into memory. For example, if the space map takes 1GB on disk, then the kernel, in the spa_sync function, will:
- read 1GB from disk (or from cache)
- allocate 4GB for the AVL trees
- do all operations on the AVL trees
- save the maps

It is good to have enough free memory for these operations.

You can shrink the space map by copying all filesystems to another pool. I recommend zfs send.

regards
Lukas
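To make that last suggestion concrete, here is a minimal sketch of such a migration with zfs send (the pool and filesystem names tank, tank2 and data are placeholders - substitute your own; a freshly written pool starts out with small, unfragmented space maps):

# snapshot to act as the replication source
zfs snapshot tank/data@migrate

# copy the dataset into the other pool; for a remote pool, pipe through
# ssh instead: zfs send ... | ssh host zfs receive ...
zfs send tank/data@migrate | zfs receive tank2/data

# verify the copy before destroying the original
zfs list -r tank2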
Re: [zfs-discuss] Snapshots impact on performance
Hello Victor,

Wednesday, June 27, 2007, 1:19:44 PM, you wrote:

VL> Gino wrote:
>> Same problem here (snv_60). Robert, did you find any solutions?

VL> A couple of weeks ago I put together an implementation of space maps
VL> which completely eliminates loops and recursion from the space map
VL> alloc operation, and allows one to implement different allocation
VL> strategies quite easily (of which I put together 3 more). It looks
VL> like it works for me on thumper and on my notebook with ZFS Root,
VL> though I have almost no time to test it more these days due to year
VL> end. I haven't done a SPARC build yet and I do not have a test case
VL> to test against.

VL> Also, it comes at a price - I have to spend some more time
VL> (logarithmic, though) during all other operations on space maps, and
VL> it is not optimized yet.

Lukasz (cc) - maybe you can test it and even help on tuning it?

-- 
Best regards,
Robert                          mailto:[EMAIL PROTECTED]
                                http://milek.blogspot.com
Re: [zfs-discuss] Snapshots impact on performance
Robert Milkowski wrote:
> If it happens again I'll try to get some more specific data - however
> it depends on when it happens, as during peak hours I'll probably just
> destroy a snapshot to get it working.

If it happens again, it would be great if you could gather some data before you destroy the snapshot, so we have some chance of figuring out what's going on here.

'iostat -xnpc 1' will tell us whether it's CPU or disk bound.

'lockstat -kgIw sleep 10' will tell us which functions are using CPU.

'echo "::walk thread | ::findstack" | mdb -k' will tell us where threads are stuck.

Actually, if you could gather each of those both while you're observing the problem and then after the problem goes away, that would be helpful.

--matt
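If it's easier, the collection can be wrapped in a small throwaway script so the during/after runs are identical - this is just a convenience around the exact commands above (the output directory is an arbitrary choice):

#!/bin/sh
# Collect the three diagnostics into one directory; run once while the
# problem is visible and once after it goes away.
dir=/var/tmp/zfsdiag.`date '+%Y%m%d%H%M%S'`
mkdir -p $dir
iostat -xnpc 1 10 > $dir/iostat.out &                             # CPU or disk bound?
lockstat -kgIw sleep 10 > $dir/lockstat.out                       # functions using CPU
echo '::walk thread | ::findstack' | mdb -k > $dir/findstack.out  # where threads are stuck
wait
echo "data saved in $dir"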
Re: [zfs-discuss] Snapshots impact on performance
Matthew Ahrens wrote on 10/16/06 09:07:
> Robert Milkowski wrote:
>> Hello zfs-discuss,
>>
>> S10U2 + patches. A ZFS pool of about 2TB in size. Each day a snapshot
>> is created and 7 copies are kept. There's a quota set for a file
>> system, however there's always at least 50GB of free space in the file
>> system (and much more in the pool). The ZFS file system is exported
>> over NFS. Snapshots consume about 280GB of space.
>>
>> We have noticed some performance problems on NFS clients of this file
>> system, even during times with smaller load. Raising the quota didn't
>> help. However, removing the oldest snapshot solved the performance
>> problems. I do not have more details - sorry.
>>
>> Is it expected for snapshots to have a very noticeable performance
>> impact on the file system being snapshotted?
>
> No, this behavior is unexpected. The only way that snapshots should
> have a performance impact on access to the filesystem is if you are
> running low on space in the pool or quota (which it sounds like you are
> not).
>
> Can you describe what the performance problems were? What was the
> workload like? What problem did you identify? How did it improve when
> you 'zfs destroy'-ed the oldest snapshot? Are you sure that the oldest
> snapshot wasn't pushing you close to your quota?
>
> --matt

I could well believe there would be a hiccup on the rest of the pool when the snapshot is taken. Each snapshot calls txg_wait_synced() four times: a few of the calls are related to the ZIL, and one comes from dsl_sync_task_group_wait().
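For anyone who wants to watch those calls happen, a one-liner along these lines should show the callers (a sketch - the dataset name is a placeholder, and fbt probes fire system-wide, so a quiet box gives the cleanest counts):

# count txg_wait_synced() calls, keyed by kernel stack, while one
# snapshot is taken; the aggregation prints when the command exits
dtrace -n 'fbt::txg_wait_synced:entry { @[stack()] = count(); }' \
    -c 'zfs snapshot tank/fs@probe'

The printed stacks should include the ZIL-related callers and dsl_sync_task_group_wait() mentioned above.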