Re: [zfs-discuss] Troubleshooting dedup performance
Here is what I see before zfs_prefetch_disable is set. I'm currently moving (mv /tank/games /tank/fs1 /tank/fs2) files of 0.5 GB and larger from a deduped pool to another. The file copy seems fine, but deletes kill performance. This is b130 OSOL /dev.

 new  name name attr attr lookup rddir read  read write write
file remov chng  get  set    ops   ops  ops bytes   ops bytes
   0     0    0    6    0      7     0    2    88     1   116 zfs
   0     0    0    2    0      4     0    0     0     1   116 zfs
   0     0    0    0    0      0     0    0     0     1   116 zfs
   0     0    0    6    0     14     0    0     0     1   116 zfs
   0     0    0    0    0      0     0    0     0     1   116 zfs
   0     0    0    0    0      0     0    0     0     1   116 zfs
   0     0    0    0    0      1     0    0     0     4 24.0M zfs
   0     0    0    0    0      0     0    0     0     3 16.0M zfs
   0     0    0    0    0      0     0    0     0     1   116 zfs
   0     0    0    0    0     18     0    0     0     1   116 zfs
   0     0    0    0    0      0     0    0     0     1   116 zfs
   0     0    0    0    0      0     0    0     0     1   260 zfs
   0     0    0    0    0      0     0    0     0     1   116 zfs
   0     0    0    0    0      0     0    0     0     1   116 zfs
   0     0    0    0    0      0     0    0     0     1   116 zfs
   0     0    0    0    0      0     0    0     0     4 24.0M zfs
   0     0    0    2    0      4     0    0     0     4 24.0M zfs
   0     0    0    0    0      2     0    0     0     1   116 zfs

With it enabled I see a more consistent result, but probably not any faster.

 new  name name attr attr lookup rddir read  read write write
file remov chng  get  set    ops   ops  ops bytes   ops bytes
   0     0    0    0    0      0     0    0     0     2 8.00M zfs
   0     0    0    0    0      0     0    0     0     1   260 zfs
   0     0    0    0    0      0     0    0     0     1   116 zfs
   0     0    0    0    0      0     0    0     0     1   116 zfs
   0     0    0    0    0      0     0    0     0     2 8.00M zfs
   0     0    0    6    0      7     0    2    88     2 8.00M zfs
   0     0    0    2    0      4     0    0     0     1   116 zfs
   0     0    0    0    0      0     0    0     0     2 8.00M zfs
   0     0    0    0    0      0     0    0     0     1   116 zfs
   0     0    0    0    0      0     0    0     0     2 8.00M zfs
   0     0    0    0    0      1     0    0     0     1   116 zfs
   0     0    0    0    0      3     0    0     0     2 8.00M zfs
   0     0    0    0    0      0     0    0     0     2 8.00M zfs
   0     0    0    0    0      0     0    0     0     1   116 zfs
   0     0    0    0    0      0     0    0     0     2 8.00M zfs
   0     0    0    0    0      0     0    0     0     2 8.00M zfs
   1     0    0    5    0      5     0    2  9.9K     1   116 zfs
   0     0    0    0    0      3     0    0     0     1   116 zfs
   0     0    0    4    0      7     2    0     0     2 8.00M zfs

James Dickens

On Thu, Dec 24, 2009 at 11:22 PM, Michael Herf <mbh...@gmail.com> wrote:
> FWIW, I just disabled prefetch, and my dedup + zfs recv seems to be
> running visibly faster (somewhere around 3-5x faster).
>
>   echo zfs_prefetch_disable/W0t1 | mdb -kw
>
> Anyone else see a result like this? I'm using the read bandwidth on the
> sending pool from "zpool iostat -x 5" to estimate transfer rate, since I
> assume the write rate would be lower when dedup is working.
>
> mike
>
> p.s. Note to set it back to the default behavior:
>
>   echo zfs_prefetch_disable/W0t0 | mdb -kw

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Troubleshooting dedup performance
On Mon, Dec 28, 2009 at 5:46 PM, James Dickens <jamesd...@gmail.com> wrote:
> Here is what I see before zfs_prefetch_disable is set. I'm currently
> moving (mv /tank/games /tank/fs1 /tank/fs2) files of 0.5 GB and larger
> from a deduped pool to another. The file copy seems fine, but deletes
> kill performance. This is b130 OSOL /dev.
> [fsstat output snipped]
>
> With it enabled I see a more consistent result, but probably not any
> faster.
> [fsstat output snipped]

In my own testing, the de-dupe code may not be mature enough to enter production (hence it still being in /dev :). My X4540s performed so terribly during testing that I saw as low as 50 bytes/sec read/write with de-dupe enabled. These are beasts of systems, fully loaded, but unable to even sustain 1 KB/sec at times, most notably during ZFS send/recv, delete, and destroying filesystems and snapshots. Even with de-dupe turned off, if you had blocks that had been de-duped, that file system will always be slow. I found I had to completely destroy a file system once de-dupe had been enabled, then re-create the file system to restore
Re: [zfs-discuss] Troubleshooting dedup performance
FWIW, I just disabled prefetch, and my dedup + zfs recv seems to be running visibly faster (somewhere around 3-5x faster).

  echo zfs_prefetch_disable/W0t1 | mdb -kw

Anyone else see a result like this? I'm using the read bandwidth on the sending pool from "zpool iostat -x 5" to estimate transfer rate, since I assume the write rate would be lower when dedup is working.

mike

p.s. Note to set it back to the default behavior:

  echo zfs_prefetch_disable/W0t0 | mdb -kw
Re: [zfs-discuss] Troubleshooting dedup performance
Hi, I threw 24 GB of RAM and a couple of the latest Nehalems at it, and dedup=on seemed to cripple performance without actually using much CPU or RAM. It's quite unusable like this.
Re: [zfs-discuss] Troubleshooting dedup performance
On Dec 23, 2009, at 7:45 AM, Markus Kovero wrote:
> Hi, I threw 24GB of ram and couple latest nehalems at it and dedup=on
> seemed to cripple performance without actually using much cpu or ram.
> it's quite unusable like this.

What does the I/O look like? Try "iostat -zxnP 1" and see if there are a lot of small (2-3 KB) reads. If so, use "iopattern.ksh -r" to see how random the reads are.

http://www.richardelling.com/Home/scripts-and-programs-1/iopattern

If you see 100% small random reads from the pool (ignore writes), then that is the problem. Solution TBD.
-- richard
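[Editor's note: the "lots of small reads" test above can be reduced to simple arithmetic on two iostat columns. A rough sketch, with made-up sample figures; the function names and the 4 KB threshold are my own assumptions, not from the thread:]

```python
# Given the r/s (reads per second) and kr/s (KB read per second) columns
# from "iostat -zxnP 1", the average read size is kr/s divided by r/s.
# A sustained average in the 2-3 KB range on a dedup pool is the
# small-random-read signature described above (DDT lookups).

def avg_read_kb(reads_per_sec, kb_read_per_sec):
    """Average size of one read, in KB, for a single iostat interval."""
    if reads_per_sec == 0:
        return 0.0
    return kb_read_per_sec / reads_per_sec

def looks_like_ddt_thrash(reads_per_sec, kb_read_per_sec, threshold_kb=4.0):
    """Heuristic: many reads averaging under ~4 KB suggests random DDT I/O."""
    return reads_per_sec > 0 and \
        avg_read_kb(reads_per_sec, kb_read_per_sec) < threshold_kb

# Hypothetical interval: 412 reads/s moving only 1030 KB/s.
print(avg_read_kb(412, 1030.0))            # 2.5 KB average read
print(looks_like_ddt_thrash(412, 1030.0))  # True
```

iopattern.ksh answers the follow-up question (how random the reads are); this only flags whether the reads are suspiciously small in the first place.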
Re: [zfs-discuss] Troubleshooting dedup performance
For me, arcstat.pl is a slam-dunk predictor of dedup throughput. If my miss% is in the single digits, dedup write speeds are reasonable. When the ARC misses go way up, dedup writes get very slow. So my guess is that this issue depends entirely on whether or not the DDT is in RAM. I don't have any L2ARC.

I don't know the ARC design exactly, but I can imagine that the DDT is getting flushed out by other filesystem activity, even though keeping it in RAM is critical to write performance. E.g., currently I'm doing a big chmod -R, an rsync, and a zfs send/receive (when jobs like this take a week, it piles up). Right now my miss% is consistently 50% on a machine with 6 GB of RAM, and my writes are terrifically slow, like 3 MB/sec.

Can anyone comment on whether it is possible to tell the kernel/ARC to keep more of the DDT in RAM? If not, could it be possible in a future kernel?

mike

On Wed, Dec 23, 2009 at 9:35 AM, Richard Elling <richard.ell...@gmail.com> wrote:
> What does the I/O look like? Try "iostat -zxnP 1" and see if there are a
> lot of small (2-3 KB) reads. If so, use "iopattern.ksh -r" to see how
> random the reads are.
> [snip]
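[Editor's note: the link between miss% and write speed above can be made concrete with a back-of-the-envelope model. This is a sketch under my own simplifying assumptions (one random HDD read of ~8 ms per DDT miss, one DDT lookup per record written), not anything measured in the thread:]

```python
# If every ARC miss on a DDT lookup stalls the write pipeline for one
# random disk seek, dedup write throughput is bounded by the miss rate.

def dedup_write_mb_s(record_kb, miss_rate, seek_ms=8.0):
    """Approximate MB/s ceiling when each miss costs one random read."""
    if miss_rate == 0:
        return float("inf")  # DDT fully cached: seeks no longer dominate
    secs_per_record = miss_rate * (seek_ms / 1000.0)
    return (record_kb / 1024.0) / secs_per_record

# 128 KB records at a 50% miss rate, 8 ms per random read:
print(round(dedup_write_mb_s(128, 0.5), 1))  # 31.2 MB/s ceiling
# Small (8 KB) records at the same miss rate:
print(round(dedup_write_mb_s(8, 0.5), 1))    # 2.0 MB/s
```

The model is crude (real writes batch, and a miss can cost several reads), but it shows why single-digit miss% feels fine while 50% miss% collapses to a few MB/s, in the same ballpark as the 3 MB/s reported above.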
Re: [zfs-discuss] Troubleshooting dedup performance
On Dec 23, 2009, at 3:00 PM, Michael Herf wrote:
> For me, arcstat.pl is a slam-dunk predictor of dedup throughput. If my
> miss% is in the single digits, dedup write speeds are reasonable. When
> the arc misses go way up, dedup writes get very slow. So my guess is
> that this issue depends entirely on whether or not the DDT is in RAM.
> I don't have any L2ARC.

Yep, seems consistent with my tests. I'm currently seeing 43.6 million zap-unique entries consume approximately 12 GBytes of metadata space. This is dog slow to write on a machine with only 8 GBytes of RAM and a single HDD in the pool. The writes are relatively fast, but all of the time is spent doing random reads, which is not a recipe for success with HDDs.

> Can anyone comment if it is possible to tell the kernel/ARC to keep more
> DDT in RAM? If not, could it be possible in a future kernel?

I'm leaning toward the notion that an SSD cache device will be required for any sort of dedup performance. Or a huge amount of RAM, of course :-) Unfortunately, the DDT routines are not currently instrumented (b129), so it might take a while to fully understand what instrumentation would be most useful.
-- richard
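[Editor's note: Richard's figures also answer Chris's later question about DDT size in bytes. Working the arithmetic through (the per-entry size is an inference from these numbers, not an official constant):]

```python
# 43.6 million unique DDT entries consuming ~12 GB of metadata implies
# roughly 300 bytes per entry, which lets you estimate DDT RAM needs.

entries = 43.6e6
ddt_bytes = 12 * 2**30

bytes_per_entry = ddt_bytes / entries
print(round(bytes_per_entry))    # 296 bytes per DDT entry

# At that rate, 1 TB of unique data in 128 KB records needs roughly:
records_per_tb = 2**40 / (128 * 2**10)            # ~8.4M records
ram_gb = records_per_tb * bytes_per_entry / 2**30
print(round(ram_gb, 1))          # 2.3 GB of DDT per TB
```

Smaller records multiply this directly: 8 KB records would need sixteen times as much, which makes it clear why a 3 GB or 6 GB machine thrashes and why an L2ARC SSD is attractive.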
Re: [zfs-discuss] Troubleshooting dedup performance
In case the overhead in calculating SHA256 was the cause, I set ZFS checksums to SHA256 at the pool level and left it for a number of days. This worked fine. Setting dedup=on immediately crippled performance, and then setting dedup=off fixed things again. I did notice through zpool iostat that disk I/O increased while dedup was on, although it didn't from the ESXi side.

Could it be that the dedup tables don't fit in memory? I don't have a great deal - 3 GB. Is there a measure of how large the tables are in bytes, rather than number of entries?

Chris

-----Original Message-----
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Chris Murray
Sent: 16 December 2009 17:19
To: Cyril Plisko; Andrey Kuzmin
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] Troubleshooting dedup performance

So if the ZFS checksum is set to fletcher4 at the pool level, and dedup=on, which checksum will it be using? If I attempt to set dedup=fletcher4, I do indeed get this:

  cannot set property for 'zp': 'dedup' must be one of 'on | off | verify | sha256[,verify]'

Could it be that my performance troubles are due to the calculation of two different checksums?

Thanks,
Chris

-----Original Message-----
From: cyril.pli...@gmail.com [mailto:cyril.pli...@gmail.com] On Behalf Of Cyril Plisko
Sent: 16 December 2009 17:09
To: Andrey Kuzmin
Cc: Chris Murray; zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] Troubleshooting dedup performance

>> I've set dedup to what I believe are the least resource-intensive
>> settings - checksum=fletcher4 on the pool, dedup=on rather than
>
> I believe checksum=fletcher4 is acceptable in dedup=verify mode only.
> What you're doing is seemingly deduplication with a weak checksum w/o
> verification.

I think fletcher4 use for deduplication purposes was disabled [1] altogether, right before the build 129 cut.

[1] http://hg.genunix.org/onnv-gate.hg/diff/93c7076216f6/usr/src/common/zfs/zfs_prop.c
--
Regards,
Cyril
Re: [zfs-discuss] Troubleshooting dedup performance
On Wed, Dec 16, 2009 at 6:41 PM, Chris Murray <chrismurra...@gmail.com> wrote:
> Hi, I run a number of virtual machines on ESXi 4, which reside in ZFS
> file systems and are accessed over NFS. I've found that if I enable
> dedup, the virtual machines immediately become unusable, hang, and whole
> datastores disappear from ESXi's view. (See the attached screenshot from
> the vSphere client at around the 21:54 mark for the drop in
> connectivity.) I'm on OpenSolaris Preview, build 128a. I've set dedup to
> what I believe are the least resource-intensive settings -
> checksum=fletcher4 on the pool, dedup=on rather than verify, but it is
> still the same.

I believe checksum=fletcher4 is acceptable in dedup=verify mode only. What you're doing is seemingly deduplication with a weak checksum w/o verification.

Regards,
Andrey

> Where can I start troubleshooting? I get the feeling that my hardware
> isn't up to the job, but some numbers to verify that would be nice
> before I start investigating an upgrade. vmstat showed plenty of idle
> CPU cycles, and zpool iostat just showed slow throughput, as the ESXi
> graph does. As soon as I set dedup=off, the virtual machines leapt into
> action again (22:15 on the screenshot).
>
> Many thanks,
> Chris
Re: [zfs-discuss] Troubleshooting dedup performance
>> I've set dedup to what I believe are the least resource-intensive
>> settings - checksum=fletcher4 on the pool, dedup=on rather than
>
> I believe checksum=fletcher4 is acceptable in dedup=verify mode only.
> What you're doing is seemingly deduplication with a weak checksum w/o
> verification.

I think fletcher4 use for deduplication purposes was disabled [1] altogether, right before the build 129 cut.

[1] http://hg.genunix.org/onnv-gate.hg/diff/93c7076216f6/usr/src/common/zfs/zfs_prop.c
--
Regards,
Cyril
Re: [zfs-discuss] Troubleshooting dedup performance
So if the ZFS checksum is set to fletcher4 at the pool level, and dedup=on, which checksum will it be using? If I attempt to set dedup=fletcher4, I do indeed get this:

  cannot set property for 'zp': 'dedup' must be one of 'on | off | verify | sha256[,verify]'

Could it be that my performance troubles are due to the calculation of two different checksums?

Thanks,
Chris

-----Original Message-----
From: cyril.pli...@gmail.com [mailto:cyril.pli...@gmail.com] On Behalf Of Cyril Plisko
Sent: 16 December 2009 17:09
To: Andrey Kuzmin
Cc: Chris Murray; zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] Troubleshooting dedup performance

> I think fletcher4 use for deduplication purposes was disabled [1]
> altogether, right before the build 129 cut.
>
> [1] http://hg.genunix.org/onnv-gate.hg/diff/93c7076216f6/usr/src/common/zfs/zfs_prop.c
Re: [zfs-discuss] Troubleshooting dedup performance
On Wed, Dec 16, 2009 at 8:09 PM, Cyril Plisko <cyril.pli...@mountall.com> wrote:
> I think fletcher4 use for deduplication purposes was disabled [1]
> altogether, right before the build 129 cut.
>
> [1] http://hg.genunix.org/onnv-gate.hg/diff/93c7076216f6/usr/src/common/zfs/zfs_prop.c

Peculiar fix: it quotes the reason as "checksum errors because we are not computing the byteswapped checksum," but solves it by dropping checksum support instead of adding the byte-swapped checksum computation. Am I missing something?

Regards,
Andrey
Re: [zfs-discuss] Troubleshooting dedup performance
I believe that fletcher4 was disabled for dedup in 128a. Setting dedup=on overrides the checksum setting and forces sha256.

On Dec 16, 2009 9:10 AM, Cyril Plisko <cyril.pli...@mountall.com> wrote:
> I think fletcher4 use for deduplication purposes was disabled [1]
> altogether, right before the build 129 cut.
>
> [1] http://hg.genunix.org/onnv-gate.hg/diff/93c7076216f6/usr/src/common/zfs/zfs_prop.c