Re: [zfs-discuss] Troubleshooting dedup performance

2009-12-28 Thread James Dickens
Here is what I see before prefetch_disable is set. I'm currently moving (mv
/tank/games /tank/fs1 /tank/fs2) 0.5 GB and larger files from a deduped pool
to another... the file copy seems fine, but deletes kill performance. b130 OSOL
/dev


    0     0     0     6     0      7     0     2    88     1   116 zfs
    0     0     0     2     0      4     0     0     0     1   116 zfs
    0     0     0     0     0      0     0     0     0     1   116 zfs
    0     0     0     6     0     14     0     0     0     1   116 zfs
    0     0     0     0     0      0     0     0     0     1   116 zfs
    0     0     0     0     0      0     0     0     0     1   116 zfs
    0     0     0     0     0      1     0     0     0     4 24.0M zfs
    0     0     0     0     0      0     0     0     0     3 16.0M zfs
    0     0     0     0     0      0     0     0     0     1   116 zfs
    0     0     0     0     0     18     0     0     0     1   116 zfs
 new  name   name  attr  attr lookup rddir  read read  write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
    0     0     0     0     0      0     0     0     0     1   116 zfs
    0     0     0     0     0      0     0     0     0     1   260 zfs
    0     0     0     0     0      0     0     0     0     1   116 zfs
    0     0     0     0     0      0     0     0     0     1   116 zfs
    0     0     0     0     0      0     0     0     0     1   116 zfs
    0     0     0     0     0      0     0     0     0     4 24.0M zfs
    0     0     0     2     0      4     0     0     0     4 24.0M zfs
    0     0     0     0     0      2     0     0     0     1   116 zfs


With prefetch_disable set (prefetch turned off), I see a more consistent
result, but probably not any faster:

 new  name   name  attr  attr lookup rddir  read read  write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
    0     0     0     0     0      0     0     0     0     2 8.00M zfs
    0     0     0     0     0      0     0     0     0     1   260 zfs
    0     0     0     0     0      0     0     0     0     1   116 zfs
    0     0     0     0     0      0     0     0     0     1   116 zfs
    0     0     0     0     0      0     0     0     0     2 8.00M zfs
    0     0     0     6     0      7     0     2    88     2 8.00M zfs
    0     0     0     2     0      4     0     0     0     1   116 zfs
    0     0     0     0     0      0     0     0     0     2 8.00M zfs
    0     0     0     0     0      0     0     0     0     1   116 zfs
    0     0     0     0     0      0     0     0     0     2 8.00M zfs
    0     0     0     0     0      1     0     0     0     1   116 zfs
    0     0     0     0     0      3     0     0     0     2 8.00M zfs
    0     0     0     0     0      0     0     0     0     2 8.00M zfs
 new  name   name  attr  attr lookup rddir  read read  write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
    0     0     0     0     0      0     0     0     0     1   116 zfs
    0     0     0     0     0      0     0     0     0     2 8.00M zfs
    0     0     0     0     0      0     0     0     0     2 8.00M zfs
    1     0     0     5     0      5     0     2  9.9K     1   116 zfs
    0     0     0     0     0      3     0     0     0     1   116 zfs
    0     0     0     4     0      7     2     0     0     2 8.00M zfs
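
For reference, the columns above are the default fsstat(1M) display. A minimal
way to reproduce this kind of sampling, assuming a 1-second interval (the
mount point below is just the one from the mv command above):

fsstat zfs 1          # aggregate vnode-op counters for all ZFS filesystems, once per second
fsstat /tank/fs2 1    # or watch a single mount point instead of the whole fstype

The name remov column is the one to watch while the mv is unlinking source files.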



James Dickens


On Thu, Dec 24, 2009 at 11:22 PM, Michael Herf mbh...@gmail.com wrote:

 FWIW, I just disabled prefetch, and my dedup + zfs recv seems to be
 running visibly faster (somewhere around 3-5x faster).

 echo zfs_prefetch_disable/W0t1 | mdb -kw

 Anyone else see a result like this?

 I'm using the read bandwidth from the sending pool from zpool
 iostat -x 5 to estimate transfer rate, since I assume the write rate
 would be lower when dedup is working.

 mike

 p.s. Note to set it back to the default behavior:
 echo zfs_prefetch_disable/W0t0 | mdb -kw

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Troubleshooting dedup performance

2009-12-28 Thread Brent Jones
On Mon, Dec 28, 2009 at 5:46 PM, James Dickens jamesd...@gmail.com wrote:

 here is what i see before prefetch_disable is set, i'm currently moving (mv
 /tank/games /tank/fs1 /tank/fs2)  .5 GB and larger files from a deduped pool
 to another... file copy seems fine but delete's kill performance. b130 OSOL
 /dev

    0     0     0     6     0      7     0     2    88     1   116 zfs
     0     0     0     2     0      4     0     0     0     1   116 zfs
     0     0     0     0     0      0     0     0     0     1   116 zfs
     0     0     0     6     0     14     0     0     0     1   116 zfs
     0     0     0     0     0      0     0     0     0     1   116 zfs
     0     0     0     0     0      0     0     0     0     1   116 zfs
     0     0     0     0     0      1     0     0     0     4 24.0M zfs
     0     0     0     0     0      0     0     0     0     3 16.0M zfs
     0     0     0     0     0      0     0     0     0     1   116 zfs
     0     0     0     0     0     18     0     0     0     1   116 zfs
  new  name   name  attr  attr lookup rddir  read read  write write
  file remov  chng   get   set    ops   ops   ops bytes   ops bytes
     0     0     0     0     0      0     0     0     0     1   116 zfs
     0     0     0     0     0      0     0     0     0     1   260 zfs
     0     0     0     0     0      0     0     0     0     1   116 zfs
     0     0     0     0     0      0     0     0     0     1   116 zfs
     0     0     0     0     0      0     0     0     0     1   116 zfs
     0     0     0     0     0      0     0     0     0     4 24.0M zfs
     0     0     0     2     0      4     0     0     0     4 24.0M zfs
     0     0     0     0     0      2     0     0     0     1   116 zfs

 with it enabled i see a more consistent result, but probably not any
 faster.
  new  name   name  attr  attr lookup rddir  read read  write write
  file remov  chng   get   set    ops   ops   ops bytes   ops bytes
     0     0     0     0     0      0     0     0     0     2 8.00M zfs
     0     0     0     0     0      0     0     0     0     1   260 zfs
     0     0     0     0     0      0     0     0     0     1   116 zfs
     0     0     0     0     0      0     0     0     0     1   116 zfs
     0     0     0     0     0      0     0     0     0     2 8.00M zfs
     0     0     0     6     0      7     0     2    88     2 8.00M zfs
     0     0     0     2     0      4     0     0     0     1   116 zfs
     0     0     0     0     0      0     0     0     0     2 8.00M zfs
     0     0     0     0     0      0     0     0     0     1   116 zfs
     0     0     0     0     0      0     0     0     0     2 8.00M zfs
     0     0     0     0     0      1     0     0     0     1   116 zfs
     0     0     0     0     0      3     0     0     0     2 8.00M zfs
     0     0     0     0     0      0     0     0     0     2 8.00M zfs
  new  name   name  attr  attr lookup rddir  read read  write write
  file remov  chng   get   set    ops   ops   ops bytes   ops bytes
     0     0     0     0     0      0     0     0     0     1   116 zfs
     0     0     0     0     0      0     0     0     0     2 8.00M zfs
     0     0     0     0     0      0     0     0     0     2 8.00M zfs
     1     0     0     5     0      5     0     2  9.9K     1   116 zfs
     0     0     0     0     0      3     0     0     0     1   116 zfs
     0     0     0     4     0      7     2     0     0     2 8.00M zfs


 James Dickens

 On Thu, Dec 24, 2009 at 11:22 PM, Michael Herf mbh...@gmail.com wrote:

 FWIW, I just disabled prefetch, and my dedup + zfs recv seems to be
 running visibly faster (somewhere around 3-5x faster).

 echo zfs_prefetch_disable/W0t1 | mdb -kw

 Anyone else see a result like this?

 I'm using the read bandwidth from the sending pool from zpool
 iostat -x 5 to estimate transfer rate, since I assume the write rate
 would be lower when dedup is working.

 mike

 p.s. Note to set it back to the default behavior:
 echo zfs_prefetch_disable/W0t0 | mdb -kw



In my own testing, the de-dupe code may not be mature enough to enter
production (hence it still being in /dev :)
My X4540s performed so terribly during testing that I saw as low as 50
bytes/sec read/write with de-dupe enabled.
These are beasts of systems, fully loaded, but at times unable to sustain
even 1 KB/sec, most notably during ZFS send/recv, deletes, and
destroying filesystems and snapshots.

Even with de-dupe turned off, if you have blocks that had been
de-duped, that file system will always be slow. I found I had to
completely destroy a file system once de-dupe had been enabled, then
re-create the file system to restore performance.
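
For what it's worth, a quick way to check whether a pool still carries deduped
blocks after the property has been turned off: a rough sketch, with 'tank'
standing in for the real pool name:

zpool get dedupratio tank    # stays above 1.00x while deduped blocks remain on disk
zpool list tank              # the DEDUP column shows the same ratio
zdb -DD tank                 # DDT entry counts and their in-core/on-disk sizes (can take a while)

Only once those blocks are rewritten or the dataset is destroyed does the
ratio drop back to 1.00x.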

Re: [zfs-discuss] Troubleshooting dedup performance

2009-12-24 Thread Michael Herf
FWIW, I just disabled prefetch, and my dedup + zfs recv seems to be
running visibly faster (somewhere around 3-5x faster).

echo zfs_prefetch_disable/W0t1 | mdb -kw

Anyone else see a result like this?

I'm using the read bandwidth from the sending pool from zpool
iostat -x 5 to estimate transfer rate, since I assume the write rate
would be lower when dedup is working.

mike

p.s. Note to set it back to the default behavior:
echo zfs_prefetch_disable/W0t0 | mdb -kw
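
And to check the current value before/after flipping it (plus a line for
/etc/system if you want the setting to persist across reboots):

echo zfs_prefetch_disable/D | mdb -k
# persistent version, in /etc/system:
# set zfs:zfs_prefetch_disable = 1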
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Troubleshooting dedup performance

2009-12-23 Thread Markus Kovero
Hi, I threw 24 GB of RAM and a couple of the latest Nehalems at it, and dedup=on
seemed to cripple performance without actually using much CPU or RAM. It's quite
unusable like this.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Troubleshooting dedup performance

2009-12-23 Thread Richard Elling

On Dec 23, 2009, at 7:45 AM, Markus Kovero wrote:

Hi, I threw 24GB of ram and couple latest nehalems at it and  
dedup=on seemed to cripple performance without actually using much  
cpu or ram. it's quite unusable like this.


What does the I/O look like?  Try iostat -zxnP 1 and see if there are a
lot of small (2-3 KB) reads.  If so, use iopattern.ksh -r to see how
random the reads are.

http://www.richardelling.com/Home/scripts-and-programs-1/iopattern

If you see 100% small random reads from the pool (ignore writes), then
that is the problem. Solution TBD.
 -- richard
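
A rough data-gathering sequence along those lines (the pool name is only an
example; iopattern is the script from the link above, with the -r flag as
Richard describes):

iostat -zxnP 1          # per-device stats: look for lots of ~2-3 KB reads
zpool iostat -v tank 1  # shows which pool/vdevs the reads are landing on
./iopattern.ksh -r      # random vs. sequential breakdown, reads only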

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Troubleshooting dedup performance

2009-12-23 Thread Michael Herf
For me, arcstat.pl is a slam-dunk predictor of dedup throughput. If my
miss% is in the single digits, dedup write speeds are reasonable. When the
ARC misses go way up, dedup writes get very slow. So my guess is that this
issue depends entirely on whether or not the DDT is in RAM. I don't
have any L2ARC.

I don't know the ARC design exactly, but I can imagine that the DDT is getting
flushed out by other filesystem activity, even though keeping it in RAM is
critical to write performance.

e.g., currently I'm doing a big chmod -R, an rsync, and a zfs send/receive
(when jobs like this take a week, they pile up). And right now my miss% is
consistently 50% on a machine with 6 GB of RAM. My writes are terribly
slow, like 3 MB/sec.

Can anyone comment on whether it is possible to tell the kernel/ARC to keep
more of the DDT in RAM?
If not, could it be possible in a future kernel?
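
In case it is useful for comparison, the numbers arcstat.pl summarizes can
also be read straight from the arcstats kstat; a minimal sketch (statistic
names may vary slightly between builds):

kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses
kstat -p zfs:0:arcstats:size zfs:0:arcstats:c_max
# metadata portion of the ARC, where the DDT blocks compete for space
kstat -p zfs:0:arcstats:arc_meta_used zfs:0:arcstats:arc_meta_limit

As far as I know there is no knob that pins the DDT specifically; the metadata
limit only bounds ARC metadata as a whole.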

mike


On Wed, Dec 23, 2009 at 9:35 AM, Richard Elling richard.ell...@gmail.comwrote:

 On Dec 23, 2009, at 7:45 AM, Markus Kovero wrote:

  Hi, I threw 24GB of ram and couple latest nehalems at it and dedup=on
 seemed to cripple performance without actually using much cpu or ram. it's
 quite unusable like this.


 What does the I/O look like?  Try iostat -zxnP 1 and see if there are a
 lot
 of small (2-3 KB) reads.  If so, use iopattern.ksh -r to see how random
 the reads are.

 http://www.richardelling.com/Home/scripts-and-programs-1/iopattern

 If you see 100% small random reads from the pool (ignore writes), then
 that is the problem. Solution TBD.
  -- richard



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Troubleshooting dedup performance

2009-12-23 Thread Richard Elling

On Dec 23, 2009, at 3:00 PM, Michael Herf wrote:

For me, arcstat.pl is a slam-dunk predictor of dedup throughput. If  
my miss% is in the single digits, dedup write speeds are  
reasonable. When the arc misses go way up, dedup writes get very  
slow. So my guess is that this issue depends entirely on whether or  
not the DDT is in RAM or not. I don't have any L2ARC.


Yep, seems consistent with my tests.  I'm currently seeing 43.6 million
zap-unique entries consume approximately 12 GBytes of metadata space.
This is dog slow to write on a machine with only 8 GBytes of RAM and a
single HDD in the pool.  The writes are relatively fast, but all of the
time is spent doing random reads, which is not a recipe for success with
HDDs.
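
Back-of-envelope from those numbers: 12 GBytes over 43.6 million entries is
roughly 12 * 2^30 / 43.6e6, or on the order of 280-300 bytes of DDT metadata
per unique block, which is why a table that size has little hope of staying
resident in 8 GBytes of RAM alongside everything else.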

I don't know the ARC design exactly, but I can imagine that DDT is
getting flushed out by other filesystem activity, even though keeping it
in RAM is very critical to write performance.

e.g., currently I'm doing a big chmod -R, an rsync, and a zfs
send/receive (when jobs like this take a week, it piles up.) And right
now my miss% is consistently 50% on a machine with 6GB ram. My writes
are terrifically slow, like 3MB/sec.

Can anyone comment if it is possible to tell the kernel/ARC keep more
DDT in RAM?

If not, could it be possible in a future kernel?


I'm leaning to the notion that an SSD cache device will be required for
any sort of dedup performance.  Or a huge amount of RAM, of course :-)
Unfortunately, the DDT routines are not currently instrumented (b129),
so it might take a while to fully understand what instrumentation would
be most useful.
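
For reference, attaching an SSD as an L2ARC cache device is a one-liner
(pool and device names below are placeholders):

zpool add tank cache c4t2d0

Whether the L2ARC then ends up holding enough of the DDT to help is exactly
the open question.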
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Troubleshooting dedup performance

2009-12-21 Thread Chris Murray
In case the overhead of calculating SHA256 was the cause, I set ZFS
checksums to SHA256 at the pool level and left it that way for a number
of days. This worked fine.

Setting dedup=on immediately crippled performance, and then setting
dedup=off fixed things again. I did notice through zpool iostat that
disk I/O increased while dedup was on, although it didn't increase from
the ESXi side. Could it be that the dedup tables don't fit in memory? I
don't have a great deal of RAM - 3 GB. Is there a measure of how large
the tables are in bytes, rather than in number of entries?
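
One possible way to get at that, assuming b128 or later and using the pool
name from the earlier error message: zdb can report the DDT directly, and can
also simulate it for data that has not been deduped yet (it may run for a
while on a busy pool):

zdb -DD zp    # DDT entry counts with their on-disk and in-core sizes
zdb -S zp     # simulate dedup and print an estimated table/histogram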

Chris

-Original Message-
From: zfs-discuss-boun...@opensolaris.org
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Chris Murray
Sent: 16 December 2009 17:19
To: Cyril Plisko; Andrey Kuzmin
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] Troubleshooting dedup performance

So if the ZFS checksum is set to fletcher4 at the pool level, and
dedup=on, which checksum will it be using?

If I attempt to set dedup=fletcher4, I do indeed get this:

cannot set property for 'zp': 'dedup' must be one of 'on | off | verify
| sha256[,verify]'

Could it be that my performance troubles are due to the calculation of
two different checksums?

Thanks,
Chris

-Original Message-
From: cyril.pli...@gmail.com [mailto:cyril.pli...@gmail.com] On Behalf
Of Cyril Plisko
Sent: 16 December 2009 17:09
To: Andrey Kuzmin
Cc: Chris Murray; zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] Troubleshooting dedup performance

 I've set dedup to what I believe are the least resource-intensive
 settings - checksum=fletcher4 on the pool,  dedup=on rather than

 I believe checksum=fletcher4 is acceptable in dedup=verify mode only.
 What you're doing is seemingly deduplication with weak checksum w/o
 verification.

I think fletcher4 use for the deduplication purposes was disabled [1]
at all, right before build 129 cut.


[1]
http://hg.genunix.org/onnv-gate.hg/diff/93c7076216f6/usr/src/common/zfs/
zfs_prop.c


-- 
Regards,
Cyril


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Troubleshooting dedup performance

2009-12-16 Thread Andrey Kuzmin
On Wed, Dec 16, 2009 at 6:41 PM, Chris Murray chrismurra...@gmail.com wrote:
 Hi,

 I run a number of virtual machines on ESXi 4, which reside in ZFS file
 systems and are accessed over NFS. I've found that if I enable dedup,
 the virtual machines immediately become unusable, hang, and whole
 datastores disappear from ESXi's view. (See the attached screenshot from
 vSphere client at around the 21:54 mark for the drop in connectivity).
 I'm on OpenSolaris Preview, build 128a.

 I've set dedup to what I believe are the least resource-intensive
 settings - checksum=fletcher4 on the pool,  dedup=on rather than

I believe checksum=fletcher4 is acceptable in dedup=verify mode only.
What you're doing is seemingly deduplication with weak checksum w/o
verification.
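
If that is the concern, the combinations the property itself accepts (per the
error message quoted elsewhere in this thread) would be set along these lines,
with 'zp' as the example pool:

zfs set dedup=sha256,verify zp   # strong checksum plus byte-for-byte verification
zfs set dedup=verify zp          # keep the existing checksum but verify every match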


Regards,
Andrey

 verify, but it is still the same.

 Where can I start troubleshooting? I get the feeling that my hardware
 isn't up to the job, but some numbers to verify that would be nice
 before I start investigating an upgrade.

 vmstat showed plenty of idle CPU cycles, and zpool iostat just
 showed slow throughput, as the ESXi graph does. As soon as I set
 dedup=off, the virtual machines leapt into action again (22:15 on the
 screenshot).

 Many thanks,
 Chris



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Troubleshooting dedup performance

2009-12-16 Thread Cyril Plisko
 I've set dedup to what I believe are the least resource-intensive
 settings - checksum=fletcher4 on the pool,  dedup=on rather than

 I believe checksum=fletcher4 is acceptable in dedup=verify mode only.
 What you're doing is seemingly deduplication with weak checksum w/o
 verification.

I think fletcher4 use for deduplication purposes was disabled [1]
altogether, right before the build 129 cut.


[1] 
http://hg.genunix.org/onnv-gate.hg/diff/93c7076216f6/usr/src/common/zfs/zfs_prop.c


-- 
Regards,
Cyril
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Troubleshooting dedup performance

2009-12-16 Thread Chris Murray
So if the ZFS checksum is set to fletcher4 at the pool level, and
dedup=on, which checksum will it be using?

If I attempt to set dedup=fletcher4, I do indeed get this:

cannot set property for 'zp': 'dedup' must be one of 'on | off | verify
| sha256[,verify]'

Could it be that my performance troubles are due to the calculation of
two different checksums?

Thanks,
Chris

-Original Message-
From: cyril.pli...@gmail.com [mailto:cyril.pli...@gmail.com] On Behalf
Of Cyril Plisko
Sent: 16 December 2009 17:09
To: Andrey Kuzmin
Cc: Chris Murray; zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] Troubleshooting dedup performance

 I've set dedup to what I believe are the least resource-intensive
 settings - checksum=fletcher4 on the pool,  dedup=on rather than

 I believe checksum=fletcher4 is acceptable in dedup=verify mode only.
 What you're doing is seemingly deduplication with weak checksum w/o
 verification.

I think fletcher4 use for the deduplication purposes was disabled [1]
at all, right before build 129 cut.


[1]
http://hg.genunix.org/onnv-gate.hg/diff/93c7076216f6/usr/src/common/zfs/
zfs_prop.c


-- 
Regards,
Cyril

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Troubleshooting dedup performance

2009-12-16 Thread Andrey Kuzmin
On Wed, Dec 16, 2009 at 8:09 PM, Cyril Plisko cyril.pli...@mountall.com wrote:
 I've set dedup to what I believe are the least resource-intensive
 settings - checksum=fletcher4 on the pool,  dedup=on rather than

 I believe checksum=fletcher4 is acceptable in dedup=verify mode only.
 What you're doing is seemingly deduplication with weak checksum w/o
 verification.

 I think fletcher4 use for the deduplication purposes was disabled [1]
 at all, right before build 129 cut.


 [1] 
 http://hg.genunix.org/onnv-gate.hg/diff/93c7076216f6/usr/src/common/zfs/zfs_prop.c

Peculiar fix: it quotes the reason as being checksum errors because we
are not computing the byteswapped checksum, but solves it by dropping
fletcher4 support for dedup instead of adding the byte-swapped checksum
computation. Am I missing something?

Regards,
Andrey




 --
 Regards,
        Cyril

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Troubleshooting dedup performance

2009-12-16 Thread Brandon High
I believe that fletcher4 was disabled for dedup in 128a. Setting dedup=on
overrides the checksum setting and forces sha256.
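
If so, a quick sanity check on a given dataset (the name is just an example)
is to look at both properties side by side, keeping in mind that the checksum
property may still read fletcher4 even though new deduped writes would be
using sha256:

zfs get checksum,dedup zp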

On Dec 16, 2009 9:10 AM, Cyril Plisko cyril.pli...@mountall.com wrote:

 I've set dedup to what I believe are the least resource-intensive 
settings - checksum=fletche...
I think fletcher4 use for the deduplication purposes was disabled [1]
at all, right before build 129 cut.


[1]
http://hg.genunix.org/onnv-gate.hg/diff/93c7076216f6/usr/src/common/zfs/zfs_prop.c


--
Regards,
Cyril

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss