Re: [zfs-discuss] zfs hanging during reads

2009-12-17 Thread Eric D. Mudama

On Wed, Dec 16 at 22:41, Tim wrote:

hmm, not seeing the same slow down when I boot from the Samsung EStool CD and 
run a diag which performs a surface scan...
could this still be a hardware issue, or possibly something with the Solaris 
data format on the disk?


Rotating drives often have various optimizations to help recover from
damaged servo sectors when reading sequentially: they can skip
over bad areas and just assume that the position information is
there, until they get a fatal ECC error on a read.  As long as the drive
hasn't wandered off-track, it just keeps reading until it eventually finds
valid position information again.

I'm guessing you have a physical problem with the servo wedges on that
drive that only manifests itself in some of your access methods.  Does
the drive click or make any other noises when this is happening?

For the price of drives today, I'd buy a replacement and look at
swapping that one out.  You can always keep it as a spare for later.


--
Eric D. Mudama
edmud...@mail.bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] force 4k writes

2009-12-17 Thread Eric D. Mudama

On Wed, Dec 16 at  7:35, Bill Sprouse wrote:
The question behind the question is, given the really bad things that 
can happen performance-wise with writes that are not 4k aligned when 
using flash devices, is there any way to ensure that any and all 
writes from ZFS are 4k aligned?


Some flash devices can handle this better than others, often several
orders of magnitude better.  Not all devices (as you imply) are
so-affected.

--eric

--
Eric D. Mudama
edmud...@mail.bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] DeDup and Compression - Reverse Order?

2009-12-17 Thread Andrey Kuzmin
The downside you have described happens only when the same checksum is
used for data protection and duplicate detection. This implies sha256,
BTW, since fletcher-based dedupe has been dropped in recent builds.

On 12/17/09, Kjetil Torgrim Homme kjeti...@linpro.no wrote:
 Andrey Kuzmin andrey.v.kuz...@gmail.com writes:
 Darren J Moffat wrote:
 Andrey Kuzmin wrote:
 Resilvering has noting to do with sha256: one could resilver long
 before dedupe was introduced in zfs.

 SHA256 isn't just used for dedup it is available as one of the
 checksum algorithms right back to pool version 1 that integrated in
 build 27.

 'One of' is the key word. And thanks for code pointers, I'll take a
 look.

 I didn't mention sha256 at all :-).  the reasoning is the same no matter
 what hash algorithm you're using (fletcher2, fletcher4 or sha256).  dedup
 doesn't require sha256 either, you can use fletcher4.

 the question was: why does data have to be compressed before it can be
 recognised as a duplicate?  it does seem like a waste of CPU, no?  I
 attempted to show the downsides to identifying blocks by their
 uncompressed hash.  (BTW, it doesn't affect storage efficiency, the same
 duplicate blocks will be discovered either way.)

 --
 Kjetil T. Homme
 Redpill Linpro AS - Changing the game

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



-- 
Regards,
Andrey
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] force 4k writes

2009-12-17 Thread Colin Raven
On Thu, Dec 17, 2009 at 09:14, Eric D. Mudama edmud...@bounceswoosh.orgwrote:

 On Wed, Dec 16 at  7:35, Bill Sprouse wrote:

 The question behind the question is, given the really bad things that can
 happen performance-wise with writes that are not 4k aligned when using flash
 devices, is there any way to insure that any and all writes from ZFS are 4k
 aligned?


 Some flash devices can handle this better than others, often several
 orders of magnitude better.  Not all devices (as you imply) are
 so-affected.


Is there - somewhere - a list of flash devices, with some (perhaps
subjective) indication of how they handle issues like this?

--
-Me
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] DeDup and Compression - Reverse Order?

2009-12-17 Thread Kjetil Torgrim Homme
Andrey Kuzmin andrey.v.kuz...@gmail.com writes:

 Downside you have described happens only when the same checksum is
 used for data protection and duplicate detection. This implies sha256,
 BTW, since fletcher-based dedupe has been dropped in recent builds.

if the hash used for dedup is completely separate from the hash used for
data protection, I don't see any downsides to computing the dedup hash
from uncompressed data.  why isn't it?

-- 
Kjetil T. Homme
Redpill Linpro AS - Changing the game

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] DeDup and Compression - Reverse Order?

2009-12-17 Thread Darren J Moffat

Kjetil Torgrim Homme wrote:

Andrey Kuzmin andrey.v.kuz...@gmail.com writes:


Downside you have described happens only when the same checksum is
used for data protection and duplicate detection. This implies sha256,
BTW, since fletcher-based dedupe has been dropped in recent builds.


if the hash used for dedup is completely separate from the hash used for
data protection, I don't see any downsides to computing the dedup hash
from uncompressed data.  why isn't it?


It isn't separate because that isn't how Jeff and Bill designed it.  I 
think the design they have is great.


Instead of trying to pick holes in the theory, can you demonstrate a real 
performance problem with compression=on and dedup=on and show that it is 
because of the compression step?


Otherwise if you want it changed code it up and show how what you have 
done is better in all cases.


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs zend is very slow

2009-12-17 Thread Edward Ned Harvey
 I'm willing to accept slower writes with compression enabled, par for
 the course. Local writes, even with compression enabled, can still
 exceed 500MB/sec, with moderate to high CPU usage.
 These problems seem to have manifested after snv_128, and seemingly
 only affect ZFS receive speeds. Local pool performance is still very
 fast.

Now we're getting somewhere.  ;-)
You've tested the source disk (result: fast.)
You've tested the destination disk without zfs receive (result: fast.)
Now the only two ingredients left are:

Ssh performance, or zfs receive performance.

So, to conclusively identify and prove and measure that zfs receive is the
problem, how about this:
zfs send somefilesystem | ssh somehost 'cat > /dev/null'

If that goes slow, then ssh is the culprit.
If that goes fast ... and then you change to zfs receive and that goes
slow ... Now you've scientifically shown that zfs receive is slow.
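
[Editor's note: a minimal sketch of that two-leg test, with made-up pool, snapshot
and host names; timing each leg separately separates ssh cost from zfs receive cost.]

# leg 1: source read + ssh transport only
time zfs send somepool/somefs@snap | ssh somehost 'cat > /dev/null'

# leg 2: the same stream, now landing in zfs receive on the far side
# (somepool/restore_test is a scratch dataset that does not exist yet)
time zfs send somepool/somefs@snap | ssh somehost 'zfs receive somepool/restore_test'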

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compress an existing filesystem

2009-12-17 Thread Edward Ned Harvey
 Hi all,
   I need to move a filesystem off of one host and onto another
 smaller
 one.  The fs in question, with no compression enabled, is using 1.2 TB
 (refer).  I'm hoping that zfs compression will dramatically reduce this
 requirement and allow me to keep the dataset on an 800 GB store.  Does
 this sound feasible?  Can I achieve the move to the new box using zfs
 send/receive?  If so, how do I do it?  Do I turn on compression on the
 target host just after I begin the zfs send/receive?

How much your data compresses depends on what type of data you have.
JPG files won't compress at all, and neither will any other already-compressed
file format.  But if you have a gene sequence file, it'll compress 10:1
because it's so repetitive internally.  For typical filesystems, about 70%
of original size is a good estimate.

Now:  If your data will fit into the 800gb drive, it will be tight.  And you
SHOULD NOT do that.

For one, performance of that drive will be horrible, at best.  

And I've heard a number of horror stories that zfs has a tendency to implode
when it's very full.  So try to keep your disks below 90% full.
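
[Editor's note: a minimal sketch of how to sanity-check the compression estimate
before committing to the smaller disk; pool and dataset names are made up, and the
target dataset inherits compression because it is created under a compressed parent.]

# enable compression on the target before the data arrives, then inspect the ratio
zfs create -o compression=on smallpool/dest
zfs send bigpool/data@migrate | zfs receive smallpool/dest/data
zfs get compressratio,used,available smallpool/dest/data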

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] DeDup and Compression - Reverse Order?

2009-12-17 Thread Kjetil Torgrim Homme
Darren J Moffat darr...@opensolaris.org writes:
 Kjetil Torgrim Homme wrote:
 Andrey Kuzmin andrey.v.kuz...@gmail.com writes:

 Downside you have described happens only when the same checksum is
 used for data protection and duplicate detection. This implies sha256,
 BTW, since fletcher-based dedupe has been dropped in recent builds.

 if the hash used for dedup is completely separate from the hash used
 for data protection, I don't see any downsides to computing the dedup
 hash from uncompressed data.  why isn't it?

 It isn't separate because that isn't how Jeff and Bill designed it.

thanks for confirming that, Darren.

 I think the design the have is great.

I don't disagree.

 Instead of trying to pick holes in the theory can you demonstrate a
 real performance problem with compression=on and dedup=on and show
 that it is because of the compression step ?

compression requires CPU, actually quite a lot of it.  even with the
lean and mean lzjb, you will get not much more than 150 MB/s per core or
something like that.  so, if you're copying a 10 GB image file, it will
take a minute or two, just to compress the data so that the hash can be
computed so that the duplicate block can be identified.  if the dedup
hash was based on uncompressed data, the copy would be limited by
hashing efficiency (and dedup tree lookup).

I don't know how tightly interwoven the dedup hash tree and the block
pointer hash tree are, or if it is at all possible to disentangle them.

conceptually it doesn't seem impossible, but that's easy for me to
say, with no knowledge of the zio pipeline...

oh, how does encryption play into this?  just don't?  knowing that
someone else has the same block as you is leaking information, but that
may be acceptable -- just make different pools for people you don't
trust.

 Otherwise if you want it changed code it up and show how what you have
 done is better in all cases.

I wish I could :-)

-- 
Kjetil T. Homme
Redpill Linpro AS - Changing the game

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Upgrading a volume from iscsitgt to COMSTAR

2009-12-17 Thread Stephen Green
Hi, I have a zfs volume that's exported via iscsi for my wife's Mac to 
use for Time Machine.


I've just built a new machine to house my big pool, and installed 
build 129 on it.  I'd like to start using COMSTAR for exporting the 
iscsi targets, rather than the older iscsi infrastructure.


I've seen quite a few tutorials on how to use COMSTAR for new volumes 
(and a few mentions of the shareiscsi=stmf).  I've seen some talk about 
how the old infrastructure used to use the first 64K of the volume for 
the iscsi information (and that COMSTAR uses the ZFS metadata store.)


What I haven't found is a set of steps for taking a volume from the old 
way of doing things to the new.  There are hints (e.g., here: 
https://opensolaris.org/jive/thread.jspa?threadID=115078), but no 
concrete set of steps.


Despite the ease of use of ZFS, I presume that it's not as simple as saying:

zfs set shareiscsi=stmf volume

because a) it's not clear to me that that setting for shareiscsi will do 
the same magic as shareiscsi=on used to do, and b) there's that 
initial-64K-problem which I assume will make the Mac throw a wobbly when 
trying to mount the file system that's on the volume.


Any advice on how to do this?  There's plenty of room to create a new 
volume and dd over the data (suggestions for the skip parameter to dd 
welcome, though!) if that's the only way.
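
[Editor's note: if the dd route does turn out to be necessary, a rough sketch of its
shape, offered only as an illustration - the volume names and size are placeholders,
the 64K offset is the unverified figure mentioned above, and the one-time COMSTAR
target setup (itadm create-target, etc.) is not shown.]

# create a same-sized replacement zvol, copy the data past the presumed 64K header,
# then expose the new zvol through COMSTAR
zfs create -V 500g tank/tmvol_new                 # size must match the original volume
dd if=/dev/zvol/rdsk/tank/tmvol of=/dev/zvol/rdsk/tank/tmvol_new bs=64k skip=1
sbdadm create-lu /dev/zvol/rdsk/tank/tmvol_new    # prints the new LU's GUID
stmfadm add-view <GUID-from-sbdadm-output>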


Once I figure this out, I'll be happy to write it up for my blog, which 
can then be pointed to when this comes up again.


Thanks in advance,

Steve
--
Stephen Green  //   stephen.gr...@sun.com
Principal Investigator \\   http://blogs.sun.com/searchguy
The AURA Project   //   Voice: +1 781-442-0926
Sun Microsystems Labs  \\   Fax:   +1 781-442-0399
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] DeDup and Compression - Reverse Order?

2009-12-17 Thread Darren J Moffat

Kjetil Torgrim Homme wrote:


I don't know how tightly interwoven the dedup hash tree and the block
pointer hash tree are, or if it is all possible to disentangle them.


At the moment I'd say very interwoven by design.


conceptually it doesn't seem impossible, but that's easy for me to
say, with no knowledge of the zio pipeline...


Correct, it isn't impossible, but instead there would probably need to be 
two checksums held, one of the untransformed data (i.e. uncompressed and 
unencrypted) and one of the transformed data (compressed and encrypted). 
 That has different tradeoffs, and SHA256 can be expensive too; see:


http://blogs.sun.com/darren/entry/improving_zfs_dedup_performance_via

Note also that the compress/encrypt/checksum and the dedup are separate 
pipeline stages, so while dedup is happening for block N, block N+1 can be 
getting transformed - so this is designed to take advantage of multiple 
scheduling units (threads, CPUs, cores, etc.).



oh, how does encryption play into this?  just don't?  knowing that
someone else has the same block as you is leaking information, but that
may be acceptable -- just make different pools for people you don't
trust.


compress, encrypt, checksum, dedup.

You are correct that it is an information leak, but only within a dataset 
and its clones, and only if you can observe the deduplication stats (and 
you need to use zdb to get enough info to see the leak - and that means 
you have access to the raw devices); the dedupratio isn't really enough 
unless the pool is really idle or has only one user writing at a time.


For the encryption case deduplication of the same plaintext block will 
only work within a dataset or a clone of it - because only in those 
cases do you have the same key (and the way I have implemented the IV 
generation for AES CCM/GCM mode ensures that the same plaintext will 
have the same IV so the ciphertexts will match).  Also if you place a 
block in an unencrypted dataset that happens to match the ciphertext in 
an encrypted dataset they won't dedup either (you need to understand 
what I've done with the AES CCM/GCM MAC and the zio_chksum_t field in 
the blkptr_t and how that is used by dedup to see why).


If that small information leak isn't acceptable even within the dataset 
then don't enable both encryption and deduplication on those datasets - 
and don't delegate that property to your users either.  Or you can 
frequently rekey your per-dataset data encryption keys with 'zfs key -K', but 
then you might as well turn dedup off - although there are some very good 
usecases in multi-level security where doing dedup/encryption and rekey 
provides a nice effect.


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] DeDup and Compression - Reverse Order?

2009-12-17 Thread Bob Friesenhahn

On Thu, 17 Dec 2009, Kjetil Torgrim Homme wrote:


compression requires CPU, actually quite a lot of it.  even with the
lean and mean lzjb, you will get not much more than 150 MB/s per core or
something like that.  so, if you're copying a 10 GB image file, it will
take a minute or two, just to compress the data so that the hash can be
computed so that the duplicate block can be identified.  if the dedup
hash was based on uncompressed data, the copy would be limited by
hashing efficiency (and dedup tree lookup).


It is useful to keep in mind that deduplication can save a lot of disk 
space, but it is usually only effective in certain circumstances, 
such as when replicating a collection of files.  The majority of write 
I/O will never benefit from deduplication.  Based on this, 
speculatively assuming that the data will not be deduplicated does not 
result in increased cost most of the time.  If the data does end up 
being deduplicated, then that is a blessing.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] FW: Import a SAN cloned disk

2009-12-17 Thread Bone, Nick


-Original Message-
From: Bone, Nick 
Sent: 16 December 2009 16:33
To: oab
Subject: RE: [zfs-discuss] Import a SAN cloned disk

Hi

I know that EMC don't recommend a SnapView snapshot being added to the original 
host's Storage Group, although it is not prevented.

I tried this just now, assigning the Clariion snapshot of the pool LUN to the 
same host.  Although the snapshot LUN is there on the server - 
/dev/dsk/emcpowerxx - zpool import does not 'find' it.

If I assign the snapshot to another host then this host can import it.

Nick 

-Original Message-
From: zfs-discuss-boun...@opensolaris.org 
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of oab
Sent: 16 December 2009 14:29
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] Import a SAN cloned disk

Hi All,
We are looking at introducing EMC Clarion into our environment here. We 
were discussing the following scenario and wonder if someone has an opinion.

Our product spans a number of servers with some of the data held within Veritas 
and some held within ZFS. We have a requirement to snapshot all the data 
simultaneously. We can snapshot or clone all the luns simultaneously within the 
clariion but here comes the problem.

In the server using ZFS, we wish to use the snapshot/cloned disk to perform a 
backup. So we essentially will do the following

[1] Create snapshot on the Clariion
[2] Present the snapped LUN to the same server.

How will ZFS react to this?  In my mind it will see two copies of the same disk. 
I wish to import this snapped disk into a new pool, mount it and back the data 
up, but I am not sure how ZFS will react.

Veritas seems to handle this with the useclonedev option.

We will be using at least Solaris 10 U8 if/when we roll this out.

Thanking you in advance

OAB
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss









___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] DeDup and Compression - Reverse Order?

2009-12-17 Thread Nicolas Williams
On Thu, Dec 17, 2009 at 03:32:21PM +0100, Kjetil Torgrim Homme wrote:
 if the hash used for dedup is completely separate from the hash used for
 data protection, I don't see any downsides to computing the dedup hash
 from uncompressed data.  why isn't it?

Hash and checksum functions are slow (hash functions are slower, but
either way you'll be loading large blocks of data, which sets a floor
for cost).  Duplicating work is bad for performance.  Using the same
checksum for integrity protection and dedup is an optimization, and a
very nice one at that.  Having separate checksums would require making
blkptr_t larger, which imposes its own costs.

There's lots of trade-offs here.  Using the same checksum/hash for
integrity protection and dedup is a great solution.

If you use a non-cryptographic checksum algorithm then you'll
want to enable verification for dedup.  That's all.
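
[Editor's note: for reference, verification is just a property setting; a minimal
example with a hypothetical pool name "tank".]

# dedup=verify does a byte-for-byte compare on checksum matches before sharing blocks
zfs set dedup=verify tank
# or request the cryptographic hash explicitly, with verification on top
zfs set dedup=sha256,verify tank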

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] How do I determine dedupe effectiveness?

2009-12-17 Thread Stacy Maydew
I'm trying to see if zfs dedupe is effective on our datasets, but I'm having a 
hard time figuring out how to measure the space saved.

When I sent one backup set to the filesystem, the usage reported by 'zfs list' 
and 'zfs get used' for my zfs filesystem matched the expected values based on the data size.  

When I store a second copy, which should dedupe entirely, the zfs commands 
report the doubled used space that would be occupied if dedupe was turned off.

My question is: are the numbers being reported by the zfs command taking into 
account the deduplication, or is there some other way to see how much space 
we're saving?

Thanks,

Stacy
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How do I determine dedupe effectiveness?

2009-12-17 Thread Cyril Plisko
On Thu, Dec 17, 2009 at 8:57 PM, Stacy Maydew stacy.may...@sun.com wrote:
 I'm trying to see if zfs dedupe is effective on our datasets, but I'm having 
 a hard time figuring out how to measure the space saved.

 When I sent one backup set to the filesystem, the usage reported by zfs 
 list and zfs get used my zfs are the expected values based on the data 
 size.

 When I store a second copy, which should dedupe entirely, the zfs commands 
 report the doubled used space that would be occupied if dedupe was turned off.

 My question is, are the numbers being reported by the zfs command taking into 
 account the deduplication, or is there some other way to see how much space 
 we're saving.


Try zpool list

For example:

$ zpool list
NAMESIZE  ALLOC   FREECAP  DEDUP  HEALTH  ALTROOT
rpool87G  76.3G  10.7G87%  1.17x  ONLINE  -


In this case the dedup ratio is 1.17

-- 
Regards,
Cyril
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Dedupe reporting incorrect savings

2009-12-17 Thread Adam Leventhal
Hi Giridhar,

The size reported by ls can include things like holes in the file. What space 
usage does the zfs(1M) command report for the filesystem?

Adam

On Dec 16, 2009, at 10:33 PM, Giridhar K R wrote:

 Hi,
 
 Reposting as I have not gotten any response.
 
 Here is the issue. I created a zpool with 64k recordsize and enabled dedupe 
 on it.
 --zpool create -O recordsize=64k TestPool device1
 --zfs set dedup=on TestPool
 
 I copied files onto this pool over nfs from a windows client.
 
 Here is the output of zpool list
 -- zpool list
 NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
 TestPool 696G 19.1G 677G 2% 1.13x ONLINE -
 
 I ran ls -l /TestPool and saw the total size reported as 51,193,782,290 
 bytes.
 The alloc size reported by zpool along with the DEDUP of 1.13x does not add up 
 to 51,193,782,290 bytes.
 
 According to the DEDUP (Dedupe ratio) the amount of data copied is 21.58G 
 (19.1G * 1.13) 
 
 Here is the output from zdb -DD
 
 -- zdb -DD TestPool
 DDT-sha256-zap-duplicate: 33536 entries, size 272 on disk, 140 in core
 DDT-sha256-zap-unique: 278241 entries, size 274 on disk, 142 in core
 
 DDT histogram (aggregated over all DDTs):
 
 bucket allocated referenced
 __ __ __
 refcnt blocks LSIZE PSIZE DSIZE blocks LSIZE PSIZE DSIZE
 -- -- - - - -- - - -
 1 272K 17.0G 17.0G 17.0G 272K 17.0G 17.0G 17.0G
 2 32.7K 2.05G 2.05G 2.05G 65.6K 4.10G 4.10G 4.10G
 4 15 960K 960K 960K 71 4.44M 4.44M 4.44M
 8 4 256K 256K 256K 53 3.31M 3.31M 3.31M
 16 1 64K 64K 64K 16 1M 1M 1M
 512 1 64K 64K 64K 854 53.4M 53.4M 53.4M
 1K 1 64K 64K 64K 1.08K 69.1M 69.1M 69.1M
 4K 1 64K 64K 64K 5.33K 341M 341M 341M
 Total 304K 19.0G 19.0G 19.0G 345K 21.5G 21.5G 21.5G
 
 dedup = 1.13, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.13
 
 
 Am I missing something?
 
 Your inputs are much appritiated.
 
 Thanks,
 Giri
 -- 
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


--
Adam Leventhal, Fishworkshttp://blogs.sun.com/ahl

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How do I determine dedupe effectiveness?

2009-12-17 Thread Brandon High
On Thu, Dec 17, 2009 at 10:57 AM, Stacy Maydew stacy.may...@sun.com wrote:
 When I sent one backup set to the filesystem, the usage reported by zfs 
 list and zfs get used my zfs are the expected values based on the data 
 size.

 When I store a second copy, which should dedupe entirely, the zfs commands 
 report the doubled used space that would be occupied if dedupe was turned off.

It's how zfs does accounting with dedupe. Even when the blocks are
deduped, they still count toward the size of the volume. It's my
understanding that this is done out of fairness. If the space used
were split between all duplicates and some of the copies were deleted, then
the remaining copy could push the user over quota (or the fs past its
limit, etc.)

 My question is, are the numbers being reported by the zfs command taking into 
 account the deduplication, or is there some other way to see how much space 
 we're saving.

'zpool list' or 'zpool get dedup ${zpool_name}'

-B

-- 
Brandon High : bh...@freaks.com
When in doubt, use brute force.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] DeDup and Compression - Reverse Order?

2009-12-17 Thread Andrey Kuzmin
On Thu, Dec 17, 2009 at 6:14 PM, Kjetil Torgrim Homme
kjeti...@linpro.no wrote:
 Darren J Moffat darr...@opensolaris.org writes:
 Kjetil Torgrim Homme wrote:
 Andrey Kuzmin andrey.v.kuz...@gmail.com writes:

 Downside you have described happens only when the same checksum is
 used for data protection and duplicate detection. This implies sha256,
 BTW, since fletcher-based dedupe has been dropped in recent builds.

 if the hash used for dedup is completely separate from the hash used
 for data protection, I don't see any downsides to computing the dedup
 hash from uncompressed data.  why isn't it?

 It isn't separate because that isn't how Jeff and Bill designed it.

 thanks for confirming that, Darren.

 I think the design the have is great.

 I don't disagree.

 Instead of trying to pick holes in the theory can you demonstrate a
 real performance problem with compression=on and dedup=on and show
 that it is because of the compression step ?

 compression requires CPU, actually quite a lot of it.  even with the
 lean and mean lzjb, you will get not much more than 150 MB/s per core or
 something like that.  so, if you're copying a 10 GB image file, it will
 take a minute or two, just to compress the data so that the hash can be
 computed so that the duplicate block can be identified.  if the dedup
 hash was based on uncompressed data, the copy would be limited by
 hashing efficiency (and dedup tree lookup)

This isn't exactly true. If, speculatively, one stores two hashes - one
for uncompressed data in the DDT and another, for compressed data,
kept with the data block for data healing - one skips compression for duplicates
and pays with an extra hash computation for singletons. So a more correct
question would be whether the set of cases where the duplicate/singleton and
compression/hashing bandwidth ratios are such that one wins is
non-empty (or, rather, of practical importance).

Regards,
Andrey
.

 I don't know how tightly interwoven the dedup hash tree and the block
 pointer hash tree are, or if it is all possible to disentangle them.

 conceptually it doesn't seem impossible, but that's easy for me to
 say, with no knowledge of the zio pipeline...

 oh, how does encryption play into this?  just don't?  knowing that
 someone else has the same block as you is leaking information, but that
 may be acceptable -- just make different pools for people you don't
 trust.

 Otherwise if you want it changed code it up and show how what you have
 done is better in all cases.

 I wish I could :-)

 --
 Kjetil T. Homme
 Redpill Linpro AS - Changing the game

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How do I determine dedupe effectiveness?

2009-12-17 Thread Stacy Maydew
The commands 'zpool list' and 'zpool get dedup pool' both show a ratio of 
1.10.  

So thanks for that answer.  I'm a bit confused, though: if dedup is applied 
per zfs filesystem, not per zpool, why can I only see the dedup ratio on a per-pool 
basis rather than for each zfs filesystem?

Seems to me there should be a way to get this information for a given zfs 
filesystem?

Thanks again,

Stacy
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How do I determine dedupe effectiveness?

2009-12-17 Thread A Darren Dunham
On Thu, Dec 17, 2009 at 12:30:29PM -0800, Stacy Maydew wrote:
 So thanks for that answer.  I'm a bit confused though if the dedup is
 applied per zfs filesystem, not zpool, why can I only see the dedup on
 a per pool basis rather than for each zfs filesystem?

 Seems to me there should be a way to get this information for a given
 zfs filesystem?

You can enable and disable it on a filesystem basis, but the dedup is
across the entire pool.
-- 
Darren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Confusion over zpool and zfs versions

2009-12-17 Thread Doug
I'm running Solaris 10 update 8 (10/09).  I started out using an older
version of Solaris and have upgraded a few times.  I have used zpool
upgrade on the pools I have as new versions become available after
kernel updates.

I see now when I run zfs upgrade that pools I created long ago are
at version 1 while pools I created more recently have newer versions.
Can anybody clue me into the difference between zpool and zfs
versions?  Are there any compatibility issues with upgrading zfs versions?
Will this affect zfs send/recv to other systems like the zpool version
does?

Thanks for your advice.

I also noticed misleading info regarding the zfs upgrade -v message.
It prints a message that says "For more information on a particular
version, including supported releases, see:
http://www.opensolaris.org/os/community/zfs/version/zpl/N".

I know you are supposed to replace the N in the web page address
with an integer, but I just copy and pasted it into firefox.  When I
did that, I was redirected to:
http://hub.opensolaris.org/bin/view/Community+Group+zfs/N-1

That page is a list of four links labeled "ZFS File System Version 1"
through "Version 4".  But following those links brings up the
descriptions of the ZFS Pool versions 1-4, not the ZFS versions.

Thanks again,
Doug
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs hanging during reads

2009-12-17 Thread Tim
fmdump shows errors on a different drive, and none on the one that has this 
slow read problem:

Nov 27 2009 20:58:28.670057389 ereport.io.scsi.cmd.disk.recovered
nvlist version: 0
class = ereport.io.scsi.cmd.disk.recovered
ena = 0xbeb7f4dd531
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = /p...@0,0/pci1043,8...@9/d...@2,0
devid = id1,s...@sata_samsung_hd753lj___s1pwj1cq801987
(end detector)

driver-assessment = recovered
op-code = 0x28
cdb = 0x28 0x0 0x4 0x80 0x32 0x80 0x0 0x0 0x80 0x0
pkt-reason = 0x0
pkt-state = 0x1f
pkt-stats = 0x50
__ttl = 0x1
__tod = 0x4b0fa2c4 0x27f043ad


The serial number of the suspect drive is S1PWJ1CQ801987.

iostat -En shows:

c0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Model: ST360021A   Revision:  Serial No: 3HR2AG72Size: 60.02GB 
60020932608 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 0 
c6d1 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Model: SAMSUNG HD154UI Revision:  Serial No: S1Y6J1KS720622  Size: 1500.30GB 
1500295200768 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 0 
c3t0d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: ATA  Product: SAMSUNG HD753LJ  Revision: 1113 Serial No:  
Size: 750.16GB 750156374016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 28 Predictive Failure Analysis: 0 
[b]c3t1d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: ATA  Product: SAMSUNG HD753LJ  Revision: 1113 Serial No:  
Size: 750.16GB 750156374016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 49 Predictive Failure Analysis: 0 [/b]
c3t2d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: ATA  Product: SAMSUNG HD753LJ  Revision: 1113 Serial No:  
Size: 750.16GB 750156374016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 28 Predictive Failure Analysis: 0 
c3t3d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: ATA  Product: SAMSUNG HD753LJ  Revision: 1110 Serial No:  
Size: 750.16GB 750156374016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 28 Predictive Failure Analysis: 0 
c0t1d0   Soft Errors: 0 Hard Errors: 30 Transport Errors: 0 
Vendor: ATAPIProduct: CD-RW 52X24  Revision: F.JZ Serial No:  
Size: 0.00GB 0 bytes
Media Error: 0 Device Not Ready: 30 No Device: 0 Recoverable: 0 
Illegal Request: 0 Predictive Failure Analysis: 0
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compress an existing filesystem

2009-12-17 Thread Brandon High
On Thu, Dec 17, 2009 at 7:11 AM, Edward Ned Harvey
sola...@nedharvey.com wrote:
 And I've heard a trend of horror stories, that zfs has a tendency to implode
 when it's very full.  So try to keep your disks below 90%.

I've taken to creating an unmounted empty filesystem with a
reservation to prevent the zpool from filling up. It gives you
behavior similar to ufs's reserved blocks.

$ zfs get reservation,mountpoint tank/reservation
NAME  PROPERTY VALUE   SOURCE
tank/reservation  reservation  10G local
tank/reservation  mountpoint   nonelocal
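
[Editor's note: for completeness, the dataset shown above can be created in one step,
using the same names and size as in the example.]

# reserve 10G that ordinary datasets can never consume, and keep it unmounted
zfs create -o reservation=10G -o mountpoint=none tank/reservation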

-B

-- 
Brandon High : bh...@freaks.com
Time is what keeps everything from happening all at once.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs zend is very slow

2009-12-17 Thread Michael Herf
I have observed the opposite, and I believe that all writes are slow to my
dedup'd pool.

I used local rsync (no ssh) for one of my migrations (so it was restartable,
as it took *4 days*), and the writes were slow just like zfs recv.

I have not seen fast writes of real data to the deduped volume, if you're
copying enough data. (I assume there's some sort of writeback behavior to
make small writes faster?)

Of course if you just use mkfile, it does run amazingly fast.

mike


Edward Ned Harvey wrote:

  I'm willing to accept slower writes with compression enabled, par for
 the course. Local writes, even with compression enabled, can still
 exceed 500MB/sec, with moderate to high CPU usage.
 These problems seem to have manifested after snv_128, and seemingly
 only affect ZFS receive speeds. Local pool performance is still very
 fast.


  Now we're getting somewhere.  ;-)
 You've tested the source disk (result: fast.)
 You've tested the destination disk without zfs receive (result: fast.)
 Now the only two ingredients left are:

 Ssh performance, or zfs receive performance.

 So, to conclusively identify and prove and measure that zfs receive is the
 problem, how about this:
 zfs send somefilessytem | ssh somehost 'cat  /dev/null'

 If that goes slow, then ssh is the culprit.
 If that goes fast ... and then you change to zfs receive and that goes
 slow ... Now you've scientifically shown that zfs receive is slow.




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Confusion over zpool and zfs versions

2009-12-17 Thread Cindy Swearingen

Hi Doug,

The pool and file system version upgrades allow you to access new
features that are available for a particular Solaris release. For
example, if you upgrade your system to Solaris 10 10/09, then you
would need to upgrade your pool version to access the pool features
available in the Solaris 10 10/09 release. If you created a new
pool on the same system, then those features would be available
automatically.

Some features are specific to the file system format and some features
are specific to the pool format, hence the different versions.

The compatibility issue for pools is that you will not be able to
import a pool of a later version on a system running an earlier Solaris
version.

Another example is that you can't send a ZFS send stream with the dedup
flag to a system that doesn't understand dedup. For some operations, the 
file system can be received, but not mounted, on a system at a lower
version.

The pool and fs version is also related, in that if you were running
pool 10 and tried to upgrade to ZFS fs version 4, ZFS would tell you
that you need to upgrade the pool version.

I fixed the problem with the ZFS version pages.

Thanks for reporting it.

Cindy

On 12/17/09 14:12, Doug wrote:

I'm running Solaris 10 update 8 (10/09).  I started out using an older
version of Solaris and have upgraded a few times.  I have used zpool
upgrade on the pools I have as new versions become available after
kernel updates.

I see now when I run zfs upgrade that pools I created long ago are
at version 1 while pools I created more recently have newer versions.
Can anybody clue me into the difference between zpool and zfs
versions?  Are there any compatibility issues with upgrading zfs versions?
Will this affect zfs send/recv to other systems like the zpool version
does?

Thanks for your advice.

I also noticed misleading info in re. the zfs upgrade -v message.
It prints a message that says For more information on a particular
version, including supported releases, see:
http://www.opensolaris.org/os/community/zfs/version/zpl/N;

I know you are supposed to replace the N in the web page address
with an integer, but I just copy and pasted it into firefox.  When I
did that, I was redirected to:
http://hub.opensolaris.org/bin/view/Community+Group+zfs/N-1

That page is a list of four links labeled ZFS File System Version 1
through Version 4  But, following those links brings up the
descriptions of the ZFS Pool versions 1-4, not the ZFS versions.

Thanks again,
Doug

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] dedup existing data

2009-12-17 Thread Brandon High
On Wed, Dec 16, 2009 at 6:17 AM, Steven Sim unixan...@gmail.com wrote:
 r...@sunlight:/root# zfs send myplace/myd...@prededup | zfs receive -v
 myplace/mydata
 cannot receive new filesystem stream: destination 'myplace/fujitsu' exists
 must specify -F to overwrite it

Try something like this:

zfs create -o mountpoint=none myplace/dedup
zfs unmount myplace/mydata # Make sure the source isn't changing anymore
zfs snapshot myplace/myd...@prededup
zfs send -R myplace/myd...@prededup | zfs receive -du myplace/dedup

It'll create a new filesystem myplace/dedup/mydata

zfs rename myplace/mydata myplace/mydata_old
zfs rename myplace/dedup/mydata myplace/mydata
zfs mount myplace/mydata

You can now destroy the old dataset.

I'm also adding a user property to the dedup'd copy so I don't
accidentally do it again, eg:
zfs set com.freaks:deduped=yes myplace/dedup/mydata
prior to the 'zfs rename'.

There's a little more finesse you can use to limit the time your
source dataset is unmounted. Do a snapshot and send|receive to get
most of the data over, then unmount and create a new snapshot and
send|receive to catch any changes since the first.
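
[Editor's note: a sketch of that two-pass variant, building on the commands above;
the snapshot names @bulk and @final are made up.]

zfs snapshot myplace/mydata@bulk
zfs send -R myplace/mydata@bulk | zfs receive -du myplace/dedup    # bulk copy, source stays live
zfs unmount myplace/mydata                                         # brief outage starts here
zfs snapshot myplace/mydata@final
zfs send -R -I @bulk myplace/mydata@final | zfs receive -du myplace/dedup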

-B

-- 
Brandon High : bh...@freaks.com
Mistakes are often the stepping stones to utter failure.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] dedup existing data

2009-12-17 Thread Anil
If you have another partition with enough space, you could technically just do:

mv src /some/other/place
mv /some/other/place src

Anyone see a problem with that? Might be the best way to get it de-duped.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Dedupe reporting incorrect savings

2009-12-17 Thread Giridhar K R
 Hi Giridhar,
 
 The size reported by ls can include things like holes
 in the file. What space usage does the zfs(1M)
 command report for the filesystem?
 
 Adam
 
 On Dec 16, 2009, at 10:33 PM, Giridhar K R wrote:
 
  Hi,
  
  Reposting as I have not gotten any response.
  
  Here is the issue. I created a zpool with 64k
 recordsize and enabled dedupe on it.
  --zpool create -O recordsize=64k TestPool device1
  --zfs set dedup=on TestPool
  
  I copied files onto this pool over nfs from a
 windows client.
  
  Here is the output of zpool list
  -- zpool list
  NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
  TestPool 696G 19.1G 677G 2% 1.13x ONLINE -
  
  I ran ls -l /TestPool and saw the total size
 reported as 51,193,782,290 bytes.
  The alloc size reported by zpool along with the
 DEDUP of 1.13x does not addup to 51,193,782,290
 bytes.
  
  According to the DEDUP (Dedupe ratio) the amount of
 data copied is 21.58G (19.1G * 1.13) 
  
  Here is the output from zdb -DD
  
  -- zdb -DD TestPool
  DDT-sha256-zap-duplicate: 33536 entries, size 272
 on disk, 140 in core
  DDT-sha256-zap-unique: 278241 entries, size 274 on
 disk, 142 in core
  
  DDT histogram (aggregated over all DDTs):
  
  bucket allocated referenced
  __ __
 __
  refcnt blocks LSIZE PSIZE DSIZE blocks LSIZE PSIZE
 DSIZE
  -- -- - - - -- - -
 -
  1 272K 17.0G 17.0G 17.0G 272K 17.0G 17.0G 17.0G
  2 32.7K 2.05G 2.05G 2.05G 65.6K 4.10G 4.10G 4.10G
  4 15 960K 960K 960K 71 4.44M 4.44M 4.44M
  8 4 256K 256K 256K 53 3.31M 3.31M 3.31M
  16 1 64K 64K 64K 16 1M 1M 1M
  512 1 64K 64K 64K 854 53.4M 53.4M 53.4M
  1K 1 64K 64K 64K 1.08K 69.1M 69.1M 69.1M
  4K 1 64K 64K 64K 5.33K 341M 341M 341M
  Total 304K 19.0G 19.0G 19.0G 345K 21.5G 21.5G 21.5G
  
  dedup = 1.13, compress = 1.00, copies = 1.00, dedup
 * compress / copies = 1.13
  
  
  Am I missing something?
  
  Your inputs are much appritiated.
  
  Thanks,
  Giri
  -- 
  This message posted from opensolaris.org
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
 
 http://mail.opensolaris.org/mailman/listinfo/zfs-discu
 ss
 
 
 --
 Adam Leventhal, Fishworks
http://blogs.sun.com/ahl
 _
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discu
 ss

Thanks for the response Adam.

Are you talking about 'zfs list'?

It displays 19.6 as the allocated space.

What does ZFS treat as a hole, and how does it identify one?

Thanks,
Giri
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] DeDup and Compression - Reverse Order?

2009-12-17 Thread Daniel Carosone
Your parenthetical comments here raise some concerns, or at least eyebrows, 
with me.  Hopefully you can lower them again.

 compress, encrypt, checksum, dedup.


 (and you need to use zdb to get enough info to see the
 leak - and that means you have access to the raw devices)

An attacker with access to the raw devices is the primary base threat model for 
on-disk encryption, surely?

An attacker with access to disk traffic, via e.g. iSCSI, who can also deploy 
dynamic traffic analysis in addition to static content analysis, and who also 
has similarly greater opportunities for tampering, is another trickier threat 
model.

It seems like entirely wrong thinking (even in parentheses) to dismiss an issue 
as irrelevant because it only applies in the primary threat model.

 (and the way I have implemented the IV 
 generation for AES CCM/GCM mode ensures that the same
 plaintext will have the same IV so the ciphertexts will match).

Again, this seems like a cause for concern.  Have you effectively turned these 
fancy and carefully designed crypto modes back into ECB, albeit at a larger 
block size (and only within a dataset)?  

Let's consider copy-on-write semantics: with the above issue an attacker can 
tell which blocks of a file have changed over time, even if unchanged blocks 
have been rewritten.. giving even the static image attacker some traffic 
analysis capability.

This would be a problem regardless of dedup, for the scenario where the 
attacker can see repeated ciphertext on disk (unless the dedup metadata itself 
is sufficiently encrypted, which I understand it is not).

 (you need to understand 
 what I've done with the AES CCM/GCM MAC

I'd like to, but more to understand what (if any) protection is given against 
replay attacks (above that already provided by the merkle hash tree).

I await ZFS crypto with even more enthusiasm than dedup, thanks for talking 
about the details with us.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] dedup existing data

2009-12-17 Thread Brandon High
On Thu, Dec 17, 2009 at 3:10 PM, Anil an...@entic.net wrote:
 If you have another partition with enough space, you could technically just 
 do:

 mv src /some/other/place
 mv /some/other/place src

 Anyone see a problem with that? Might be the best way to get it de-duped.

You'd lose any existing snapshots. You may lose ACLs.

If you have snapshots of the source, the space will still be used
until you destroy the snapshots.

-B

-- 
Brandon High : bh...@freaks.com
Indecision is the key to flexibility.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Dedupe reporting incorrect savings

2009-12-17 Thread Adam Leventhal
 Thanks for the response Adam.
 
 Are you talking about ZFS list?
 
 It displays 19.6 as allocated space.
 
 What does ZFS treat as hole and how does it identify?

ZFS will compress blocks of zeros down to nothing and treat them like
sparse files. 19.6 is pretty close to your computed value. Does your pool
happen to be 10+1 RAID-Z?
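
[Editor's note: a quick way to see the hole effect on the pool from the thread; the
file name is hypothetical.  A sparse file looks large to ls but occupies almost no blocks.]

mkfile -n 1g /TestPool/sparse.dat   # -n sets the size without allocating blocks
ls -l /TestPool/sparse.dat          # reports 1 GB
du -h /TestPool/sparse.dat          # reports (nearly) zero space actually used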

Adam

--
Adam Leventhal, Fishworkshttp://blogs.sun.com/ahl

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs zend is very slow

2009-12-17 Thread Brandon High
It looks like the kernel is using a lot of memory, which may be part
of the performance problem. The ARC has shrunk to 1G, and the kernel
is using up over 5G.

I'm doing a send|receive of 683G of data. I started it last night
around 1am, and as of right now it's only sent 450GB. That's about
8.5MB/sec.

Are there any other stats, or dtrace scripts I can look at to
determine what's happening?

bh...@basestar:~$ pfexec mdb -k
Loading modules: [ unix genunix specfs dtrace mac cpu.generic
cpu_ms.AuthenticAMD.15 uppc pcplusmp rootnex scsi_vhci zfs sata sd
sockfs ip hook neti sctp arp usba fctl random crypto cpc fcip smbsrv
nfs lofs ufs logindmux ptm sppp ipc ]
 ::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                    1405991              5492   67%
ZFS File Data              223137               871   11%
Anon                       396743              1549   19%
Exec and libs                1936                 7    0%
Page cache                   5221                20    0%
Free (cachelist)             9181                35    0%
Free (freelist)             52685               205    3%

Total                     2094894              8183
Physical                  2094893              8183

bh...@basestar:~$ arcstat.pl 5 3
Time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz c
16:05:33  204M6M  33M53M23M1 1G1G
16:05:38   562   101 1897   17 4   2397   17 1G1G
16:05:431K   709 39716   637   9479   15 1G1G
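
[Editor's note: a couple of other cheap data points worth collecting while the receive
runs; these are the standard arcstats kstat names and the mdb ::arc dcmd, nothing exotic.]

# current ARC size, target size, and metadata usage
kstat -p zfs:0:arcstats:size zfs:0:arcstats:c zfs:0:arcstats:arc_meta_used
# kernel-side ARC summary
echo ::arc | pfexec mdb -k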

-B

-- 
Brandon High : bh...@freaks.com
Always try to do things in chronological order; it's less confusing that way.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Dedupe reporting incorrect savings

2009-12-17 Thread Giridhar K R
I used the default while creating zpool with one disk drive. I guess it is a 
RAID 0 configuration.

Thanks,
Giri
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs zend is very slow

2009-12-17 Thread Michael Herf
My ARC is ~3GB.

I'm doing a test that copies 10GB of data to a volume where the blocks
should dedupe 100% with existing data.

The first time, the test runs at 5MB/sec and seems to average a 10-30% ARC *miss*
rate, with around 400 ARC reads/sec.
When things are working at disk bandwidth, I'm getting 3-5% ARC misses. Up
to 7k arc reads/sec.

If I do a recv on a small dataset, then immediately destroy and replay the
same thing, I can get in-core dedupe performance, and it's truly amazing.

Does anyone know how big the dedupe tables are, and if they can be given
some priority/prefetch in ARC? I think I have enough RAM to make this work.

mike

On Thu, Dec 17, 2009 at 4:12 PM, Brandon High bh...@freaks.com wrote:

 It looks like the kernel is using a lot of memory, which may be part
 of the performance problem. The ARC has shrunk to 1G, and the kernel
 is using up over 5G.

 I'm doing a send|receive of 683G of data. I started it last night
 around 1am, and as of right now it's only sent 450GB. That's about
 8.5MB/sec.

 Are there any other stats, or dtrace scripts I can look at to
 determine what's happening?

 bh...@basestar:~$ pfexec mdb -k
 Loading modules: [ unix genunix specfs dtrace mac cpu.generic
 cpu_ms.AuthenticAMD.15 uppc pcplusmp rootnex scsi_vhci zfs sata sd
 sockfs ip hook neti sctp arp usba fctl random crypto cpc fcip smbsrv
 nfs lofs ufs logindmux ptm sppp ipc ]
  ::memstat
 Page Summary                Pages                MB  %Tot
 ------------     ----------------  ----------------  ----
 Kernel                    1405991              5492   67%
 ZFS File Data              223137               871   11%
 Anon                       396743              1549   19%
 Exec and libs                1936                 7    0%
 Page cache                   5221                20    0%
 Free (cachelist)             9181                35    0%
 Free (freelist)             52685               205    3%

 Total                     2094894              8183
 Physical                  2094893              8183

 bh...@basestar:~$ arcstat.pl 5 3
 Time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz c
 16:05:33  204M6M  33M53M23M1 1G1G
 16:05:38   562   101 1897   17 4   2397   17 1G1G
 16:05:431K   709 39716   637   9479   15 1G1G

 -B

 --
 Brandon High : bh...@freaks.com
 Always try to do things in chronological order; it's less confusing that
 way.
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled

2009-12-17 Thread Jack Kielsmeier
Ok, my console is 100% completely hung, not gonna be able to enter any commands 
when it freezes.

I can't even get the numlock light to change its status.

This time I even plugged in a PS/2 keyboard instead of USB thinking maybe it 
was USB dying during the hang, but not so.

I have hard rebooted my system again.

I'm going to set up a script that will continuously run savecore; after 10 runs, 
I'll reset the bounds file. Hopefully by doing it this way, I'll get a savecore 
right as the system starts to go unresponsive.

I'll post the script I'll be running here shortly after I write it.

Also, as far as using 'sync', I'm not sure what exactly I would do there.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled

2009-12-17 Thread Jack Kielsmeier
Ok, this is the script I am running (as a background process). This script 
doesn't matter much, it's just here for reference, as I'm running into problems 
just running the savecore command while the zpool import is running.


#!/bin/bash
# Continuously capture live crash dumps with savecore -L.  Every 10th pass the
# bounds file is removed so savecore restarts its dump numbering and old dumps
# are overwritten instead of filling /var/crash/opensol.

count=1
rm /var/crash/opensol/bounds

/usr/bin/savecore -L

while true
do
    if [ "$count" -eq 10 ]
    then
        count=1
        rm /var/crash/opensol/bounds
    fi
    savecore -L
    count=$((count + 1))
done


opensol was the name of the system before I renamed it to wd40, crash data is 
still set to be put in /var/crash/opensol

I have started another zpool import of the vault volume


r...@wd40:~# zpool import
  pool: vault
id: 4018273146420816291
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

vault   ONLINE
  raidz1-0  ONLINE
c3d0ONLINE
c3d1ONLINE
c4d0ONLINE
c4d1ONLINE
r...@wd40:~# zpool import 4018273146420816291 
[1] 1093


After starting the import, savecore -L no longer finishes

r...@wd40:/var/adm# savecore -L
dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
 0:05 100% done
100% done: 153601 pages dumped, dump succeeded

It should be saying that it's saving to /var/crash/opensol/, but instead it 
just hangs and never returns me to a prompt

Previous to running zpool import, the savecore command took anywhere from 10-15 
seconds to finish.

If I cd to /var/crash/opensol, there is not a new file created

I tried firing off savecore again, same result.

A ps listing shows the savecore command

r...@wd40:/var/crash/opensol# ps -ef | grep savecore
root  1092  1061   0 22:27:55 ?   0:01 savecore -L
root  1134  1083   0 22:33:28 pts/3   0:00 grep savecore
root  1113   787   0 22:30:23 ?   0:01 savecore -L

(One of these is from the script I was running when I started the import 
manually, the other when I just ran the savecore -L command by itself).

I cannot kill these processes, even with a kill -9

I then hard rebooted my server yet again (as it hangs if it's in process of a 
zpool import)

After the reboot, all I did was ssh in, disable gdm, run my zpool import command, 
and try another savecore (this time not trying to use my script above first, 
just a simple savecore -L as root from the command line); once again it hangs

r...@wd40:~# zpool import 4018273146420816291 
[1] 783
r...@wd40:~# savecore -L
dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
 0:05 100% done
100% done: 138876 pages dumped, dump succeeded
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] force 4k writes

2009-12-17 Thread stuart anderson
 On Wed, Dec 16 at  7:35, Bill Sprouse wrote:
 The question behind the question is, given the
 really bad things that 
 can happen performance-wise with writes that are not
 4k aligned when 
 using flash devices, is there any way to insure that
 any and all 
 writes from ZFS are 4k aligned?
 
 Some flash devices can handle this better than
 others, often several
 orders of magnitude better.  Not all devices (as you
 imply) are
 so-affected.


As a specific example of two devices with dramatically different performance for 
sub-4k transfers, has anyone done any ZFS benchmarks between the X25E and the 
F20 that they can share?

I am particularly interested in zvol performance with a blocksize of 16k and 
highly compressible data (~10x). I am going to run some comparison tests but 
would appreciate any initial input on what to look out for or how to tune ZFS 
to get the most out of the F20.
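
[Editor's note: a minimal sketch of the test volume described above; the pool name
and size are placeholders, while the 16k block size and compression come from the
scenario in the post.  volblocksize must be chosen at creation time.]

zfs create -o compression=on -o volblocksize=16k -V 50g testpool/bench16k
zfs get volblocksize,compression testpool/bench16k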

It might be helpful, e.g., if there were somewhere in the software stack 
where I could tell part of the system to lie and treat the F20 as a 4k device.

Thanks.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] force 4k writes

2009-12-17 Thread Richard Elling

On Dec 17, 2009, at 9:04 PM, stuart anderson wrote:


As a specific example of 2 devices with dramatically different  
performance for sub-4k transfers has anyone done any ZFS benchmarks  
between the X25E and the F20 they can share?


I am particularly interested in zvol performance with a blocksize of  
16k and highly compressible data (~10x).


16 KB recordsize?  That seems a little unusual, what is the application?

 I am going to run some comparison tests but would appreciate any  
initial input on what to look out for or how to tune ZFS to get the  
most out of the F20.


AFAICT, no tuning should be required.  It is quite fast.

It might be helpful, e.g., if there where some where in the software  
stack where I could tell part of the system to lie and treat the F20  
as a 4k device?


The F20 is rated at 84,000 random 4KB write IOPS.  The DRAM write
buffer will hide 4KB write effects.

OTOH, the X-25E is rated at 3,300 random 4KB writes.  It shouldn't take
much armchair analysis to come to the conclusion that the F20 is likely
to win that IOPS battle :-)
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled

2009-12-17 Thread Victor Latushkin

On 18.12.09 07:13, Jack Kielsmeier wrote:

Ok, my console is 100% completely hung, not gonna be able to enter any
commands when it freezes.

I can't even get the numlock light to change it's status.

This time I even plugged in a PS/2 keyboard instead of USB thinking maybe it
was USB dying during the hang, but not so.

I have hard rebooted my system again.


I think it may be better to boot the system with kmdb loaded - you need to edit your 
GRUB menu OpenSolaris entry and add -k to the kernel$ line.


Or you can just load kmdb from the console:

mdb -K

then type :c to continue

When the system freezes, you can use the F1-A key combination to drop into kmdb, and 
then you can type

$<systemdump

to generate crashdump and reboot.
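
[Editor's note: a hedged sketch of the GRUB change described above; the path and
kernel$ line shown here match a stock ZFS-root OpenSolaris menu.lst, so adjust them
to whatever your own boot entry actually contains.]

# in /rpool/boot/grub/menu.lst, append -k to the kernel$ line of the entry you boot:
kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS -k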

Regards,
victor



I'm going to set up a script that will continuously run savecore, after 10,
I'll reset the bounds file. Hopefully by doing it this way, I'll get a
savecore right as the system starts to go unresponsive.

I'll post the script I'll be running here shortly after I write it.

Also, as far as using 'sync' Im not sure what exactly I would do there.



--
--
Victor Latushkin   phone: x11467 / +74959370467
TSC-Kernel EMEAmobile: +78957693012
Sun Services, Moscow   blog: http://blogs.sun.com/vlatushkin
Sun Microsystems
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss