Re: [zfs-discuss] zfs hanging during reads
On Wed, Dec 16 at 22:41, Tim wrote:
> hmm, not seeing the same slowdown when I boot from the Samsung EStool CD
> and run a diag which performs a surface scan... could this still be a
> hardware issue, or possibly something with the Solaris data format on
> the disk?

Rotating drives often have various optimizations to help recover from damaged servo sectors when reading sequentially: they can skip over bad areas and just assume the position information is there, until they get a fatal ECC error on a read. If the drive wanders off-track, it just keeps reading until it eventually finds some position information. I'm guessing you have a physical problem with the servo wedges on that drive that only manifests itself in some of your access methods.

Does the drive click or make any other noises when this is happening?

For the price of drives today, I'd buy a replacement and look at swapping that one out. You can always keep it as a spare for later.

--
Eric D. Mudama
edmud...@mail.bounceswoosh.org
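For anyone triaging a similar drive from the Solaris side, a minimal sketch of the usual first checks (device names and output will vary; both commands appear with real output later in this thread):

    # Per-device error counters; rising Hard/Transport error counts point
    # at the drive or cabling rather than at ZFS:
    iostat -En

    # FMA error telemetry, if the OS has logged any disk events:
    fmdump -eV | more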
Re: [zfs-discuss] force 4k writes
On Wed, Dec 16 at 7:35, Bill Sprouse wrote:
> The question behind the question is, given the really bad things that
> can happen performance-wise with writes that are not 4k aligned when
> using flash devices, is there any way to ensure that any and all writes
> from ZFS are 4k aligned?

Some flash devices can handle this better than others, often by several orders of magnitude. Not all devices (as you imply) are so affected.

--eric

--
Eric D. Mudama
edmud...@mail.bounceswoosh.org
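A quick, hedged way to see what alignment ZFS is actually using on a given pool (pool name is a placeholder): zdb prints each vdev's ashift, where ashift=9 means 512-byte allocation units and ashift=12 would mean 4k-aligned allocations.

    # Dump the pool configuration and look for the ashift values:
    zdb mypool | grep ashift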
Re: [zfs-discuss] DeDup and Compression - Reverse Order?
Downside you have described happens only when the same checksum is used for data protection and duplicate detection. This implies sha256, BTW, since fletcher-based dedupe has been dropped in recent builds.

On 12/17/09, Kjetil Torgrim Homme kjeti...@linpro.no wrote:
> Andrey Kuzmin andrey.v.kuz...@gmail.com writes:
>> Darren J Moffat wrote:
>>> Andrey Kuzmin wrote:
>>>> Resilvering has nothing to do with sha256: one could resilver long
>>>> before dedupe was introduced in zfs.
>>> SHA256 isn't just used for dedup; it has been available as one of the
>>> checksum algorithms right back to pool version 1, which integrated in
>>> build 27.
>> 'One of' is the key word. And thanks for code pointers, I'll take a
>> look.
>
> I didn't mention sha256 at all :-). the reasoning is the same no matter
> what hash algorithm you're using (fletcher2, fletcher4 or sha256).
> dedup doesn't require sha256 either, you can use fletcher4.
>
> the question was: why does data have to be compressed before it can be
> recognised as a duplicate? it does seem like a waste of CPU, no? I
> attempted to show the downsides to identifying blocks by their
> uncompressed hash. (BTW, it doesn't affect storage efficiency, the same
> duplicate blocks will be discovered either way.)
>
> --
> Kjetil T. Homme
> Redpill Linpro AS - Changing the game

--
Regards,
Andrey
Re: [zfs-discuss] force 4k writes
On Thu, Dec 17, 2009 at 09:14, Eric D. Mudama edmud...@bounceswoosh.org wrote:
> On Wed, Dec 16 at 7:35, Bill Sprouse wrote:
>> The question behind the question is, given the really bad things that
>> can happen performance-wise with writes that are not 4k aligned when
>> using flash devices, is there any way to ensure that any and all
>> writes from ZFS are 4k aligned?
>
> Some flash devices can handle this better than others, often by several
> orders of magnitude. Not all devices (as you imply) are so affected.

Is there - somewhere - a list of flash devices, with some (perhaps subjective) indication of how they handle issues like this?

--
-Me
Re: [zfs-discuss] DeDup and Compression - Reverse Order?
Andrey Kuzmin andrey.v.kuz...@gmail.com writes:
> Downside you have described happens only when the same checksum is
> used for data protection and duplicate detection. This implies sha256,
> BTW, since fletcher-based dedupe has been dropped in recent builds.

if the hash used for dedup is completely separate from the hash used for data protection, I don't see any downsides to computing the dedup hash from uncompressed data. why isn't it?

--
Kjetil T. Homme
Redpill Linpro AS - Changing the game
Re: [zfs-discuss] DeDup and Compression - Reverse Order?
Kjetil Torgrim Homme wrote:
> Andrey Kuzmin andrey.v.kuz...@gmail.com writes:
>> Downside you have described happens only when the same checksum is
>> used for data protection and duplicate detection. This implies
>> sha256, BTW, since fletcher-based dedupe has been dropped in recent
>> builds.
>
> if the hash used for dedup is completely separate from the hash used
> for data protection, I don't see any downsides to computing the dedup
> hash from uncompressed data. why isn't it?

It isn't separate because that isn't how Jeff and Bill designed it. I think the design they have is great.

Instead of trying to pick holes in the theory, can you demonstrate a real performance problem with compression=on and dedup=on and show that it is because of the compression step? Otherwise, if you want it changed, code it up and show how what you have done is better in all cases.

--
Darren J Moffat
Re: [zfs-discuss] zfs send is very slow
> I'm willing to accept slower writes with compression enabled, par for
> the course. Local writes, even with compression enabled, can still
> exceed 500MB/sec, with moderate to high CPU usage. These problems seem
> to have manifested after snv_128, and seemingly only affect ZFS receive
> speeds. Local pool performance is still very fast.

Now we're getting somewhere. ;-)

You've tested the source disk (result: fast.) You've tested the destination disk without zfs receive (result: fast.) Now the only two ingredients left are: ssh performance, or zfs receive performance.

So, to conclusively identify, prove, and measure that zfs receive is the problem, how about this:

    zfs send somefilesystem | ssh somehost 'cat > /dev/null'

If that goes slow, then ssh is the culprit. If that goes fast ... and then you change to zfs receive and that goes slow ... now you've scientifically shown that zfs receive is slow.
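To round out the test matrix, you can also measure the raw send stream rate with no ssh and no receive involved at all. A minimal sketch (snapshot name hypothetical):

    # time(1) gives elapsed seconds and dd's "records in/out" gives the
    # byte count, so the bare zfs send throughput falls out directly:
    time zfs send tank/somefs@now | dd of=/dev/null bs=1024k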
Re: [zfs-discuss] compress an existing filesystem
> Hi all, I need to move a filesystem off of one host and onto another
> smaller one. The fs in question, with no compression enabled, is using
> 1.2 TB (refer). I'm hoping that zfs compression will dramatically
> reduce this requirement and allow me to keep the dataset on an 800 GB
> store. Does this sound feasible? Can I achieve the move to the new box
> using zfs send/receive? If so, how do I do it? Do I turn on compression
> on the target host just after I begin the zfs send/receive?

How much your data compresses depends on what type of data you have. JPG files won't compress at all, and neither will any other already-compressed file format. But a gene sequence file will compress 10:1 because it's so repetitive internally. For typical filesystems, about 70% of original size is a good estimate.

Now: if your data fits onto the 800 GB store at all, it will be tight. And you SHOULD NOT do that. For one, performance of that nearly-full pool will be horrible, at best. And I've heard a number of horror stories that zfs has a tendency to implode when it's very full. So try to keep your disks below 90%.
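As for the how: since a receive writes brand-new blocks, it's enough for compression to be enabled on the receiving side before the stream arrives, for instance by inheriting it from a parent dataset created for the purpose. A hedged sketch (host and dataset names hypothetical):

    # On the target host: a parent with compression on; anything
    # received underneath inherits it:
    zfs create -o compression=on smallpool/recv

    # On the source host: snapshot, then stream it over:
    zfs snapshot tank/data@migrate
    zfs send tank/data@migrate | ssh newhost zfs receive smallpool/recv/data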
Re: [zfs-discuss] DeDup and Compression - Reverse Order?
Darren J Moffat darr...@opensolaris.org writes:
> Kjetil Torgrim Homme wrote:
>> Andrey Kuzmin andrey.v.kuz...@gmail.com writes:
>>> Downside you have described happens only when the same checksum is
>>> used for data protection and duplicate detection. This implies
>>> sha256, BTW, since fletcher-based dedupe has been dropped in recent
>>> builds.
>> if the hash used for dedup is completely separate from the hash used
>> for data protection, I don't see any downsides to computing the dedup
>> hash from uncompressed data. why isn't it?
>
> It isn't separate because that isn't how Jeff and Bill designed it.

thanks for confirming that, Darren.

> I think the design they have is great.

I don't disagree.

> Instead of trying to pick holes in the theory, can you demonstrate a
> real performance problem with compression=on and dedup=on and show
> that it is because of the compression step?

compression requires CPU, actually quite a lot of it. even with the lean and mean lzjb, you will get not much more than 150 MB/s per core or something like that. so, if you're copying a 10 GB image file, it will take a minute or two just to compress the data so that the hash can be computed so that the duplicate block can be identified. if the dedup hash was based on uncompressed data, the copy would be limited by hashing efficiency (and dedup tree lookup).

I don't know how tightly interwoven the dedup hash tree and the block pointer hash tree are, or if it is at all possible to disentangle them. conceptually it doesn't seem impossible, but that's easy for me to say, with no knowledge of the zio pipeline...

oh, how does encryption play into this? just don't? knowing that someone else has the same block as you is leaking information, but that may be acceptable -- just make different pools for people you don't trust.

> Otherwise if you want it changed, code it up and show how what you
> have done is better in all cases.

I wish I could :-)

--
Kjetil T. Homme
Redpill Linpro AS - Changing the game
[zfs-discuss] Upgrading a volume from iscsitgt to COMSTAR
Hi,

I have a zfs volume that's exported via iscsi for my wife's Mac to use for Time Machine. I've just built a new machine to house my big pool, and installed build 129 on it. I'd like to start using COMSTAR for exporting the iscsi targets, rather than the older iscsi infrastructure.

I've seen quite a few tutorials on how to use COMSTAR for new volumes (and a few mentions of shareiscsi=stmf). I've seen some talk about how the old infrastructure used to use the first 64K of the volume for the iscsi information (and that COMSTAR uses the ZFS metadata store.) What I haven't found is a set of steps for taking a volume from the old way of doing things to the new. There are hints (e.g., here: https://opensolaris.org/jive/thread.jspa?threadID=115078), but no concrete set of steps.

Despite the ease of use of ZFS, I presume that it's not as simple as saying:

    zfs set shareiscsi=stmf volume

because a) it's not clear to me that that setting for shareiscsi will do the same magic as shareiscsi=on used to do, and b) there's that initial-64K problem, which I assume will make the Mac throw a wobbly when trying to mount the file system that's on the volume.

Any advice on how to do this? There's plenty of room to create a new volume and dd over the data (suggestions for the skip parameter to dd welcome, though!) if that's the only way. Once I figure this out, I'll be happy to write it up for my blog, which can then be pointed to when this comes up again.

Thanks in advance,
Steve

--
Stephen Green // stephen.gr...@sun.com
Principal Investigator \\ http://blogs.sun.com/searchguy
The AURA Project // Voice: +1 781-442-0926
Sun Microsystems Labs \\ Fax: +1 781-442-0399
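An untested sketch of the copy-then-re-export idea, assuming the claim above that iscsitgt kept its metadata in the first 64K of the volume is right (volume names and size hypothetical; you'll also need an iSCSI target and the stmf services enabled, e.g. via itadm create-target):

    # New zvol, then copy everything after the first 64K block across;
    # with bs=64k, iseek=1 skips exactly that first block of the input:
    zfs create -V 100g tank/tm_new
    dd if=/dev/zvol/rdsk/tank/tm_old of=/dev/zvol/rdsk/tank/tm_new \
        bs=64k iseek=1

    # Export the new zvol through COMSTAR; create-lu prints a GUID that
    # add-view then exposes to initiators:
    sbdadm create-lu /dev/zvol/rdsk/tank/tm_new
    stmfadm add-view <GUID-printed-by-create-lu>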
Re: [zfs-discuss] DeDup and Compression - Reverse Order?
Kjetil Torgrim Homme wrote:
> I don't know how tightly interwoven the dedup hash tree and the block
> pointer hash tree are, or if it is at all possible to disentangle them.

At the moment I'd say very interwoven by design.

> conceptually it doesn't seem impossible, but that's easy for me to say,
> with no knowledge of the zio pipeline...

Correct, it isn't impossible, but instead there would probably need to be two checksums held: one of the untransformed data (ie uncompressed and unencrypted) and one of the transformed data (compressed and encrypted). That has different tradeoffs, and SHA256 can be expensive too, see: http://blogs.sun.com/darren/entry/improving_zfs_dedup_performance_via

Note also that the compress/encrypt/checksum and the dedup are separate pipeline stages, so while dedup is happening for block N, block N+1 can be getting transformed - so this is designed to take advantage of multiple scheduling units (threads, cpus, cores etc).

> oh, how does encryption play into this? just don't? knowing that
> someone else has the same block as you is leaking information, but
> that may be acceptable -- just make different pools for people you
> don't trust.

compress, encrypt, checksum, dedup.

You are correct that it is an information leak, but only within a dataset and its clones, and only if you can observe the deduplication stats (and you need to use zdb to get enough info to see the leak - and that means you have access to the raw devices); the dedup ratio isn't really enough unless the pool is really idle or has only one user writing at a time.

For the encryption case, deduplication of the same plaintext block will only work within a dataset or a clone of it - because only in those cases do you have the same key (and the way I have implemented the IV generation for AES CCM/GCM mode ensures that the same plaintext will have the same IV so the ciphertexts will match). Also if you place a block in an unencrypted dataset that happens to match the ciphertext in an encrypted dataset, they won't dedup either (you need to understand what I've done with the AES CCM/GCM MAC and the zio_chksum_t field in the blkptr_t and how that is used by dedup to see why).

If that small information leak isn't acceptable even within the dataset, then don't enable both encryption and deduplication on those datasets - and don't delegate that property to your users either. Or you can frequently rekey your per-dataset data encryption keys ('zfs key -K'), but then you might as well turn dedup off - though there are some very good use cases in multi-level security where doing dedup/encryption and rekey provides a nice effect.

--
Darren J Moffat
Re: [zfs-discuss] DeDup and Compression - Reverse Order?
On Thu, 17 Dec 2009, Kjetil Torgrim Homme wrote:
> compression requires CPU, actually quite a lot of it. even with the
> lean and mean lzjb, you will get not much more than 150 MB/s per core
> or something like that. so, if you're copying a 10 GB image file, it
> will take a minute or two just to compress the data so that the hash
> can be computed so that the duplicate block can be identified. if the
> dedup hash was based on uncompressed data, the copy would be limited
> by hashing efficiency (and dedup tree lookup).

It is useful to keep in mind that deduplication can save a lot of disk space, but it is usually only quite effective in certain circumstances, such as when replicating a collection of files. The majority of write I/O will never benefit from deduplication. Based on this, speculatively assuming that the data will not be deduplicated does not result in increased cost most of the time. If the data does end up being deduplicated, then that is a blessing.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
[zfs-discuss] FW: Import a SAN cloned disk
-Original Message-
From: Bone, Nick
Sent: 16 December 2009 16:33
To: oab
Subject: RE: [zfs-discuss] Import a SAN cloned disk

Hi,

I know that EMC don't recommend a SnapView snapshot being added to the original host's Storage Group, although it is not prevented. I tried this just now, assigning the Clariion snapshot of the pool LUN to the same host. Although the snapshot LUN is there on the server - /dev/dsk/emcpowerxx - zpool import does not 'find' it. If I assign the snapshot to another host, then this host can import it.

Nick

-Original Message-
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of oab
Sent: 16 December 2009 14:29
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] Import a SAN cloned disk

Hi All,

We are looking at introducing EMC Clariion into our environment here. We were discussing the following scenario and wonder if someone has an opinion. Our product spans a number of servers, with some of the data held within Veritas and some held within ZFS. We have a requirement to snapshot all the data simultaneously. We can snapshot or clone all the LUNs simultaneously within the Clariion, but here comes the problem. On the server using ZFS, we wish to use the snapshot/cloned disk to perform a backup. So we will essentially do the following:

[1] Create snapshot on Clariion
[2] Present snapped LUN to the same server

How will ZFS react to this? In my mind it will see two copies of the same disk. I wish to import this snapped disk into a new pool, mount it and back the data up, but am not sure how ZFS will react. Veritas seems to handle this with the useclonedev option. We will be using at least Solaris 10 U8 if/when we roll this out.

Thanking you in advance,
OAB
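One hedged trick for the same-host case (untested here, all names hypothetical): zpool import scans a directory of device nodes, so you can restrict the scan to just the snapshot LUN with a directory of symlinks, and import the pool under a new name. Importing the clone on a second host, as Nick did, sidesteps the duplicate-pool question entirely.

    mkdir /tmp/snapdev
    ln -s /dev/dsk/emcpowerXXc /tmp/snapdev/       # snapshot LUN only
    zpool import -d /tmp/snapdev                   # list pools visible there
    zpool import -d /tmp/snapdev mypool mypool_snap   # import under a new name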
Re: [zfs-discuss] DeDup and Compression - Reverse Order?
On Thu, Dec 17, 2009 at 03:32:21PM +0100, Kjetil Torgrim Homme wrote:
> if the hash used for dedup is completely separate from the hash used
> for data protection, I don't see any downsides to computing the dedup
> hash from uncompressed data. why isn't it?

Hash and checksum functions are slow (hash functions are slower, but either way you'll be loading large blocks of data, which sets a floor for cost). Duplicating work is bad for performance. Using the same checksum for integrity protection and dedup is an optimization, and a very nice one at that. Having separate checksums would require making blkptr_t larger, which imposes its own costs.

There's lots of trade-offs here. Using the same checksum/hash for integrity protection and dedup is a great solution. If you use a non-cryptographic checksum algorithm then you'll want to enable verification for dedup. That's all.

Nico
--
[zfs-discuss] How do I determine dedupe effectiveness?
I'm trying to see if zfs dedupe is effective on our datasets, but I'm having a hard time figuring out how to measure the space saved.

When I sent one backup set to the filesystem, the usage reported by zfs list and zfs get used on my zfs are the expected values based on the data size. When I store a second copy, which should dedupe entirely, the zfs commands report the doubled used space that would be occupied if dedupe was turned off.

My question is: are the numbers being reported by the zfs command taking into account the deduplication, or is there some other way to see how much space we're saving?

Thanks,
Stacy
Re: [zfs-discuss] How do I determine dedupe effectiveness?
On Thu, Dec 17, 2009 at 8:57 PM, Stacy Maydew stacy.may...@sun.com wrote:
> I'm trying to see if zfs dedupe is effective on our datasets, but I'm
> having a hard time figuring out how to measure the space saved. When I
> sent one backup set to the filesystem, the usage reported by zfs list
> and zfs get used on my zfs are the expected values based on the data
> size. When I store a second copy, which should dedupe entirely, the
> zfs commands report the doubled used space that would be occupied if
> dedupe was turned off. My question is: are the numbers being reported
> by the zfs command taking into account the deduplication, or is there
> some other way to see how much space we're saving?

Try zpool list. For example:

    $ zpool list
    NAME    SIZE  ALLOC   FREE   CAP  DEDUP  HEALTH  ALTROOT
    rpool    87G  76.3G  10.7G   87%  1.17x  ONLINE  -

In this case the dedup ratio is 1.17.

--
Regards,
Cyril
Re: [zfs-discuss] ZFS Dedupe reporting incorrect savings
Hi Giridhar,

The size reported by ls can include things like holes in the file. What space usage does the zfs(1M) command report for the filesystem?

Adam

On Dec 16, 2009, at 10:33 PM, Giridhar K R wrote:
> Hi,
>
> Reposting as I have not gotten any response.
>
> Here is the issue. I created a zpool with 64k recordsize and enabled
> dedupe on it:
>
>     zpool create -O recordsize=64k TestPool device1
>     zfs set dedup=on TestPool
>
> I copied files onto this pool over nfs from a windows client.
>
> Here is the output of zpool list:
>
>     NAME       SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH  ALTROOT
>     TestPool   696G  19.1G  677G   2%  1.13x  ONLINE  -
>
> I ran ls -l /TestPool and saw the total size reported as
> 51,193,782,290 bytes. The alloc size reported by zpool along with the
> DEDUP of 1.13x does not add up to 51,193,782,290 bytes. According to
> the DEDUP (dedupe ratio) the amount of data copied is 21.58G
> (19.1G * 1.13).
>
> Here is the output from zdb -DD:
>
>     zdb -DD TestPool
>     DDT-sha256-zap-duplicate: 33536 entries, size 272 on disk, 140 in core
>     DDT-sha256-zap-unique: 278241 entries, size 274 on disk, 142 in core
>
>     DDT histogram (aggregated over all DDTs):
>
>     bucket            allocated                      referenced
>     ______   ____________________________   ____________________________
>     refcnt   blocks  LSIZE  PSIZE   DSIZE   blocks  LSIZE  PSIZE   DSIZE
>     ------   ------  -----  -----   -----   ------  -----  -----   -----
>          1     272K  17.0G  17.0G   17.0G     272K  17.0G  17.0G   17.0G
>          2    32.7K  2.05G  2.05G   2.05G    65.6K  4.10G  4.10G   4.10G
>          4       15   960K   960K    960K       71  4.44M  4.44M   4.44M
>          8        4   256K   256K    256K       53  3.31M  3.31M   3.31M
>         16        1    64K    64K     64K       16     1M     1M      1M
>        512        1    64K    64K     64K      854  53.4M  53.4M   53.4M
>         1K        1    64K    64K     64K    1.08K  69.1M  69.1M   69.1M
>         4K        1    64K    64K     64K    5.33K   341M   341M    341M
>      Total     304K  19.0G  19.0G   19.0G     345K  21.5G  21.5G   21.5G
>
>     dedup = 1.13, compress = 1.00, copies = 1.00,
>     dedup * compress / copies = 1.13
>
> Am I missing something? Your inputs are much appreciated.
>
> Thanks,
> Giri

--
Adam Leventhal, Fishworks            http://blogs.sun.com/ahl
Re: [zfs-discuss] How do I determine dedupe effectiveness?
On Thu, Dec 17, 2009 at 10:57 AM, Stacy Maydew stacy.may...@sun.com wrote:
> When I sent one backup set to the filesystem, the usage reported by
> zfs list and zfs get used on my zfs are the expected values based on
> the data size. When I store a second copy, which should dedupe
> entirely, the zfs commands report the doubled used space that would be
> occupied if dedupe was turned off.

It's how zfs does accounting with dedupe. Even when the blocks are deduped, they still count toward the size of the volume. It's my understanding that this is done out of fairness: if the space used were split between all duplicates and one of the copies were deleted, then the remaining copy could push the user over quota (or the fs past its limit, etc.)

> My question is: are the numbers being reported by the zfs command
> taking into account the deduplication, or is there some other way to
> see how much space we're saving?

'zpool list' or 'zpool get dedupratio ${zpool_name}'

-B

--
Brandon High : bh...@freaks.com
When in doubt, use brute force.
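A short worked example tying the two commands together, using the zpool list numbers from earlier in the thread (dedupratio is the pool-wide property name):

    zpool get dedupratio rpool
    # NAME   PROPERTY    VALUE  SOURCE
    # rpool  dedupratio  1.17x  -
    #
    # Rough bytes saved = ALLOC * (ratio - 1); with ALLOC = 76.3G that is
    # 76.3G * 0.17 =~ 13G of writes that were deduplicated.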
Re: [zfs-discuss] DeDup and Compression - Reverse Order?
On Thu, Dec 17, 2009 at 6:14 PM, Kjetil Torgrim Homme kjeti...@linpro.no wrote:
> Darren J Moffat darr...@opensolaris.org writes:
>> Kjetil Torgrim Homme wrote:
>>> Andrey Kuzmin andrey.v.kuz...@gmail.com writes:
>>>> Downside you have described happens only when the same checksum is
>>>> used for data protection and duplicate detection. This implies
>>>> sha256, BTW, since fletcher-based dedupe has been dropped in recent
>>>> builds.
>>> if the hash used for dedup is completely separate from the hash used
>>> for data protection, I don't see any downsides to computing the
>>> dedup hash from uncompressed data. why isn't it?
>> It isn't separate because that isn't how Jeff and Bill designed it.
>
> thanks for confirming that, Darren.
>
>> I think the design they have is great.
>
> I don't disagree.
>
>> Instead of trying to pick holes in the theory, can you demonstrate a
>> real performance problem with compression=on and dedup=on and show
>> that it is because of the compression step?
>
> compression requires CPU, actually quite a lot of it. even with the
> lean and mean lzjb, you will get not much more than 150 MB/s per core
> or something like that. so, if you're copying a 10 GB image file, it
> will take a minute or two just to compress the data so that the hash
> can be computed so that the duplicate block can be identified. if the
> dedup hash was based on uncompressed data, the copy would be limited
> by hashing efficiency (and dedup tree lookup).

This isn't exactly true. If, speculatively, one stores two hashes - one for the uncompressed data in the ddt, and another one, for the compressed data, kept with the data block for data healing - one wins back the compression cost for duplicates and pays with an extra hash computation for singletons. So a more correct question would be whether the set of cases where the duplicate/singleton and compression/hashing bandwidth ratios are such that one wins is non-empty (or, rather, of practical importance).

Regards,
Andrey

> I don't know how tightly interwoven the dedup hash tree and the block
> pointer hash tree are, or if it is at all possible to disentangle
> them. conceptually it doesn't seem impossible, but that's easy for me
> to say, with no knowledge of the zio pipeline...
>
> oh, how does encryption play into this? just don't? knowing that
> someone else has the same block as you is leaking information, but
> that may be acceptable -- just make different pools for people you
> don't trust.
>
>> Otherwise if you want it changed, code it up and show how what you
>> have done is better in all cases.
>
> I wish I could :-)
>
> --
> Kjetil T. Homme
> Redpill Linpro AS - Changing the game
Re: [zfs-discuss] How do I determine dedupe effectiveness?
The commands zpool list and zpool get dedupratio pool both show a ratio of 1.10. So thanks for that answer.

I'm a bit confused though: if the dedup is applied per zfs filesystem, not per zpool, why can I only see the dedup on a per-pool basis rather than for each zfs filesystem? Seems to me there should be a way to get this information for a given zfs filesystem?

Thanks again,
Stacy
Re: [zfs-discuss] How do I determine dedupe effectiveness?
On Thu, Dec 17, 2009 at 12:30:29PM -0800, Stacy Maydew wrote:
> So thanks for that answer. I'm a bit confused though: if the dedup is
> applied per zfs filesystem, not per zpool, why can I only see the
> dedup on a per-pool basis rather than for each zfs filesystem? Seems
> to me there should be a way to get this information for a given zfs
> filesystem?

You can enable and disable it on a filesystem basis, but the dedup is across the entire pool.

--
Darren
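A small sketch of that split (pool and dataset names hypothetical): the knob is per dataset, but the table and the accounting are per pool.

    zfs set dedup=on tank/backups   # only this dataset's new writes get deduped
    zfs get dedup tank/backups      # on/off is visible per dataset...
    zpool get dedupratio tank       # ...but the savings ratio exists only pool-wide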
[zfs-discuss] Confusion over zpool and zfs versions
I'm running Solaris 10 update 8 (10/09). I started out using an older version of Solaris and have upgraded a few times. I have used zpool upgrade on the pools I have as new versions become available after kernel updates. I see now when I run zfs upgrade that pools I created long ago are at version 1 while pools I created more recently have newer versions.

Can anybody clue me in to the difference between zpool and zfs versions? Are there any compatibility issues with upgrading zfs versions? Will this affect zfs send/recv to other systems like the zpool version does? Thanks for your advice.

I also noticed misleading info in re. the zfs upgrade -v message. It prints a message that says:

    For more information on a particular version, including supported
    releases, see: http://www.opensolaris.org/os/community/zfs/version/zpl/N

I know you are supposed to replace the N in the web page address with an integer, but I just copied and pasted it into firefox. When I did that, I was redirected to:

    http://hub.opensolaris.org/bin/view/Community+Group+zfs/N-1

That page is a list of four links labeled "ZFS File System Version 1" through "Version 4". But following those links brings up the descriptions of the ZFS Pool versions 1-4, not the ZFS versions.

Thanks again,
Doug
Re: [zfs-discuss] zfs hanging during reads
fmdump shows errors on a different drive, and none on the one that has this slow read problem:

    Nov 27 2009 20:58:28.670057389 ereport.io.scsi.cmd.disk.recovered
    nvlist version: 0
            class = ereport.io.scsi.cmd.disk.recovered
            ena = 0xbeb7f4dd531
            detector = (embedded nvlist)
            nvlist version: 0
                    version = 0x0
                    scheme = dev
                    device-path = /p...@0,0/pci1043,8...@9/d...@2,0
                    devid = id1,s...@sata_samsung_hd753lj___s1pwj1cq801987
            (end detector)
            driver-assessment = recovered
            op-code = 0x28
            cdb = 0x28 0x0 0x4 0x80 0x32 0x80 0x0 0x0 0x80 0x0
            pkt-reason = 0x0
            pkt-state = 0x1f
            pkt-stats = 0x50
            __ttl = 0x1
            __tod = 0x4b0fa2c4 0x27f043ad

The serial number of the suspect drive is S1PWJ1CQ801987. iostat -En shows:

    c0d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
           Model: ST360021A  Revision:  Serial No: 3HR2AG72  Size: 60.02GB <60020932608 bytes>
           Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
           Illegal Request: 0
    c6d1   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
           Model: SAMSUNG HD154UI  Revision:  Serial No: S1Y6J1KS720622  Size: 1500.30GB <1500295200768 bytes>
           Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
           Illegal Request: 0
    c3t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
           Vendor: ATA  Product: SAMSUNG HD753LJ  Revision: 1113  Serial No:  Size: 750.16GB <750156374016 bytes>
           Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
           Illegal Request: 28  Predictive Failure Analysis: 0
    c3t1d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0   <== the drive in question
           Vendor: ATA  Product: SAMSUNG HD753LJ  Revision: 1113  Serial No:  Size: 750.16GB <750156374016 bytes>
           Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
           Illegal Request: 49  Predictive Failure Analysis: 0
    c3t2d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
           Vendor: ATA  Product: SAMSUNG HD753LJ  Revision: 1113  Serial No:  Size: 750.16GB <750156374016 bytes>
           Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
           Illegal Request: 28  Predictive Failure Analysis: 0
    c3t3d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
           Vendor: ATA  Product: SAMSUNG HD753LJ  Revision: 1110  Serial No:  Size: 750.16GB <750156374016 bytes>
           Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
           Illegal Request: 28  Predictive Failure Analysis: 0
    c0t1d0 Soft Errors: 0 Hard Errors: 30 Transport Errors: 0
           Vendor: ATAPI  Product: CD-RW 52X24  Revision: F.JZ  Serial No:  Size: 0.00GB <0 bytes>
           Media Error: 0 Device Not Ready: 30 No Device: 0 Recoverable: 0
           Illegal Request: 0  Predictive Failure Analysis: 0
Re: [zfs-discuss] compress an existing filesystem
On Thu, Dec 17, 2009 at 7:11 AM, Edward Ned Harvey sola...@nedharvey.com wrote:
> And I've heard a number of horror stories that zfs has a tendency to
> implode when it's very full. So try to keep your disks below 90%.

I've taken to creating an unmounted empty filesystem with a reservation to prevent the zpool from filling up. It gives you behavior similar to ufs's reserved blocks.

    $ zfs get reservation,mountpoint tank/reservation
    NAME              PROPERTY     VALUE   SOURCE
    tank/reservation  reservation  10G     local
    tank/reservation  mountpoint   none    local

-B

--
Brandon High : bh...@freaks.com
Time is what keeps everything from happening all at once.
Re: [zfs-discuss] zfs send is very slow
I have observed the opposite, and I believe that all writes are slow to my dedup'd pool. I used local rsync (no ssh) for one of my migrations (so it was restartable, as it took *4 days*), and the writes were slow just like zfs recv. I have not seen fast writes of real data to the deduped volume, if you're copying enough data. (I assume there's some sort of writeback behavior to make small writes faster?) Of course, if you just use mkfile, it does run amazingly fast.

mike

Edward Ned Harvey wrote:
>> I'm willing to accept slower writes with compression enabled, par for
>> the course. Local writes, even with compression enabled, can still
>> exceed 500MB/sec, with moderate to high CPU usage. These problems
>> seem to have manifested after snv_128, and seemingly only affect ZFS
>> receive speeds. Local pool performance is still very fast.
>
> Now we're getting somewhere. ;-)
>
> You've tested the source disk (result: fast.) You've tested the
> destination disk without zfs receive (result: fast.) Now the only two
> ingredients left are: ssh performance, or zfs receive performance.
>
> So, to conclusively identify, prove, and measure that zfs receive is
> the problem, how about this:
>
>     zfs send somefilesystem | ssh somehost 'cat > /dev/null'
>
> If that goes slow, then ssh is the culprit. If that goes fast ... and
> then you change to zfs receive and that goes slow ... now you've
> scientifically shown that zfs receive is slow.
Re: [zfs-discuss] Confusion over zpool and zfs versions
Hi Doug,

The pool and file system version upgrades allow you to access new features that are available for a particular Solaris release. For example, if you upgrade your system to Solaris 10 10/09, then you would need to upgrade your pool version to access the pool features available in the Solaris 10 10/09 release. If you created a new pool on the same system, then those features would be available automatically. Some features are specific to the file system format and some features are specific to the pool format, hence the different versions.

The compatibility issue for pools is that you will not be able to import a pool of a later version on a system running an earlier Solaris version. Another example is that you can't send a ZFS send stream with the dedup flag to a system that doesn't understand dedup. For some operations, the file system can be received, but not mounted, on a system at a lower version.

The pool and fs versions are also related, in that if you were running pool version 10 and tried to upgrade to ZFS fs version 4, ZFS would tell you that you need to upgrade the pool version.

I fixed the problem with the ZFS version pages. Thanks for reporting it.

Cindy

On 12/17/09 14:12, Doug wrote:
> I'm running Solaris 10 update 8 (10/09). I started out using an older
> version of Solaris and have upgraded a few times. I have used zpool
> upgrade on the pools I have as new versions become available after
> kernel updates. [...]
> Can anybody clue me in to the difference between zpool and zfs
> versions? Are there any compatibility issues with upgrading zfs
> versions? Will this affect zfs send/recv to other systems like the
> zpool version does? [...]
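A minimal sketch of checking and upgrading both kinds of versions (remember upgrades are one-way; an older release can't import an upgraded pool):

    zpool upgrade       # list pools below the current pool version
    zpool upgrade -v    # pool versions this release supports
    zfs upgrade         # list file systems below the current fs version
    zfs upgrade -v      # file system versions this release supports
    zpool upgrade -a    # upgrade all pools
    zfs upgrade -a      # upgrade all file systems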
Re: [zfs-discuss] dedup existing data
On Wed, Dec 16, 2009 at 6:17 AM, Steven Sim unixan...@gmail.com wrote:
> r...@sunlight:/root# zfs send myplace/myd...@prededup | zfs receive -v myplace/mydata
> cannot receive new filesystem stream: destination 'myplace/fujitsu' exists
> must specify -F to overwrite it

Try something like this:

    zfs create -o mountpoint=none myplace/dedup
    zfs unmount myplace/mydata    # Make sure the source isn't changing anymore
    zfs snapshot myplace/myd...@prededup
    zfs send -R myplace/myd...@prededup | zfs receive -du myplace/dedup

It'll create a new filesystem myplace/dedup/mydata.

    zfs rename myplace/mydata myplace/mydata_old
    zfs rename myplace/dedup/mydata myplace/mydata
    zfs mount myplace/mydata

You can now destroy the old dataset. I'm also adding a user property to the dedup'd copy so I don't accidentally do it again, eg:

    zfs set com.freaks:deduped=yes myplace/dedup/mydata

prior to the 'zfs rename'.

There's a little more finesse you can use to limit the time your source dataset is unmounted. Do a snapshot and send|receive to get most of the data over, then unmount and create a new snapshot and send|receive to catch any changes since the first (sketched below).

-B

--
Brandon High : bh...@freaks.com
Mistakes are often the stepping stones to utter failure.
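A hedged sketch of that two-pass finesse (snapshot names hypothetical): the bulk pass runs while the source stays mounted; only the small second pass needs the source frozen.

    zfs snapshot myplace/mydata@bulk
    zfs send -R myplace/mydata@bulk | zfs receive -du myplace/dedup

    zfs unmount myplace/mydata        # brief freeze of the source
    zfs snapshot myplace/mydata@final
    zfs send -R -i @bulk myplace/mydata@final | zfs receive -du myplace/dedup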
Re: [zfs-discuss] dedup existing data
If you have another partition with enough space, you could technically just do:

    mv src /some/other/place
    mv /some/other/place/src src

Anyone see a problem with that? Might be the best way to get it de-duped.
Re: [zfs-discuss] ZFS Dedupe reporting incorrect savings
Adam Leventhal wrote:
> Hi Giridhar,
>
> The size reported by ls can include things like holes in the file.
> What space usage does the zfs(1M) command report for the filesystem?
>
> Adam
>
> [quoted zpool list and zdb -DD output trimmed; see the earlier message]

Thanks for the response Adam.

Are you talking about zfs list? It displays 19.6 as the allocated space.

What does ZFS treat as a hole, and how does it identify one?

Thanks,
Giri
Re: [zfs-discuss] DeDup and Compression - Reverse Order?
Your parenthetical comments here raise some concerns, or at least eyebrows, with me. Hopefully you can lower them again.

> compress, encrypt, checksum, dedup.
> (and you need to use zdb to get enough info to see the leak - and that
> means you have access to the raw devices)

An attacker with access to the raw devices is the primary base threat model for on-disk encryption, surely? An attacker with access to disk traffic, via e.g. iSCSI, who can also deploy dynamic traffic analysis in addition to static content analysis, and who also has similarly greater opportunities for tampering, is another, trickier threat model. It seems like entirely wrong thinking (even in parentheses) to dismiss an issue as irrelevant because it only applies in the primary threat model.

> (and the way I have implemented the IV generation for AES CCM/GCM mode
> ensures that the same plaintext will have the same IV so the
> ciphertexts will match).

Again, this seems like a cause for concern. Have you effectively turned these fancy and carefully designed crypto modes back into ECB, albeit at a larger block size (and only within a dataset)?

Let's consider copy-on-write semantics: with the above issue, an attacker can tell which blocks of a file have changed over time, even if unchanged blocks have been rewritten, giving even the static-image attacker some traffic analysis capability. This would be a problem regardless of dedup, for the scenario where the attacker can see repeated ciphertext on disk (unless the dedup metadata itself is sufficiently encrypted, which I understand it is not).

> (you need to understand what I've done with the AES CCM/GCM MAC

I'd like to, but more to understand what (if any) protection is given against replay attacks (above that already provided by the merkle hash tree).

I await ZFS crypto with even more enthusiasm than dedup. Thanks for talking about the details with us.
Re: [zfs-discuss] dedup existing data
On Thu, Dec 17, 2009 at 3:10 PM, Anil an...@entic.net wrote:
> If you have another partition with enough space, you could technically
> just do:
>
>     mv src /some/other/place
>     mv /some/other/place/src src
>
> Anyone see a problem with that? Might be the best way to get it
> de-duped.

You'd lose any existing snapshots. You may lose ACLs. If you have snapshots of the source, the space will still be used until you destroy the snapshots.

-B

--
Brandon High : bh...@freaks.com
Indecision is the key to flexibility.
Re: [zfs-discuss] ZFS Dedupe reporting incorrect savings
> Thanks for the response Adam.
>
> Are you talking about zfs list? It displays 19.6 as the allocated
> space.
>
> What does ZFS treat as a hole, and how does it identify one?

ZFS will compress blocks of zeros down to nothing and treat them like sparse files. 19.6 is pretty close to your computed value. Does your pool happen to be 10+1 RAID-Z?

Adam

--
Adam Leventhal, Fishworks            http://blogs.sun.com/ahl
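A tiny demonstration of the gap between ls and allocated space (path hypothetical): writing one byte at a large offset creates a file whose logical size is ~1GB while almost nothing is allocated.

    dd if=/dev/zero of=/TestPool/sparse bs=1 count=1 oseek=1073741823
    ls -l /TestPool/sparse   # logical size: 1073741824 bytes
    du -h /TestPool/sparse   # allocated: a few KB at most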
Re: [zfs-discuss] zfs send is very slow
It looks like the kernel is using a lot of memory, which may be part of the performance problem. The ARC has shrunk to 1G, and the kernel is using up over 5G.

I'm doing a send|receive of 683G of data. I started it last night around 1am, and as of right now it's only sent 450GB. That's about 8.5MB/sec.

Are there any other stats, or dtrace scripts, I can look at to determine what's happening?

    bh...@basestar:~$ pfexec mdb -k
    Loading modules: [ unix genunix specfs dtrace mac cpu.generic
    cpu_ms.AuthenticAMD.15 uppc pcplusmp rootnex scsi_vhci zfs sata sd
    sockfs ip hook neti sctp arp usba fctl random crypto cpc fcip smbsrv
    nfs lofs ufs logindmux ptm sppp ipc ]
    > ::memstat
    Page Summary                Pages      MB  %Tot
    Kernel                    1405991    5492   67%
    ZFS File Data              223137     871   11%
    Anon                       396743    1549   19%
    Exec and libs                1936       7    0%
    Page cache                   5221      20    0%
    Free (cachelist)             9181      35    0%
    Free (freelist)             52685     205    3%

    Total                     2094894    8183
    Physical                  2094893    8183

    bh...@basestar:~$ arcstat.pl 5 3
        Time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
    16:05:33  204M    6M      3    3M    5    3M    2    3M    1     1G    1G
    16:05:38   562   101     18    97   17     4   23    97   17     1G    1G
    16:05:43    1K   709     39   716   63     7   94    79   15     1G    1G

-B

--
Brandon High : bh...@freaks.com
Always try to do things in chronological order; it's less confusing that way.
Re: [zfs-discuss] ZFS Dedupe reporting incorrect savings
I used the default while creating the zpool with one disk drive. I guess it is a RAID 0 configuration.

Thanks,
Giri
Re: [zfs-discuss] zfs send is very slow
My ARC is ~3GB. I'm doing a test that copies 10GB of data to a volume where the blocks should dedupe 100% with existing data.

First time through, the test that runs at 5MB/sec seems to average a 10-30% ARC *miss* rate, at about 400 arc reads/sec. When things are working at disk bandwidth, I'm getting 3-5% ARC misses, at up to 7k arc reads/sec.

If I do a recv of a small dataset, then immediately destroy it and replay the same thing, I can get in-core dedupe performance, and it's truly amazing.

Does anyone know how big the dedupe tables are, and if they can be given some priority/prefetch in ARC? I think I have enough RAM to make this work.

mike

On Thu, Dec 17, 2009 at 4:12 PM, Brandon High bh...@freaks.com wrote:
> It looks like the kernel is using a lot of memory, which may be part
> of the performance problem. The ARC has shrunk to 1G, and the kernel
> is using up over 5G.
>
> I'm doing a send|receive of 683G of data. I started it last night
> around 1am, and as of right now it's only sent 450GB. That's about
> 8.5MB/sec.
>
> Are there any other stats, or dtrace scripts, I can look at to
> determine what's happening?
>
> [::memstat and arcstat output trimmed; see the previous message]
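On sizing the dedupe tables: zdb already reports DDT entry counts and the per-entry in-core size, so a rough RAM footprint falls out directly. Using the figures from the zdb -DD output in the "ZFS Dedupe reporting incorrect savings" thread above (pool name hypothetical):

    zdb -DD tank
    # e.g. "DDT-sha256-zap-unique: 278241 entries, size 274 on disk, 142 in core"
    # 278241 entries * ~142 bytes =~ 38MB of DDT that wants to stay cached;
    # when the ARC can't hold it, each dedup write turns into random reads.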
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Ok, my console is 100% completely hung; I'm not going to be able to enter any commands when it freezes. I can't even get the numlock light to change its status. This time I even plugged in a PS/2 keyboard instead of USB, thinking maybe it was USB dying during the hang, but not so. I have hard rebooted my system again.

I'm going to set up a script that will continuously run savecore; after 10 runs, I'll reset the bounds file. Hopefully by doing it this way, I'll get a savecore right as the system starts to go unresponsive. I'll post the script I'll be running here shortly after I write it.

Also, as far as using 'sync', I'm not sure what exactly I would do there.
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Ok, this is the script I am running (as a background process). This script doesn't matter much, it's just here for reference, as I'm running into problems just running the savecore command while the zpool import is running.

    #!/bin/bash
    count=1
    rm /var/crash/opensol/bounds
    /usr/bin/savecore -L
    while [ 1 ]
    do
            if [ $count == 10 ]
            then
                    count=1
                    rm /var/crash/opensol/bounds
            fi
            savecore -L
            count=`expr $count + 1`
    done

(opensol was the name of the system before I renamed it to wd40; crash data is still set to be put in /var/crash/opensol.)

I have started another zpool import of the vault volume:

    r...@wd40:~# zpool import
      pool: vault
        id: 4018273146420816291
     state: ONLINE
    action: The pool can be imported using its name or numeric identifier.
    config:

            vault       ONLINE
              raidz1-0  ONLINE
                c3d0    ONLINE
                c3d1    ONLINE
                c4d0    ONLINE
                c4d1    ONLINE

    r...@wd40:~# zpool import 4018273146420816291
    [1] 1093

After starting the import, savecore -L no longer finishes:

    r...@wd40:/var/adm# savecore -L
    dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
     0:05 100% done
    100% done: 153601 pages dumped, dump succeeded

It should be saying that it's saving to /var/crash/opensol/, but instead it just hangs and never returns me to a prompt. Previous to running zpool import, the savecore command took anywhere from 10-15 seconds to finish. If I cd to /var/crash/opensol, there is no new file created. I tried firing off savecore again, same result. A ps listing shows the savecore commands:

    r...@wd40:/var/crash/opensol# ps -ef | grep savecore
        root  1092  1061   0 22:27:55 ?           0:01 savecore -L
        root  1134  1083   0 22:33:28 pts/3       0:00 grep savecore
        root  1113   787   0 22:30:23 ?           0:01 savecore -L

(One of these is from the script I was running when I started the import manually, the other from when I just ran the savecore -L command by itself.) I cannot kill these processes, even with a kill -9.

I then hard rebooted my server yet again (as it hangs if it's in the process of a zpool import). After the reboot, all I did was ssh in, disable gdm, run my zpool import command, and try another savecore (this time not trying to use my script above first, just a simple savecore -L as root from the command line). Once again it hangs:

    r...@wd40:~# zpool import 4018273146420816291
    [1] 783
    r...@wd40:~# savecore -L
    dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
     0:05 100% done
    100% done: 138876 pages dumped, dump succeeded
Re: [zfs-discuss] force 4k writes
> On Wed, Dec 16 at 7:35, Bill Sprouse wrote:
>> The question behind the question is, given the really bad things that
>> can happen performance-wise with writes that are not 4k aligned when
>> using flash devices, is there any way to ensure that any and all
>> writes from ZFS are 4k aligned?
>
> Some flash devices can handle this better than others, often by several
> orders of magnitude. Not all devices (as you imply) are so affected.

As a specific example of 2 devices with dramatically different performance for sub-4k transfers, has anyone done any ZFS benchmarks between the X25-E and the F20 they can share?

I am particularly interested in zvol performance with a blocksize of 16k and highly compressible data (~10x). I am going to run some comparison tests, but would appreciate any initial input on what to look out for, or how to tune ZFS to get the most out of the F20. It might be helpful, e.g., if there were somewhere in the software stack where I could tell part of the system to lie and treat the F20 as a 4k device?

Thanks.
Re: [zfs-discuss] force 4k writes
On Dec 17, 2009, at 9:04 PM, stuart anderson wrote:
> As a specific example of 2 devices with dramatically different
> performance for sub-4k transfers, has anyone done any ZFS benchmarks
> between the X25-E and the F20 they can share?
>
> I am particularly interested in zvol performance with a blocksize of
> 16k and highly compressible data (~10x).

16 KB recordsize? That seems a little unusual, what is the application?

> I am going to run some comparison tests, but would appreciate any
> initial input on what to look out for, or how to tune ZFS to get the
> most out of the F20.

AFAICT, no tuning should be required. It is quite fast.

> It might be helpful, e.g., if there were somewhere in the software
> stack where I could tell part of the system to lie and treat the F20
> as a 4k device?

The F20 is rated at 84,000 random 4KB write IOPS. The DRAM write buffer will hide 4KB write effects. OTOH, the X25-E is rated at 3,300 random 4KB writes. It shouldn't take much armchair analysis to come to the conclusion that the F20 is likely to win that IOPS battle :-)

-- richard
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
On 18.12.09 07:13, Jack Kielsmeier wrote:
> Ok, my console is 100% completely hung; I'm not going to be able to
> enter any commands when it freezes. I can't even get the numlock light
> to change its status. This time I even plugged in a PS/2 keyboard
> instead of USB, thinking maybe it was USB dying during the hang, but
> not so. I have hard rebooted my system again.

I think it may be better to boot the system with kmdb loaded - you need to edit your GRUB menu OpenSolaris entry and add -k to the kernel$ line. Or you can just load kmdb from the console:

    mdb -K

then type :c to continue.

When the system freezes, you can use the F1-A key combination to drop into kmdb, and then you can type

    $<systemdump

to generate a crash dump and reboot.

Regards,
victor

> I'm going to set up a script that will continuously run savecore;
> after 10 runs, I'll reset the bounds file. Hopefully by doing it this
> way, I'll get a savecore right as the system starts to go
> unresponsive. I'll post the script I'll be running here shortly after
> I write it.
>
> Also, as far as using 'sync', I'm not sure what exactly I would do
> there.

--
Victor Latushkin                 phone: x11467 / +74959370467
TSC-Kernel EMEA                  mobile: +78957693012
Sun Services, Moscow             blog: http://blogs.sun.com/vlatushkin
Sun Microsystems