Re: [zfs-discuss] SVM ZFS
On 02/26/13 20:30, Morris Hooten wrote: Besides copying data from /dev/md/dsk/x volume manager filesystems to new zfs filesystems does anyone know of any zfs conversion tools to make the conversion/migration from svm to zfs easier?

With Solaris 11 you can use shadow migration. It is really a VFS layer feature, but it is integrated into the ZFS CLI tools for ease of use:

# zfs create -o shadow=file:///path/to/old mypool/new

The new filesystem will appear to instantly have all the data, and it will be copied over as it is accessed, with shadowd also pulling it over in advance. You can use shadowstat(1M) to show progress. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
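A minimal sketch of that workflow, assuming a Solaris 11 system with the shadow-migration support installed (pool, path and server names are illustrative):

```shell
# The source must stay read-only for the duration of the migration.
zfs create -o shadow=file:///path/to/old mypool/new       # local source
# or, for a source exported over NFS:
# zfs create -o shadow=nfs://server/export/old mypool/new

shadowstat                  # progress of all in-flight migrations
zfs get shadow mypool/new   # the property reads 'none' once migration completes
```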
Re: [zfs-discuss] Bp rewrite
On 02/15/13 14:39, Tyler Walter wrote: As someone who has zero insider information and feels that there isn't much push at oracle to develop or release new zfs features, I have to assume it's not coming. The only way I see it becoming a reality is if someone in the illumos community decides to do the work required to put it in.

You obviously missed the thread we had recently about the new ZFS features that Solaris 11 and 11.1 have. ZFS is very much under active feature, bugfix and performance development at Oracle for current and future versions of Solaris and the ZFS Storage Appliance. BP rewrite is actually very complex to do correctly and safely - if it wasn't, I'm sure it would have been done by now by multiple people! -- Darren J Moffat
Re: [zfs-discuss] Freeing unused space in thin provisioned zvols
On 02/10/13 12:01, Koopmann, Jan-Peter wrote: Why should it? Unless you do a shrink on the vmdk and use a zfs variant with scsi unmap support (I believe currently only Nexenta but correct me if I am wrong) the blocks will not be freed, will they? Solaris 11.1 has ZFS with SCSI UNMAP support. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Freeing unused space in thin provisioned zvols
On 02/12/13 15:07, Thomas Nau wrote: Darren On 02/12/2013 11:25 AM, Darren J Moffat wrote: On 02/10/13 12:01, Koopmann, Jan-Peter wrote: Why should it? Unless you do a shrink on the vmdk and use a zfs variant with scsi unmap support (I believe currently only Nexenta but correct me if I am wrong) the blocks will not be freed, will they? Solaris 11.1 has ZFS with SCSI UNMAP support. Seem to have skipped that one... Are there any related tools e.g. to release all zero blocks or the like? Of course it's up to the admin then to know what all this is about or to wreck the data

No tools needed: ZFS does it automatically when freeing blocks, provided the underlying device advertises the functionality. ZFS ZVOLs shared over COMSTAR advertise SCSI UNMAP as well. -- Darren J Moffat
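A hedged sketch of the COMSTAR case mentioned above (pool, ZVOL and GUID values are illustrative; this assumes the storage-server/COMSTAR packages are installed):

```shell
# Create a thin-provisioned ZVOL and export it as an iSCSI logical unit.
zfs create -s -V 100g tank/luns/vm1
stmfadm create-lu /dev/zvol/rdsk/tank/luns/vm1   # prints the new LU's GUID
stmfadm add-view 600144F0...                     # GUID from the previous command

# An initiator filesystem that issues SCSI UNMAP against this LUN lets
# ZFS free the corresponding blocks in tank/luns/vm1 automatically.
```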
Re: [zfs-discuss] RFE: Un-dedup for unique blocks
On 01/24/13 00:04, Matthew Ahrens wrote: On Tue, Jan 22, 2013 at 5:29 AM, Darren J Moffat darr...@opensolaris.org mailto:darr...@opensolaris.org wrote: Preallocated ZVOLs - for swap/dump. Darren, good to hear about the cool stuff in S11. Just to clarify, is this preallocated ZVOL different than the preallocated dump which has been there for quite some time (and is in Illumos)? Can you use it for other zvols besides swap and dump? It is the same but we are using it for swap now too. It isn't available for general use. Some background: the zfs dump device has always been preallocated (thick provisioned), so that we can reliably dump. By definition, something has gone horribly wrong when we are dumping, so this code path needs to be as small as possible to have any hope of getting a dump. So we preallocate the space for dump, and store a simple linked list of disk segments where it will be stored. The dump device is not COW, checksummed, deduped, compressed, etc. by ZFS. For the sake of others, I know you know this Matt, the dump system does the compression so ZFS didn't need to anyway. In Illumos (and S10), swap was treated more or less like a regular zvol. This leads to some tricky code paths because ZFS allocates memory from many points in the code as it is writing out changes. I could see advantages to the simplicity of a preallocated swap volume, using the same code that already existed for preallocated dump. Of course, the loss of checksumming and encryption is much more of a concern with swap (which is critical for correct behavior) than with dump (which is nice to have for debugging). We have encryption for dump because it is hooked in to the zvol code. For encrypting swap Illumos could do the same as Solaris 11 does and use lofi. I changed swapadd so that if encryption is specified in the options field of the vfstab entry it creates a lofi shim over the swap device using 'lofiadm -e'. 
This provides you encrypted swap regardless of what the underlying disk is (normal ZVOL, prealloc ZVOL, real disk slice, SVM mirror etc). -- Darren J Moffat
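A hedged sketch of that mechanism (the vfstab option keyword shown is an assumption, as is the exact lofiadm flag ordering; the documented pieces are lofiadm's ephemeral-key encryption mode and swapadd inserting the shim):

```shell
# vfstab entry whose options field requests encrypted swap (keyword assumed):
#   /dev/zvol/dsk/rpool/swap  -  -  swap  -  no  encrypted

# What swapadd then effectively does:
lofidev=$(lofiadm -a /dev/zvol/dsk/rpool/swap -e)  # lofi shim with ephemeral key
swap -a "$lofidev"                                 # swap on the shim, not the raw device
```

Because the key is ephemeral, swap contents are unrecoverable after a reboot, which is exactly what you want for swap.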
Re: [zfs-discuss] RFE: Un-dedup for unique blocks
On 01/21/13 17:03, Sašo Kiselkov wrote: Again, what significant features did they add besides encryption? I'm not saying they didn't, I'm just not aware of that many. Just a few examples: Solaris ZFS already has support for 1MB block size. Support for SCSI UNMAP - both issuing it and honoring it when it is the backing store of an iSCSI target. It also has a lot of performance improvements and general bug fixes in the Solaris 11.1 release. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] RFE: Un-dedup for unique blocks
On 01/22/13 11:57, Tomas Forsman wrote: On 22 January, 2013 - Darren J Moffat sent me these 0,6K bytes: On 01/21/13 17:03, Sašo Kiselkov wrote: Again, what significant features did they add besides encryption? I'm not saying they didn't, I'm just not aware of that many. Just a few examples: Solaris ZFS already has support for 1MB block size. Support for SCSI UNMAP - both issuing it and honoring it when it is the backing store of an iSCSI target. Would this apply to say a SATA SSD used as ZIL? (which we have, a vertex2ex with supercap)

If the device advertises the UNMAP feature and you are running Solaris 11.1 it should attempt to use it. -- Darren J Moffat
Re: [zfs-discuss] RFE: Un-dedup for unique blocks
On 01/22/13 13:20, Michel Jansens wrote: Maybe 'shadow migration' ? (eg: zfs create -o shadow=nfs://server/dir pool/newfs)

That isn't really a ZFS feature, since it happens at the VFS layer. The ZFS support there is really about getting the options passed through and checking status, but the core of the work happens at the VFS layer. Shadow migration works with UFS as well! Since I'm replying, here are a few others that have been introduced in Solaris 11 or 11.1:

- The new, improved ZFS share syntax for NFS and CIFS in Solaris 11.1, where you can much more easily inherit and also override individual share properties.
- Improved diagnostics rules.
- ZFS support for Immutable Zones (mostly a VFS feature).
- Extended (privilege) Policy and aliasing of datasets in Zones (so you don't see the part of the dataset hierarchy above the bit delegated to the zone).
- UEFI GPT label support for root pools, with GRUB2 and, on SPARC, with OBP.
- New sensitive per-file flag.
- Various ZIL and ARC performance improvements.
- Preallocated ZVOLs - for swap/dump.

-- Darren J Moffat
Re: [zfs-discuss] RFE: Un-dedup for unique blocks
On 01/22/13 13:29, Sašo Kiselkov wrote: On 01/22/2013 02:20 PM, Michel Jansens wrote: Maybe 'shadow migration' ? (eg: zfs create -o shadow=nfs://server/dir pool/newfs) Hm, interesting, so it works as a sort of replication system, except that the data needs to be read-only and you can start accessing it on the target before the initial sync. Did I get that right? The source filesystem needs to be read-only. It works at the VFS layer so it doesn't copy snapshots or clones over. Once mounted it appears like all the original data is instantly there. There is an (optional) shadowd that pushes the migration along, but it will complete on its own anyway. shadowstat(1M) gives information on the status of the migrations. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] RFE: Un-dedup for unique blocks
On 01/22/13 13:29, Darren J Moffat wrote: Since I'm replying here are a few others that have been introduced in Solaris 11 or 11.1. and another one I can't believe I missed, since I was one of the people that helped design it and I did codereview... Per-file sensitivity labels for Trusted Extensions (TX) configurations. and I'm sure I'm still missing stuff that is in Solaris 11 and 11.1. -- Darren J Moffat
Re: [zfs-discuss] RFE: Un-dedup for unique blocks
On 01/22/13 15:32, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) wrote: From: Darren J Moffat [mailto:darr...@opensolaris.org] Support for SCSI UNMAP - both issuing it and honoring it when it is the backing store of an iSCSI target. When I search for scsi unmap, I come up with all sorts of documentation that ... is ... like reading a medical journal when all you want to know is the conversion from 98.6F to C. Would you mind momentarily, describing what SCSI UNMAP is used for? If I were describing to a customer (CEO, CFO) I'm not going to tell them about SCSI UNMAP, I'm going to say the new system has a new feature that enables ... or solves the ___ problem... Customer doesn't *necessarily* have to be as clueless as CEO/CFO. Perhaps just another IT person, or whatever.

It is a mechanism for the part of the storage system above the disk (eg ZFS) to inform the disk that it is no longer using a given set of blocks. This is useful when using an SSD - see Saso's excellent response on that. However it can also be very useful when your disk is an iSCSI LUN. It allows the filesystem layer (eg ZFS or NTFS, etc), when on an iSCSI LUN that advertises SCSI UNMAP, to tell the target there are blocks in that LUN it isn't using any more (eg it just deleted some blocks). This means you can get more accurate space usage when using things like iSCSI. ZFS in Solaris 11.1 issues SCSI UNMAP to devices that support it, and ZVOLs exported over COMSTAR advertise it too. In the iSCSI case it is mostly about improved space accounting and utilisation. This is particularly interesting with ZFS when snapshots and clones of ZVOLs come into play. Some vendors call this (and things like it) Thin Provisioning; I'd say it is more accurate communication between 'disk' and filesystem about in-use blocks. -- Darren J Moffat
Re: [zfs-discuss] RFE: Un-dedup for unique blocks
On 01/22/13 16:02, Sašo Kiselkov wrote: On 01/22/2013 05:00 PM, casper@oracle.com wrote: Some vendors call this (and things like it) Thin Provisioning, I'd say it is more accurate communication between 'disk' and filesystem about in use blocks. In some cases, users of disks are charged by bytes in use; when not using SCSI UNMAP, a set of disks used for a zpool will in the end be charged for the whole reservation; this becomes costly when your standard usage is much less than your peak usage. Thin provisioning can now be used for zpools as long as the underlying LUNs have support for SCSI UNMAP Looks like an interesting technical solution to a political problem :D

There is also a technical problem: if you can't inform the backing store that you no longer need the blocks, it can't free them either, so they get stuck in snapshots unnecessarily. -- Darren J Moffat
Re: [zfs-discuss] dm-crypt + ZFS on Linux
On 11/23/12 15:49, John Baxter wrote: After searching for dm-crypt and ZFS on Linux and finding too little information, I shall ask here. Please keep in mind this is in the context of running this in a production environment. We have the need to encrypt our data, approximately 30TB on three ZFS volumes under Solaris 10. The volumes currently reside on iscsi sans connected via 10Gb/s ethernet. We have tested Solaris 11 with ZFS encrypted volumes and found the performance to be very poor and have an open bug report with Oracle.

This bug report hasn't reached me yet, and I'd really like to be sure if there is a performance bug with ZFS that is unique to encryption so I can attempt to resolve it. Can you please provide the bug and/or SR number that Oracle Support gave to you.

We are a Linux shop and since performance is so poor and still no resolution, we are considering ZFS on Linux with dm-crypt. I have read once or twice that if we implemented ZFS + dm-crypt we would lose features, however which features are not specified. We currently mirror the volumes across identical iscsi sans with ZFS and we use hourly ZFS snapshots to update our DR site. Which features of ZFS are lost if we use dm-crypt? My guess would be they are related to raidz but unsure. -- Darren J Moffat
Re: [zfs-discuss] dm-crypt + ZFS on Linux
On 11/30/12 11:41, Darren J Moffat wrote: On 11/23/12 15:49, John Baxter wrote: After searching for dm-crypt and ZFS on Linux and finding too little information, I shall ask here. Please keep in mind this is in the context of running this in a production environment. We have the need to encrypt our data, approximately 30TB on three ZFS volumes under Solaris 10. The volumes currently reside on iscsi sans connected via 10Gb/s ethernet. We have tested Solaris 11 with ZFS encrypted volumes and found the performance to be very poor and have an open bug report with Oracle. This bug report hasn't reached me yet, and I'd really like to be sure if there is a performance bug with ZFS that is unique to encryption so I can attempt to resolve it. Can you please provide the bug and/or SR number that Oracle Support gave to you.

For the sake of those on the list, I've got these references now. -- Darren J Moffat
Re: [zfs-discuss] ZFS Appliance as a general-purpose server question
On 11/22/12 16:24, Jim Klimov wrote: A customer is looking to replace or augment their Sun Thumper with a ZFS appliance like 7320. However, the Thumper was used not only as a protocol storage server (home dirs, files, backups over NFS/CIFS/Rsync), but also as a general-purpose server with unpredictably-big-data programs running directly on it (such as corporate databases, Alfresco for intellectual document storage, etc.) in order to avoid the networking transfer of such data between pure-storage and compute nodes - this networking was seen as both a bottleneck and a possible point of failure. Is it possible to use the ZFS Storage appliances in a similar way, and fire up a Solaris zone (or a few) directly on the box for general-purpose software; or to shell-script administrative tasks such as the backup archive management in the global zone (if that concept still applies) as is done on their current Solaris-based box?

No, it is a true appliance: it might look like it has Solaris underneath, but it is only based on Solaris and is not a general-purpose OS. You can script administrative tasks, but not with bash/ksh style scripting; you use the ZFSSA's own scripting language.

Is it possible to run VirtualBoxes in the ZFS-SA OS, dare I ask? ;)

No. -- Darren J Moffat
Re: [zfs-discuss] zfs send to older version
On 10/24/12 03:16, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Karl Wagner The only thing I think Oracle should have done differently is to allow either a downgrade or creating a send stream in a lower version (reformatting the data where necessary, and disabling features which weren't present). However, this would not be a simple addition, and it is probably not worth it for Oracle's intended customers. So you have a backup server in production, that has storage and does a zfs send to removable media, on periodic basis. (I know I do.) So you buy a new server, and it comes with a new version of zfs. Now you can't backup your new server.

So in this case you should have a) created the pool with a version that matches the pool version of the backup server, and b) made sure you create the ZFS file systems with a version that is supported by the backup server:

zpool create -o version=
zfs create -o version=

ZFS has the functionality, but it can't guess what the intended usage is, so the default behaviour is to create pools and filesystems using the highest version supported by the running software. -- Darren J Moffat
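A sketch of that approach (N and M stand in for the backup server's pool and filesystem versions; the device and pool names are illustrative):

```shell
# Discover which versions the older backup server supports (run there):
zpool upgrade -v    # pool versions this software supports
zfs upgrade -v      # filesystem versions this software supports

# On the new server, create at the older box's versions so that
# send streams remain receivable there:
zpool create -o version=N tank c0t1d0
zfs create -o version=M tank/backup
```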
Re: [zfs-discuss] zfs send to older version
On 10/24/12 17:44, Carson Gaspar wrote: On 10/24/12 3:59 AM, Darren J Moffat wrote: So in this case you should have a) created the pool with a version that matches the pool version of the backup server and b) make sure you create the ZFS file systems with a version that is supported by the backup server. And AI allows you to set the rpool version how, exactly?

I haven't personally tried this but I believe it should be possible, since you can set other pool options at install time, eg:

<pool_options>
  <option name="version" value="28"/>
</pool_options>

similarly for datasets that your AI manifest creates for you:

<dataset_options>
  <option name="version" value="4"/>
</dataset_options>

See /usr/share/install/target.dtd.1 -- Darren J Moffat
Re: [zfs-discuss] openindiana-1 filesystem, time-slider, and snapshots
On 10/16/12 14:54, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) wrote: Can anyone explain to me what the openindiana-1 filesystem is all about? I thought it was the backup copy of the openindiana filesystem, when you apply OS updates, but that doesn't seem to be the case... I have time-slider enabled for rpool/ROOT/openindiana. It has a daily snapshot (amongst others). But every day when the new daily snap is taken, the old daily snap rotates into the rpool/ROOT/openindiana-1 filesystem. This is messing up my cron-scheduled zfs send script - which detects that the rpool/ROOT/openindiana filesystem no longer has the old daily snapshot, and therefore has no snapshot in common with the receiving system, and therefore sends a new full backup every night. To make matters more confusing, when I run mount and when I zfs get all | grep -i mount, I see / on rpool/ROOT/openindiana-1

It is a new boot environment, see beadm(1M) - you must have done some 'pkg update' or 'pkg install' option that created a new BE.

It would seem, I shouldn't be backing up openindiana, but instead, backup openindiana-1? I would have sworn, out-of-the-box, there was no openindiana-1. Am I simply wrong?

Initially there wouldn't have been. Are you doing the zfs send on your own or letting time-slider do it for you? -- Darren J Moffat
Re: [zfs-discuss] ZFS ok for single disk dev box?
On 08/30/12 11:07, Anonymous wrote: Hi. I have a spare off the shelf consumer PC and was thinking about loading Solaris on it for a development box since I use Studio @work and like it better than gcc. I was thinking maybe it isn't so smart to use ZFS since it has only one drive. If ZFS detects something bad it might kernel panic and lose the whole system right? I realize UFS /might/ be ignorant of any corruption but it might be more usable and go happily on it's way without noticing? Except then I have to size all the partitions and lose out on compression etc. Any suggestions thankfully received.

If you are using Solaris 11 or any of the Illumos based distributions you have no choice: you must use ZFS as your root/boot filesystem. I would recommend that, if physically possible, you attach a second drive to make it a mirror. Personally I've run many many builds of Solaris on single disk laptop systems and never has it lost me access to my data. The only time I lost access to data on a single disk system was because of total hard drive failure. I run with copies=2 set on my home directory and any datasets I store data in when on a single disk system.

However, much more importantly: ZFS does not preclude the need for off-system backups. Even with mirroring and snapshots you still have to have a backup of important data elsewhere. No file system, and more importantly no hardware, is that good. -- Darren J Moffat
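The copies=2 setting mentioned above is per-dataset; a small illustrative example (the dataset name is an assumption):

```shell
zfs set copies=2 rpool/export/home/me   # store two copies of every block on the one disk
zfs get copies rpool/export/home/me

# Note: the setting only affects blocks written after it is set, and it
# guards against localised corruption, not whole-drive failure.
```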
Re: [zfs-discuss] New fast hash algorithm - is it needed?
On 07/11/12 00:56, Sašo Kiselkov wrote: * SHA-512: simplest to implement (since the code is already in the kernel) and provides a modest performance boost of around 60%.

FIPS 180-4 introduces SHA-512/t support, and explicitly SHA-512/256: http://csrc.nist.gov/publications/fips/fips180-4/fips-180-4.pdf Note this is NOT a simple truncation of SHA-512, since when using SHA-512/t the initial value H(0) is different. See sections 5.3.6.2 and 6.7. I recommend the checksum value for this be checksum=sha512/256. A / in the value doesn't cause any problems, and it is the official NIST name of that hash. The internal enum would be ZIO_CHECKSUM_SHA512_256. CR 7020616 already exists for adding this in Oracle Solaris. -- Darren J Moffat
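Because SHA-512/t uses a different initial value H(0), SHA-512/256 is not simply the first 256 bits of a SHA-512 digest. With a recent OpenSSL (1.1.1 or later, which supports -sha512-256) this is easy to check:

```shell
t=$(printf 'abc' | openssl dgst -sha512-256 | awk '{print $NF}')
f=$(printf 'abc' | openssl dgst -sha512 | awk '{print $NF}' | cut -c1-64)
echo "$t"   # SHA-512/256("abc")
echo "$f"   # first 64 hex chars (256 bits) of SHA-512("abc")
[ "$t" != "$f" ] && echo "digests differ"
```

The two hex strings are the same length but share no common prefix, which is exactly why a distinct on-disk checksum name (and enum) is needed.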
Re: [zfs-discuss] Benefits of enabling compression in ZFS for the zones
On 07/10/12 12:45, Ferenc-Levente Juhos wrote: Of course you don't see any difference, this is how it should work. 'ls' will never report the compressed size, because it's not aware of it. Nothing is aware of the compression and decompression that takes place on-the-fly, except of course zfs. That's the reason why you could gain in write and read speed if you use compression, because the actual amount of compressed data that is being written and read from the pool is smaller than the original data. And I think with the checksum test you proved that zfs checksums the uncompressed data.

No, ZFS checksums are computed over the data as it is stored on disk, i.e. the compressed data. -- Darren J Moffat
Re: [zfs-discuss] current status of SAM-QFS?
On 05/02/12 23:34, Fred Liu wrote: If you want to know Oracle's roadmap for SAM-QFS then I recommend contacting your Oracle account rep rather than asking on a ZFS discussion list. You won't get SAM-QFS or Oracle roadmap answers from this alias. My original purpose is to ask if there is an effort to integrate open-sourced SAM-QFS into illumos or smartos/oi/illumian. Okay, then it would have been clearer if you had asked that question but you asked about SAM-QFS on a zfs discuss alias. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] current status of SAM-QFS?
On 05/02/12 10:40, Fred Liu wrote: Still a fully supported product from Oracle: http://www.oracle.com/us/products/servers-storage/storage/storage-software/qfs-software/overview/index.html Yeah. But it seems no more updates since the Sun acquisition. Don't know Oracle's roadmap in the aspect of data-tiering.

If you want to know Oracle's roadmap for SAM-QFS then I recommend contacting your Oracle account rep rather than asking on a ZFS discussion list. You won't get SAM-QFS or Oracle roadmap answers from this alias. -- Darren J Moffat
Re: [zfs-discuss] current status of SAM-QFS?
On 04/30/12 04:00, Fred Liu wrote: The subject says it all. Still a fully supported product from Oracle: http://www.oracle.com/us/products/servers-storage/storage/storage-software/qfs-software/overview/index.html -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Aaron Toponce: Install ZFS on Debian GNU/Linux
On 04/18/12 17:28, Jim Klimov wrote: In the beginning it was my wishful thinking that encryption code and maybe some other newbies got legally leaked into Linux, and if they were there, then they might be legally included into other ZFS source code projects.

Not Linux per se, but there is another (read-only) implementation of ZFS encryption: http://bazaar.launchpad.net/~vcs-imports/grub/grub2-bzr/view/head:/grub-core/fs/zfs/zfscrypt.c -- Darren J Moffat
Re: [zfs-discuss] Solaris 11/ZFS historical reporting
On 04/16/12 20:18, Anh Quach wrote: Are there any tools that ship w/ Solaris 11 for historical reporting on things like network activity, zpool iops/bandwidth, etc., or is it pretty much roll-your-own scripts and whatnot?

For network activity look at flowstat; it can read exacct format files. For IO it depends what level you want to look at: if it is the device level, iostat; if it is how ZFS is using the devices, 'zpool iostat'; if it is the filesystem level, fsstat. Also look at acctadm(1M). -- Darren J Moffat
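A few of those commands in illustrative form (the pool name, interval and accounting file path are assumptions, not from the original mail):

```shell
zpool iostat -v rpool 5      # per-vdev IOPS and bandwidth every 5 seconds
fsstat zfs 5                 # filesystem-level operation counts
iostat -xn 5                 # device-level statistics

# Enable extended network accounting to an exacct file, for historical data:
acctadm -e extended -f /var/log/net.exacct net
```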
Re: [zfs-discuss] zfs-discuss Digest, Vol 76, Issue 20
On 02/21/12 15:32, zfs-dev wrote: You might want to try a reboot of the system. There is some low level caching of the encryption key in the kernel. I noticed that you can remove the key and continue to mount and umount it without a key so long as you do not reboot. Maybe this will clear it up. I never recommend just reboot; however, in this case it may actually work.

That behaviour is by design and is documented in zfs(1M), in the 'zfs umount' section, as follows: For an encrypted dataset, the key is not unloaded when the file system is unmounted. To unload the key, see zfs key. -- Darren J Moffat
Re: [zfs-discuss] Cannot mount encrypted filesystems.
On 02/22/12 06:10, Roberto Waltman wrote: 2011-08-23.23:48:35 zfs set keysource=passphrase,file:///root/passphrases/slice_2_passphrase slice_2/base/bitsavers

That should have failed, because the keysource property is inherited from slice_2/base, so you have found a bug and I can reproduce it. The reason it should have failed is that the source of the keysource property is used to determine which dataset to look at for the hidden salt property. We know what that salt property should actually be in your case because it is set on slice_2/base. Unfortunately 'zfs set salt' won't work, because salt is read-only from userland (so it doesn't accidentally get overridden and cause the very same symptoms you have!). In theory you would assume that you could go back to having the keysource inherited by running: 'zfs inherit keysource slice_2/base/bitsavers'. However that won't work because of a protection we have in place to avoid yet another route into these same symptoms. It will fail with an error message something like this: cannot inherit keysource for 'slice_2/base/bitsavers': use 'zfs key -c -o keysource=...'

Using a hacked-up libzfs that removes the check 'zfs inherit' does, I was able to get out of the situation and make the datasets accessible again. So this is fixable - don't abandon hope yet. -- Darren J Moffat
Re: [zfs-discuss] Cannot mount encrypted filesystems.
On 02/21/12 01:58, Roberto Waltman wrote: First, I did the 2nd. (Change location only) I believe I tried the first form also *after* things were already broken, but I'm sure the passphrases were identical: slice_08, slice_18 and slice_28 for each of pools 0/1/2. - The '8' to bring the length to the minimal requirement of 8 characters.

A 'zfs key -c' won't work unless a 'zfs key -l' or 'zfs mount' has successfully loaded the key first. Can you send the 'zpool history slice_2' output so I can see what commands have been run.

( My goal for using encryption was just to obfuscate the contents if, for example, I send a disk out for repair; not to hide anything from the NSA ) Question: I believed the keys generated from a passphrase depend only on the passphrase, and not on how it is provided or where it is stored. Is this a true statement?

Almost: the passphrase case also depends on a hidden property called salt, which is updated only when you do 'zfs key -c' and was set to a random value at the time the dataset was created. Did you ever do a send|recv of these filesystems? There was a bug with send|recv in 151a, since fixed, that could cause the salt to be zero'd out in some cases.

slice_2/base/bitsavers keysource passphrase,file:///export/home/trouser/passphrases/slice_2_passphrase local

This is the interesting part: you have set the keysource explicitly on every leaf dataset - you didn't need to do that, it would have been inherited. What this means is that even though you have the same passphrase for each dataset, the actual data encryption key is different, because the passphrase value plus the hidden salt property are used together to generate the wrapping key. -- Darren J Moffat
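To make the distinction from the thread concrete, here are the two commands side by side (the dataset and key-file path follow the thread's naming; this is a sketch of the semantics described above):

```shell
# Key change: re-wraps the data-encryption keys with a newly derived
# wrapping key and updates the hidden salt to match the new source.
zfs key -c -o keysource=passphrase,file:///root/keys/key_0 slice_0/base

# Property change only: alters where the passphrase is read from; the
# salt and wrapping-key derivation are untouched, so the new file's
# contents must be byte-identical to the old passphrase.
zfs set keysource=passphrase,file:///root/keys/key_0 slice_0/base
```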
Re: [zfs-discuss] encryption
On 02/21/12 13:27, Edward Ned Harvey wrote: From: Darren J Moffat [mailto:darr...@opensolaris.org] Sent: Monday, February 20, 2012 12:46 PM GRUB2 has support for encrypted ZFS file systems already. I assume this requires a pre-boot password, right? Then I have two questions...

The ZFS encryption support in GRUB2 was written by the main GRUB2 developer and doesn't use any Solaris ZFS encryption code. The GRUB2 code has support for interactive prompting for the passphrase, or for reading the passphrase or raw wrapping key from a file in some other filesystem that GRUB2 can see. Solaris 11 doesn't have GRUB2 at this time; it uses GRUB 0.97, which does not have encryption support. You can't put the two parts together, because the Solaris 11 kernel doesn't know how to mount an encrypted root filesystem, even though GRUB2 could have loaded the kernel and boot_archive from one if you managed to craft together a GRUB2 and Solaris 11 system on your own.

I noticed in solaris 11, when you init 6 it doesn't reboot the way other OSes reboot.

What you are seeing is Fast Reboot, where on x86 we completely avoid the trip back through the BIOS and the boot loader; it just loads and re-executes the kernel directly. The situation on SPARC is similar but not identical.

So maybe init 6 will not need you to type in a password again? Maybe you just need a password one time when you power on?

Solaris 11 doesn't have support for encrypted root at all at this time. It doesn't matter if Fast Reboot is in use or not. -- Darren J Moffat
Re: [zfs-discuss] encryption
On 02/16/12 15:35, David Magda wrote: On Thu, February 16, 2012 09:55, Edward Ned Harvey wrote: I've never used ZFS encryption. How does it work? Do you need to type in a pre-boot password? And if so, how do you do that with a server? Or does it use TPM or something similar, to avoid the need for a pre-boot password? Darren Moffat put up some good posts when the code was initially introduced: https://blogs.oracle.com/darren/en_GB/tags/zfs https://blogs.oracle.com/darren/en_GB/tags/crypto I don't believe encrypting the root volume is currently supported, so pre-boot stuff doesn't apply. (Please correct if I'm wrong here.) That is correct: you can't currently encrypt the root/boot file system. This is because neither OBP nor GRUB 0.97 has any knowledge of ZFS encrypted file systems or of how to get keys for them. GRUB2 has support for encrypted ZFS file systems already. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Cannot mount encrypted filesystems.
On 02/18/12 05:12, Roberto Waltman wrote: Solaris 11 Express 2010.11/snv_151a I strongly suggest upgrading to Solaris 11; there have been some important ZFS, and specifically ZFS encryption related, bug fixes. They were created with encryption on, forcing all others to be encrypted. The keysource for slice_?/base was set to passphrase,prompt while creating the file systems. Then I stored the keys (one key per pool) in files in a subdirectory of home/user1, and set keysource for slice_0/base to passphrase,file:///export/home/user1/keys/key_0 (Similarly for the other two pools) Did you ever export the slice_0 pool and reimport it, or reboot the server? Basically, are you and ZFS both 100% sure you had the correct passphrases stored in those files? So far so good. Several weeks and several terabytes of data later, I decided to relocate the files with the encryption keys from a subdir of user1 to a subdir of root. Copied the files and set slice_0/base keysource to passphrase,file:///root/keys/key_0, etc. Exactly how did you do that? zfs key -c -o keysource=passphrase,file:///root/keys/key_0 or zfs set keysource=passphrase,file:///root/keys/key_0 The first does a key change and actually re-encrypts the on-disk data encryption keys using the newly generated AES wrapping key that is derived from the passphrase. The second only changes where to find the passphrase. That broke it. After doing that, the base file systems (that contain no data files) can be mounted, but trying to mount any other fs fails with the message: cannot load key for 'slice_?/base/fsys_?_?': incorrect key. Can you post some sample output of: zfs get -r encryption,keysource slice_0 In particular include a few examples of the filesystems you call 'base' and the fsys ones. What is important here is understanding where the encryption and keysource properties are set and where they are inherited. 
-- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] RFE: add an option/attribute to import ZFS pool without automounting/sharing ZFS datasets
On 01/11/12 11:48, Jim Klimov wrote: I think about adding the following RFE to illumos bugtracker: add an option/attribute to import ZFS pool without automounting/sharing ZFS datasets I wonder if something like this (like a tricky workaround) is already in place? -N Import the pool without mounting any file systems. If it isn't mounted it can't be shared. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZIL on a dedicated HDD slice (1-2 disk systems)
On 01/08/12 18:21, Bob Friesenhahn wrote: Something else to be aware of is that even if you don't have a dedicated ZIL device, zfs will create a ZIL using devices in the main pool so Terminology nit: the log device is a SLOG. Every ZFS dataset has a ZIL. Where the ZIL writes for a given dataset go (slog or main pool devices) is determined by a combination of things including (but not limited to) the presence of a SLOG device, the logbias property and the size of the data. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] S11 vs illumos zfs compatiblity
On 12/28/11 06:27, Richard Elling wrote: On Dec 27, 2011, at 7:46 PM, Tim Cook wrote: On Tue, Dec 27, 2011 at 9:34 PM, Nico Williams n...@cryptonector.com wrote: On Tue, Dec 27, 2011 at 8:44 PM, Frank Cusack fr...@linetwo.net wrote: So with a de facto fork (illumos) now in place, is it possible that two zpools will report the same version yet be incompatible across implementations? This was already broken by Sun/Oracle when the deduplication feature was not backported to Solaris 10. If you are running Solaris 10, then zpool version 29 features are not implemented. Solaris 10 does have some deduplication support: it can import and read datasets in a deduped pool just fine. You can't enable dedup on a dataset, and any writes won't dedup; they will rehydrate. So it is more like partial dedup support rather than it not being there at all. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Can I create a mirror for a root rpool?
On 12/18/11 11:52, Pawel Jakub Dawidek wrote: On Thu, Dec 15, 2011 at 04:39:07PM -0700, Cindy Swearingen wrote: Hi Anon, The disk that you attach to the root pool will need an SMI label and a slice 0. The syntax to attach a disk to create a mirrored root pool is like this, for example: # zpool attach rpool c1t0d0s0 c1t1d0s0 BTW. Can you, Cindy, or someone else reveal why one cannot boot from RAIDZ on Solaris? Is this because Solaris is using GRUB and RAIDZ code would have to be licensed under GPL as the rest of the boot code? I'm asking because I see no technical problems with this functionality. Booting off of RAIDZ (even RAIDZ3), and also from multi-top-level-vdev pools, has worked just fine on FreeBSD for a long time now. Not being forced to have a dedicated pool just for the root if you happen to have more than two disks in your box is very convenient. For those of us not familiar with how FreeBSD is installed and boots, can you explain how boot works (ie do you use GRUB at all, and if so which version, and where the early boot ZFS code is)? -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Improving L1ARC cache efficiency with dedup
On 12/07/11 20:48, Mertol Ozyoney wrote: Unfortunately the answer is no. Neither L1 nor L2 cache is dedup aware. The only vendor I know that can do this is NetApp. In fact, most of our functions, like replication, are not dedup aware. For example, technically it's possible to optimize our replication so that it does not send data chunks if a data chunk with the same checksum exists in the target, without enabling dedup on target and source. We already do that with 'zfs send -D': -D Perform dedup processing on the stream. Deduplicated streams cannot be received on systems that do not support the stream deduplication feature. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
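The idea behind a deduplicated send stream can be sketched generically: chunks whose checksum has already appeared in the stream are replaced by a reference, and the receiver rebuilds them from chunks it has already stored. This is hypothetical illustration code, not the actual 'zfs send -D' stream format:

```python
import hashlib

def dedup_stream(chunks):
    """Yield ('data', chunk) the first time a chunk's checksum is
    seen, and ('ref', checksum) for every repeat."""
    seen = set()
    for chunk in chunks:
        digest = hashlib.sha256(chunk).digest()
        if digest in seen:
            yield ("ref", digest)
        else:
            seen.add(digest)
            yield ("data", chunk)

def receive_stream(records):
    # Rebuild the original byte sequence; references are resolved
    # against chunks received earlier in the same stream.
    store, out = {}, []
    for kind, payload in records:
        if kind == "data":
            store[hashlib.sha256(payload).digest()] = payload
            out.append(payload)
        else:
            out.append(store[payload])
    return b"".join(out)

blocks = [b"aaaa", b"bbbb", b"aaaa", b"aaaa", b"cccc"]
records = list(dedup_stream(blocks))

# The round trip is lossless...
assert receive_stream(records) == b"aaaabbbbaaaaaaaacccc"
# ...yet only three full data records were sent for five blocks.
assert sum(1 for kind, _ in records if kind == "data") == 3
```

Note the caveat from the man page excerpt: a receiver that doesn't understand the reference records cannot reconstruct the data.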
Re: [zfs-discuss] ZFS on Dell with FreeBSD
On 10/19/11 15:30, Fajar A. Nugraha wrote: On Wed, Oct 19, 2011 at 9:14 PM, Albert Shih albert.s...@obspm.fr wrote: Hi Sorry for cross-posting; I don't know which mailing list I should post this message to. I would like to use FreeBSD with ZFS on some Dell servers with some MD1200 (classic DAS). When we buy a MD1200 we need a RAID PERC H800 card on the server, so we have two options: 1/ create a LV on the PERC H800 so the server sees one volume, put the zpool on this unique volume and let the hardware manage the RAID. 2/ create 12 LVs on the PERC H800 (so without RAID) and let FreeBSD and ZFS manage the RAID. Which one is the best solution? Neither. The best solution is to find a controller which can pass the disk as JBOD (not encapsulated as virtual disk). Failing that, I'd go with (1) (though others might disagree). No: go with (2). ALWAYS let ZFS manage the redundancy; otherwise it can't self-heal. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] about btrfs and zfs
On 10/18/11 13:18, Edward Ned Harvey wrote: * btrfs is able to balance. (after adding new blank devices, rebalance, so the data workload is distributed across all the devices.) zfs is not able to do this yet. ZFS does slightly bias new writes toward new vdevs, so that we will get to a more even spread. It doesn't go and move already written blocks onto the new vdevs though. So while there isn't an admin interface for rebalancing, ZFS does do something in this area. This is implemented in metaslab_alloc_dva() http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/metaslab.c See lines 1356-1378 -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
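The biasing idea can be sketched with a toy allocator that always favours the vdev with the most free capacity. This is hypothetical code; the real logic in metaslab_alloc_dva() is considerably more involved:

```python
def pick_vdev(vdevs):
    # Prefer the vdev with the most free space, so an empty,
    # newly added vdev naturally attracts more new writes.
    return max(range(len(vdevs)),
               key=lambda i: vdevs[i]["size"] - vdevs[i]["used"])

# Two mostly full old vdevs plus one freshly added empty vdev.
vdevs = [
    {"size": 100, "used": 80},
    {"size": 100, "used": 80},
    {"size": 100, "used": 0},   # newly added device
]
writes = [0, 0, 0]
for _ in range(30):             # thirty one-unit new writes
    i = pick_vdev(vdevs)
    vdevs[i]["used"] += 1
    writes[i] += 1

# The new vdev soaks up most of the new writes until usage evens
# out - but previously written blocks are never moved.
assert writes[2] > writes[0] and writes[2] > writes[1]
```

This matches the point above: the spread evens out over time through new allocations only; there is no relocation of existing data and hence no admin-visible rebalance operation.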
Re: [zfs-discuss] about btrfs and zfs
On 10/18/11 14:04, Jim Klimov wrote: 2011-10-18 16:26, Darren J Moffat wrote: On 10/18/11 13:18, Edward Ned Harvey wrote: * btrfs is able to balance. (after adding new blank devices, rebalance, so the data workload is distributed across all the devices.) zfs is not able to do this yet. ZFS does slightly bias new writes toward new vdevs, so that we will get to a more even spread. It doesn't go and move already written blocks onto the new vdevs though. So while there isn't an admin interface for rebalancing, ZFS does do something in this area. This is implemented in metaslab_alloc_dva() http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/metaslab.c See lines 1356-1378 And the admin interface would be what exactly?.. As I said, there isn't one, because that isn't how it works today: it is all automatic and only for new writes. I was pointing out that ZFS does do 'something', not that it has an exactly matching feature. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thumper (X4500), and CF SSD for L2ARC = ?
On 10/14/11 13:39, Jim Klimov wrote: Hello, I was asked if the CF port in Thumpers can be accessed by the OS? In particular, would it be a good idea to use a modern 600x CF card (some reliable one intended for professional photography) as an L2ARC device using this port? I don't know about the Thumper's internal CF slot. I can say I have tried using a fast (at the time; this was about 3 years ago) CF card via a CF to IDE adaptor before, and it turned out to be a really bad idea, because the spinning rust disk (which was SATA) was actually faster to access. The same went for USB to CF adaptors at the time too. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] commercial zfs-based storage replication software?
On 10/13/11 09:27, Fajar A. Nugraha wrote: On Tue, Oct 11, 2011 at 5:26 PM, Darren J Moffat darr...@opensolaris.org wrote: Have you looked at the time-slider functionality that is already in Solaris ? Hi Darren. Is it available for Solaris 10? I just installed Solaris 10 u10 and couldn't find it. The screenshots that I can find all refer to OpenSolaris. No, it is not. There is a GUI for configuration of the snapshots, and time-slider can be configured to do a 'zfs send' or 'rsync'. The GUI doesn't have the ability to set the 'zfs recv' command, but that is set one-time in the SMF service properties. Is there a reference on how to get/install this functionality on Solaris 10? No, because it doesn't exist on Solaris 10. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] commercial zfs-based storage replication software?
Have you looked at the time-slider functionality that is already in Solaris ? There is a GUI for configuration of the snapshots and time-slider can be configured to do a 'zfs send' or 'rsync'. The GUI doesn't have the ability to set the 'zfs recv' command but that is set one-time in the SMF service properties. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Any info about System attributes
On 09/26/11 20:03, Jesus Cea wrote: # zpool upgrade -v [...] 24 System attributes [...] This is really an on-disk format issue rather than something that the end user or admin can use directly. These are special on-disk blocks for storing file system metadata attributes when there isn't enough space in the bonus buffer area of the on-disk version of the dnode. This can be necessary in some cases if a file has a very large and complex ACL and also has other attributes set, such as the ones for CIFS compatibility. They are also always used if the filesystem is encrypted, so that all metadata is in the system attribute (also known as spill) block rather than in the dnode. This is required because the dnode must be in the clear (it contains block pointers and other information needed to navigate the pool), yet we never want file system metadata to be in the clear. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
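The bonus-buffer overflow behaviour can be sketched with a toy model. This is purely illustrative (real dnodes and spill blocks are on-disk structures, and the capacity figure here is made up):

```python
class Dnode:
    """Toy dnode: a small inline 'bonus buffer' for attributes,
    with overflow going to a separate spill block."""
    BONUS_CAPACITY = 64  # bytes of inline attribute space (made-up)

    def __init__(self, encrypted=False):
        # Encrypted datasets always use the spill block, so no
        # attribute metadata lands in the cleartext dnode.
        self.encrypted = encrypted
        self.bonus = {}
        self.spill = {}

    def set_attrs(self, attrs):
        total = sum(len(k) + len(v) for k, v in attrs.items())
        overflow = self.encrypted or total > self.BONUS_CAPACITY
        (self.spill if overflow else self.bonus).update(attrs)

small = {"mode": b"0644"}
huge = {"acl": b"x" * 200, "cifs": b"y" * 50}

d = Dnode()
d.set_attrs(small)
assert d.bonus and not d.spill      # small attrs fit inline

d2 = Dnode()
d2.set_attrs(huge)
assert d2.spill and not d2.bonus    # big ACL overflows to spill

d3 = Dnode(encrypted=True)          # encrypted dataset case
d3.set_attrs(small)
assert d3.spill and not d3.bonus    # always spilled, never inline
```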
Re: [zfs-discuss] Advice with SSD, ZIL and L2ARC
On 09/19/11 18:45, Jesus Cea wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 I have a new question: the interaction between dataset encryption and L2ARC and ZIL. 1. I am pretty sure (but not completely sure) that data stored in the ZIL is encrypted, if the destination dataset uses encryption. Can anybody confirm? Of course; if we didn't do that we would be leaking user data. 2. What happens with L2ARC? Since the ARC is not encrypted (in RAM), is it encrypted when evicted to L2ARC? Use of the L2ARC is disabled for data from encrypted datasets at this time. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Advice with SSD, ZIL and L2ARC
On 08/30/11 15:31, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Jesus Cea 1. Is the L2ARC data stored in the SSD checksummed? If so, can I expect that ZFS goes directly to the disk if the checksum is wrong? Yup. Note the following is an implementation detail subject to change: it is NOT checksummed on disk, only in memory, but the L2ARC data on disk is not used after reboot anyway just now. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] RBAC and zfs
On 08/26/11 13:29, cephas maposah wrote: I would like to create a role which can take snapshots, run zfs send and zfs receive. The user switches to that role and has permissions to run those commands on a pool. See the zfs(1M) man page for the section on the 'allow' subcommand. Assuming a role name of 'myrole' and a ZFS pool called 'tank' it would be something like this: # roleadd myrole # passwd myrole ... # useradd -R myrole cephas # zfs allow -u myrole send,receive,snapshot,mount tank -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Disable ZIL - persistent
On 08/05/11 13:11, Edward Ned Harvey wrote: After a certain rev, I know you can set the sync property, and it takes effect immediately, and it's persistent across reboots. But that doesn't apply to Solaris 10. My question: Is there any way to make Disabled ZIL a normal mode of operations in Solaris 10? Particularly: If I do this echo zil_disable/W0t1 | mdb -kw then I have to remount the filesystem. It's kind of difficult to do this automatically at boot time, and impossible (as far as I know) for rpool. The only solution I see is to write some startup script which applies it to filesystems other than rpool. Which feels kludgy. Is there a better way? echo "set zfs:zil_disable = 1" >> /etc/system -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SSD vs hybrid drive - any advice?
On 07/27/11 00:00, Peter Jeremy wrote: On 2011-Jul-26 17:24:05 +0800, Fajar A. Nugraha w...@fajar.net wrote: Shouldn't modern SSD controllers be smart enough already that they know: - if there's a request to overwrite a sector, then the old data on that sector is no longer needed ZFS never does update-in-place and UFS only does update-in-place for Not quite never: there are some very special cases where blocks are allocated ahead of time and can be written to in place more than once, in particular the special type of ZVOLs used for dump devices. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs send/receive and ashift
On 07/27/11 10:24, Fred Liu wrote: The alternative is to have the node in your NDMP network that does the writing to the tape do the compression and encryption of the data stream before putting it on the tape. I see. T1C is a monster to have if possible ;-). And doing the job on the NDMP node (Solaris) needs extra software, is that correct? I believe so. Also, it is more than just the T1C drive you need: it needs to be in a library, and you also need the Oracle Key Management system to be able to do the key management for it. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs send/receive and ashift
On 07/27/11 12:51, Pawel Jakub Dawidek wrote: On Tue, Jul 26, 2011 at 03:28:10AM -0700, Fred Liu wrote: The ZFS send stream is at the DMU layer; at this layer the data is uncompressed and decrypted, ie exactly how the application wants it. Even the data compressed/encrypted by ZFS will be decrypted? If so, will there be any CPU overhead? And does ZFS send/receive tunneled by ssh become the only way to encrypt the data transmission? Even if zfs send/recv will work with encrypted and compressed data, you still need some secure tunneling. Storage encryption is not the same as network traffic encryption. Indeed; plus you don't necessarily want to always have your backups encrypted by the same keys as the live data (ie the policy for key management and retention could be different on purpose). -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs send/receive and ashift
On 07/26/11 10:14, Andrew Gabriel wrote: Does anyone know if it's OK to do zfs send/receive between zpools with different ashift values? The ZFS send stream is at the DMU layer; at this layer the data is uncompressed and decrypted, ie exactly how the application wants it. The ashift is a vdev layer concept, ie below the DMU layer. There is nothing in the send stream format that knows what an ashift actually is. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
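That layering argument can be sketched with a toy send/receive that round-trips logical records regardless of the receiving pool's sector size. This is hypothetical illustration code, not the real stream format:

```python
def send_stream(records):
    # A DMU-level stream carries logical records only; nothing in
    # it encodes the source pool's sector size (ashift).
    return [bytes(r) for r in records]

def receive(stream, ashift):
    # The vdev layer pads each write up to its own minimum block
    # size (2**ashift); the logical contents are unchanged.
    sector = 2 ** ashift
    disk = []
    for rec in stream:
        padded = rec + b"\x00" * (-len(rec) % sector)
        disk.append((padded, len(rec)))  # keep the logical length
    return disk

def read_back(disk):
    return [block[:length] for block, length in disk]

records = [b"hello", b"world" * 3]
stream = send_stream(records)

# Receiving into a 512-byte-sector pool (ashift=9) or a 4K-sector
# pool (ashift=12) yields identical logical data.
assert read_back(receive(stream, 9)) == records
assert read_back(receive(stream, 12)) == records
```

The on-disk padding differs per ashift, but the stream itself never needs to know about it, which is why mixed-ashift send/receive is fine.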
Re: [zfs-discuss] zfs send/receive and ashift
On 07/26/11 11:28, Fred Liu wrote: The ZFS send stream is at the DMU layer; at this layer the data is uncompressed and decrypted, ie exactly how the application wants it. Even the data compressed/encrypted by ZFS will be decrypted? Yes, which is exactly what I said. All data as seen by the DMU is decrypted and decompressed; the DMU layer is what the ZPL layer is built on top of, so it has to be that way. If so, will there be any CPU overhead? There is always some overhead for doing a decryption and decompression; the question is really can you detect it, and if you can, does it matter. If you are running Solaris on processors with built-in support for AES (eg SPARC T2, T3 or Intel with AES-NI) the overhead is reduced significantly in many cases. For many people getting the data from disk takes more time than doing the transform to get back your plaintext. In some of the testing I did I found that gzip decompression can be more significant to a workload than doing the AES decryption. So basically yes, of course, but does it actually matter? And does ZFS send/receive tunneled by ssh become the only way to encrypt the data transmission? That isn't the only way. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs send/receive and ashift
On 07/26/11 11:56, Fred Liu wrote: It depends on how big the delta is. It does matter if the data backup cannot be finished within the required backup window when people use zfs send/receive to do a mass data backup. The only way you will know if decrypting and decompressing causes a problem in that case is to try it on your systems. I seriously doubt it will, unless the system is already heavily CPU bound and your backup window is already very tight. BTW, adding a sort of off-topic question -- will the NDMP protocol in Solaris do decompression and decryption? Thanks. My understanding of the NDMP protocol is that it would be a translator that did that; it isn't part of the core protocol. The way I would do it is to use a T1C tape drive and have it do the compression and encryption of the data. http://www.oracle.com/us/products/servers-storage/storage/tape-storage/t1c-tape-drive-292151.html The alternative is to have the node in your NDMP network that does the writing to the tape do the compression and encryption of the data stream before putting it on the tape. And does ZFS send/receive tunneled by ssh become the only way to encrypt the data transmission? That isn't the only way. -- Any alternatives, if you don't mind? ;-) For starters, SSL/TLS (which is what the Oracle ZFSSA provides for replication) or IPsec are possibilities as well; it depends on what the risk is you are trying to protect against and what the transport layer is. But basically it is not provided by ZFS itself; it is up to the person building the system to secure the transport layer used for ZFS send. It could also be writing directly to a T10k encrypting tape drive. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Zil on multiple usb keys
On 07/23/11 04:57, Michael DeMan wrote: Generally performance is going to be pretty bad as well - USB sticks are not made to be written to rapidly. They are entirely different animals than SSDs. I would not be surprised (but would be curious to know if you still move forward on this) if you find performance even worse trying to do this. Back in the snv_120-ish era I tried this experiment on both my pool and on a friend's. In both cases we were serving NFS (he was also doing CIFS) which was mostly read, but also had periods where 1-2 G of data was rapidly added (uploading photos or videos) over the network. With both the USB flash drive and a SanDisk Extreme IV CF card in a CF-IDE enclosure, the performance did not improve, and in fact in the case of the CF card the enclosure was buggy such that the changes we had to make to the ata config actually made it slower. I removed the separate log device from both of those pools (by manual hacking with specially built zfs kernel modules, because slog removal didn't exist back then). -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Is oi_151 zpool version on par with sol11ex?
On 07/19/11 12:03, Jim Klimov wrote: Hello, some time ago I've seen the existence of development ISOs of OpenIndiana dubbed build 151. How close or far is it from the sol11ex 151a? In particular, regarding ZFS/ZPOOL version and functionality? Solaris 11 Express (snv_151a) has the following pool versions beyond 28: 29 RAID-Z/mirror hybrid allocator 30 Encryption 31 Improved 'zfs list' performance http://hub.opensolaris.org/bin/view/Community+Group+zfs/29 http://hub.opensolaris.org/bin/view/Community+Group+zfs/30 http://hub.opensolaris.org/bin/view/Community+Group+zfs/31 -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] latest zpool version in solaris 11 express
On 07/18/11 02:29 PM, Edward Ned Harvey wrote: From: Edward Ned Harvey [mailto:opensolarisisdeadlongliveopensola...@nedharvey.com] It says zpool version 31 and zfs version 5. Can anybody please confirm or deny that this is the absolute latest version available to the public in any way? After applying all updates, it's still zpool 31 and zfs 5. So unless anyone has anything else to suggest... I'm not going to repeat any of the dedup tests. It doesn't look like any zfs/zpool/dedup code has changed since Solaris 11 Express was released in 2010. Note that in general code can change without either the pool or filesystem versions changing. The filesystem and pool version numbers usually only need to change if there is an on-disk format change or some other compatibility issue. Some performance fixes need an on-disk layout change and some don't. Note I'm not commenting on any specific issue here, but on the way your conclusion was written: it doesn't follow, just because the pool and filesystem version numbers are the same, that no zfs/zpool/dedup code was changed. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] question about COW and snapshots
On 06/15/11 12:29, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Richard Elling That would suck worse. Don't mind Richard. He is of the mind that ZFS is perfect for everything just the way it is, and anybody who wants anything different should adjust their thought process. I suspect rather that Richard equated write to write(2) / dmu_write() calls, and that would suck performance-wise. I also suspect that what Simon wants isn't a snapshot of every little write(2) level call, but one when the file has finished being updated, maybe on close(2) [ but that assumes the app does actually call close() ]. I know I've certainly had many situations where people wanted to snapshot or rev individual files every time they're modified. As I said - a perfect example is Google Docs. Yes it is useful. But no, it's not what ZFS does. Exactly: versions of a whole file, but that is different from a snapshot on every write. How you interpret on every write depends on where in the stack you are coming from. If you think about an application, a write is when you save the document, but at the ZPL layer that is multiple write(2) calls and maybe even some rename(2)/unlink(2)/close(2) calls as well. If you move further down, then doing a snapshot on every dmu_write() call is fundamentally at odds with how ZFS works. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
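The distinction between "version on every write(2)" and "version when the save completes" can be sketched with a toy file wrapper that records a version only on close(2). This is hypothetical illustration code, not an actual ZFS or VFS mechanism:

```python
class VersionOnClose:
    """Toy file object that records one version per close(),
    no matter how many write() calls built up the content."""
    def __init__(self):
        self.buffer = []
        self.versions = []   # stand-in for per-save snapshots

    def write(self, data):
        # Individual write(2)-level calls just accumulate data;
        # versioning here would be at odds with how apps save files.
        self.buffer.append(data)

    def close(self):
        # One "snapshot" per completed save, not per write(2).
        self.versions.append("".join(self.buffer))
        self.buffer = []

f = VersionOnClose()
for chunk in ("Dear ", "diary, ", "hello."):
    f.write(chunk)        # three write(2)-level calls
f.close()                 # one application-level "save"

# One coherent version, not three partial ones.
assert f.versions == ["Dear diary, hello."]
```

The caveat noted above still applies: this only works if the application actually calls close() at a meaningful point.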
Re: [zfs-discuss] ZFS receive checksum mismatch
On 06/10/11 12:47, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Jonathan Walker New to ZFS, I made a critical error when migrating data and configuring zpools according to needs - I stored a snapshot stream to a file using zfs send -R [filesystem]@[snapshot] > [stream_file]. There are precisely two reasons why it's not recommended to store a zfs send datastream for later use. As long as you can acknowledge and accept these limitations, then sure, go right ahead and store it. ;-) A lot of people do, and it's good. Not recommended by whom? Which documentation says this? As I pointed out last time this came up, the NDMP service on Solaris 11 Express and on the Oracle ZFS Storage Appliance uses the 'zfs send' stream as what is stored on the tape. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SE11 express Encryption on - errors in the pool after Scrub
On 06/04/11 13:52, Thomas Hobbes wrote: I am testing Solaris 11 Express with napp-it on two machines, and in both cases see the same problem: enabling encryption on a folder and filling it with data results in errors indicated by a subsequent scrub. I did not find the topic on the web, nor experiences shared by people using encryption on Solaris 11 Express. Advice would be highly appreciated. If you are doing the scrub when the encryption keys are not present, it is possible you are hitting a known (and very recently fixed in the Solaris 11 development gates) bug. If you have an operating system support contract with Oracle you should be able to log a support ticket and request a backport of the fix for CR 6989185. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ndmp?
On 05/24/11 14:37, Edward Ned Harvey wrote: When I search around, I see that Nexenta has NDMP, and Solaris 10 does not, and there was at least some talk about supporting NDMP in OpenSolaris ... So ... Is NDMP present in Solaris 11 Express? Is it an installable 3rd party package? How would you go about supporting NDMP if you wanted to? It is present, and it is not 3rd party. Click here to install it: http://pkg.oracle.com/solaris/release/p5i/0/service%2Fstorage%2Fndmp.p5i Man pages are here: http://download.oracle.com/docs/cd/E19963-01/html/821-1462/ndmpadm-1m.html http://download.oracle.com/docs/cd/E19963-01/html/821-1462/ndmpd-1m.html http://download.oracle.com/docs/cd/E19963-01/html/821-1462/ndmpstat-1m.html What do you mean by supporting it? I believe (though I haven't tested it) it works with Oracle Secure Backup as well as NetBackup and Networker. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: bug? ZFS crypto vs. scrub
On 11/05/2011 01:07, Daniel Carosone wrote: Sorry for abusing the mailing list, but I don't know how to report bugs anymore and have no visibility of whether this is a known/resolved issue. So, just in case it is not... Log a support call with Oracle if you have a support contract. With Solaris 11 Express, scrubbing a pool with encrypted datasets for which no key is currently loaded, unrecoverable read errors are reported. The error count applies to the pool, and not to any specific device, which is also somewhat at odds with the helpful message text for diagnostic status and suggested action: Known issue: 6989185 scrubbing a pool with encrypted filesystems and snapshots can report false positive errors. If you have a support contract you may be able to request that the fix be backported into an SRU (note I'm not guaranteeing it will be, just saying that it is technically possible). When this has happened previously (on this and other pools), mounting the dataset by supplying the key and rerunning the scrub removes the errors. For some reason, I can't in this case (it keeps complaining that the key is wrong). That may be a different issue that has also happened before, and I will post about it separately, once I'm sure I didn't just make a typo (twice) when first setting the key. Since you are saying typo I'm assuming you have keysource=passphrase,prompt (ie the default). Have you ever done a send|recv of the encrypted datasets? And if so, were there multiple snapshots recv'd? -- Darren J Moffat ___ zfs-crypto-discuss mailing list zfs-crypto-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-crypto-discuss
Re: [zfs-discuss] ls reports incorrect file size
On 05/ 2/11 08:41 PM, Eric D. Mudama wrote: On Mon, May 2 at 14:01, Bob Friesenhahn wrote: On Mon, 2 May 2011, Eric D. Mudama wrote: Hi. While doing a scan of disk usage, I noticed the following oddity. I have a directory of files (named file.dat for this example) that all appear as ~1.5GB when using 'ls -l', but that (correctly) appear as ~250KB files when using 'ls -s' or du commands: These are probably just sparse files. Nothing to be alarmed about. They were created via CIFS. I thought sparse files were an iSCSI concept, no? iSCSI is a block level protocol. Sparse files are a filesystem level concept that is understood by many filesystems including CIFS and ZFS and many others. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
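The difference between apparent size (what 'ls -l' shows) and allocated blocks (what 'ls -s' and du show) is easy to reproduce. The sketch below is illustrative stdlib Python, not tied to ZFS or CIFS specifically; it creates a file with a ~1.5GB apparent size that occupies almost no space on disk:

```python
import os
import tempfile

# Create a sparse file: seek far past the end and write a single byte.
# Everything before that byte is a "hole" with no blocks allocated.
path = os.path.join(tempfile.mkdtemp(), "file.dat")
with open(path, "wb") as f:
    f.seek(1_500_000_000 - 1)  # ~1.5 GB apparent size
    f.write(b"\0")             # only the final block is actually allocated

st = os.stat(path)
apparent = st.st_size            # what 'ls -l' reports
allocated = st.st_blocks * 512   # what 'ls -s' / du report (512-byte units)
print(apparent, allocated)
```

On any filesystem with sparse-file support the allocated figure is a tiny fraction of the apparent size, which is exactly the mismatch the poster observed.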
Re: How to mount encrypted file system at boot? Why no pass phrase requesed
On 21/04/2011 11:05, Dr. David Kirkby wrote: I went to a talk last night at the London Open Solaris User Group (LOSUG) by Darren Moffat - an Oracle engineer who had a major role in the ZFS encryption implementation in Solaris. I was particularly interested in this, as for a long time I've been concerned about security of data on my laptop. I decided to try to secure my laptop, which is running Solaris 11 Express. I want to set the machine up so that during the boot process I get asked to enter the pass phrase to mount the file system with my home directory on it. But I am having problems. First I create the file system. As expected, Solaris asks for a pass phrase: drkirkby@laptop:~# zfs create -o compression=on -o encryption=on -o mountpoint=/export/home/davek rpool/export/home/davek Enter passphrase for 'rpool/export/home/davek': *** Enter again: *** Next I create a file on the file system and check it exists. drkirkby@laptop:~# touch /export/home/davek/foo drkirkby@laptop:~# ls /export/home/davek/foo /export/home/davek/foo Unmount the encrypted file system drkirkby@laptop:~# zfs umount rpool/export/home/davek Check the file I created is no longer available drkirkby@laptop:~# ls /export/home/davek/foo /export/home/davek/foo: No such file or directory Now I get a problem. I was expecting to have to enter the pass phrase again when attempting to mount the file system, but this is not being requested. As you can see, I can mount the file system without the pass phrase and read the data on the file system. I covered that in the talk last night - in fact we had about a 5 minute discussion about why it is this way. If you want the key to go away you need to run: # zfs key -u rpool/export/home/davek drkirkby@laptop:~# zfs mount rpool/export/home/davek drkirkby@laptop:~# ls /export/home/davek/foo /export/home/davek/foo drkirkby@laptop:~# This looks wrong to me, but I've no idea how to solve it. No, it is correct by design.
As I mentioned last night the reason for this is so that delegated administration of certain properties can work for users that don't have the 'key' delegation and don't have access to the wrapping keys. For example changing a mountpoint causes an umount followed by a mount. There are other changes that under the covers can cause a filesystem to be temporarily unmounted and remounted. The next issue is how do I get the file system to mount when the machine is booted? I want to supply the pass phrase by typing it in, rather than storing it on a USB stick or other similar method. Since this is your user home directory, the ideal way would be a PAM module that ran during user login and requested the passphrase for the ZFS encrypted home dir. There isn't one in Solaris 11 Express (snv_151a) at this time. Any ideas what I need to do to get this file system to request the pass phrase before mounting the file system? There is source for a prototype PAM module in the old opensolaris.org zfs-crypto repository: http://src.opensolaris.org/source/history/zfs-crypto/phase2/usr/src/lib/pam_modules/ You would need to take a clone of that repository and check out changeset 6749:6dded109490e and see if that old PAM module could be hacked into submission. Note that it uses private interfaces and doing so is not supported by any Oracle support contract you have. -- Darren J Moffat ___ zfs-crypto-discuss mailing list zfs-crypto-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-crypto-discuss
Re: [zfs-discuss] X4540 no next-gen product?
On 08/04/2011 14:59, Bob Friesenhahn wrote: On Fri, 8 Apr 2011, Erik Trimble wrote: Sorry, I read the question differently, as in I have X4500/X4540 now, and want more of them, but Oracle doesn't sell them anymore, what can I buy?. The 7000-series (now: Unified Storage) *are* storage appliances. They may be storage appliances, but the user cannot put their own software on them. This limits the appliance to only the features that Oracle decides to put on it. Isn't that the very definition of an Appliance? -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] X4540 no next-gen product?
On 08/04/2011 17:47, Sašo Kiselkov wrote: In short, I think the X4540 was an elegant and powerful system that definitely had its market, especially in my area of work (digital video processing - heavy on latency, throughput and IOPS - an area, where the 7000-series with its over-the-network access would just be a totally useless brick). As an engineer I'm curious: have you actually tried a suitably sized S7000, or are you assuming it won't perform suitably for you? -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] disable zfs/zpool destroy for root user
On 17/02/2011 20:44, Stefan Dormayer wrote: is there a way to disable the subcommand destroy of zpool/zfs for the root user? ZFS doesn't actually require root for those operations; it checks for individual privileges. Mostly that amounts to sys_mount and sys_config (for pool operations), though those aren't documented requirements. By default the root user ends up being able to do anything to any pool or dataset and all other users need to be granted access via 'zfs allow'. Would it be useful if you could remove the ability for a root user in a zone to do zfs operations on delegated datasets? Doing this for the global zone is a little harder but for a local zone it can be done by extending the 'zfs allow' mechanism. See: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=7011365 -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] vscand + quarantine
On 02/ 9/11 09:57 PM, Zoltan Gyula Beck wrote: I would like to ask if it's possible to check the content of quarantine in case zfs uses vscand + antivirus. So is there any command to list all the infected files in a dataset? Any file which has been quarantined will have the av_quarantine bit set. The easiest way to see that is with /usr/bin/ls for example: ls -/ v foo -rw-r--r-- 1 darrenm staff 176411 Nov 4 14:56 foo {archive,nohidden,noreadonly,nosystem,noappendonly,nonodump,noimmutable,av_modified,noav_quarantined,nonounlink,nooffline,nosparse} In the above case the file has noav_quarantined; if it had been one that vscand had marked as quarantined it would say av_quarantined instead. There is also a compact mode; see the ls(1) man page. -rw-r--r-- 1 darrenm staff 176411 Nov 4 14:56 foo {A---q---} That is what it would look like if 'foo' was quarantined. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] vscand + quarantine
On 02/ 9/11 11:50 PM, Zoltan Gyula Beck wrote: Yes, I know that way with ls, but how can I check all the infected files on a dataset which is used by a file server with millions of files?! I mean there is no official way to check infections, but I have to use some custom scripts? (find, ls, grep) The quarantine bit is just an attribute of the file. ZFS is not a database so you can't do select name from files where files.quarantine = true; There is no way to do this other than getting the system attributes from each file directly. The only way to do that from a shell script is find/ls/grep. You could write a C program that uses the same method that ls does to get the attributes but you will still have to visit every file in the file system. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
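The point above, that the attribute lives in per-file metadata with no index to query, means any tool reduces to a full tree walk. A minimal sketch in Python; the `is_quarantined` predicate is a hypothetical placeholder, since on Solaris you would shell out to `ls -/ c` and look for the 'q' flag, and Python has no portable API for ZFS system attributes:

```python
import os
import tempfile

def find_matching(root, predicate):
    """Visit every file under root and yield those for which predicate holds.
    There is no index to consult: the whole tree must be walked."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if predicate(path):
                yield path

def is_quarantined(path):
    # Hypothetical stand-in for checking the av_quarantined system attribute.
    return False

# Quick demonstration on a throwaway tree, matching by suffix instead.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub"))
for rel in ("a.txt", os.path.join("sub", "b.txt"), os.path.join("sub", "c.log")):
    open(os.path.join(root, rel), "w").close()

txt_files = sorted(find_matching(root, lambda p: p.endswith(".txt")))
```

The cost is O(number of files) regardless of how the predicate is implemented, which is why there is no cheap "list all quarantined files" command on a server with millions of files.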
Re: [zfs-discuss] Best choice - file system for system
On 28/01/2011 13:37, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Tristram Scott When it comes to dumping and restoring filesystems, there is still no official replacement for the ufsdump and ufsrestore. Let's go into that a little bit. If you're piping zfs send directly into zfs receive, then it is an ideal backup method. But not everybody can afford the disk necessary to do that, so people are tempted to zfs send to a file or tape. There are precisely two reasons why that's not officially recommended: Officially? Yes, you have it in quotes, but where is the official reference for this? In fact I'd say the opposite. In Solaris 11 Express the NDMP daemon can backup using dump, tar or a zfs send stream. This is also what the 'Sun ZFS Storage Appliance' does; see here: http://www.oracle.com/technetwork/articles/systems-hardware-architecture/ndmp-whitepaper-192164.pdf On page 8 of the PDF, titled: About ZFS-NDMP Backup Support It does point out though that it is full ZFS datasets only, but incremental backup and incremental restore is supported. This has been tested and is known to work with at least the following backup applications: • Oracle Secure Backup 10.3.0.2 and above • Enterprise Backup Software (EBS) / Legato Networker 7.5 and above • Symantec NetBackup 6.5.3 and above -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On 06/01/2011 23:07, David Magda wrote: On Jan 6, 2011, at 15:57, Nicolas Williams wrote: Fletcher is faster than SHA-256, so I think that must be what you're asking about: can Fletcher+Verification be faster than Sha256+NoVerification? Or do you have some other goal? Would running on recent T-series servers, which have on-die crypto units, help any in this regard? The on chip SHA-256 implementation is not yet used; see: http://blogs.sun.com/darren/entry/improving_zfs_dedup_performance_via Note that the fix I integrated only uses a software implementation of SHA256 on the T5120 (UltraSPARC T2) and is not (yet) using the on CPU hardware implementation of SHA256. The reason for this is to do with boot time availability of the Solaris Cryptographic Framework and the need to have ZFS as the root filesystem. Not yet changed; it turns out to be quite complicated to fix due to very early boot issues. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On 07/01/2011 11:56, Sašo Kiselkov wrote: On 01/07/2011 10:26 AM, Darren J Moffat wrote: On 06/01/2011 23:07, David Magda wrote: On Jan 6, 2011, at 15:57, Nicolas Williams wrote: Fletcher is faster than SHA-256, so I think that must be what you're asking about: can Fletcher+Verification be faster than Sha256+NoVerification? Or do you have some other goal? Would running on recent T-series servers, which have on-die crypto units, help any in this regard? The on chip SHA-256 implementation is not yet used; see: http://blogs.sun.com/darren/entry/improving_zfs_dedup_performance_via Note that the fix I integrated only uses a software implementation of SHA256 on the T5120 (UltraSPARC T2) and is not (yet) using the on CPU hardware implementation of SHA256. The reason for this is to do with boot time availability of the Solaris Cryptographic Framework and the need to have ZFS as the root filesystem. Not yet changed; it turns out to be quite complicated to fix due to very early boot issues. Would it be difficult to implement both methods and allow ZFS to switch to the hardware-accelerated crypto backend at runtime after it has been brought up and initialized? It seems like one heck of a feature Whether it is difficult or not depends on your level of familiarity with ZFS, boot and the cryptographic framework ;-) For me, no, it wouldn't be difficult, but it still isn't completely trivial. (essentially removing most of the computational complexity of dedup). Most of the performance impact of dedup I've seen is not coming from the SHA256 computation; it is mostly about the additional IO to deal with the DDT. Though lowering the overhead that SHA256 does add is always a good thing. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
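To see why Fletcher is so much cheaper than SHA-256, compare a simplified Fletcher-4 in the style of the ZFS checksum (four running sums over 32-bit words) with a real SHA-256 digest. This is an illustrative stdlib-Python sketch, not bit-exact with the kernel implementation:

```python
import hashlib
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF

def fletcher4(data: bytes):
    """Simplified Fletcher-4: four running 64-bit sums over the input read
    as little-endian 32-bit words. Cheap integer arithmetic only; an
    illustrative sketch, not bit-exact with ZFS's kernel code."""
    if len(data) % 4:
        data += b"\0" * (4 - len(data) % 4)  # pad for the sketch
    a = b = c = d = 0
    for (w,) in struct.iter_unpack("<I", data):
        a = (a + w) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return (a, b, c, d)

block = b"\xab" * (128 * 1024)          # one 128K ZFS-sized record
f_sum = fletcher4(block)                # a few adds per word, no crypto
s_sum = hashlib.sha256(block).digest()  # the collision-resistant hash dedup needs
```

The Fletcher loop is a handful of additions per word, while SHA-256 runs 64 rounds of mixing per 64-byte chunk; that gap is the whole Fletcher+Verification versus SHA256+NoVerification trade-off, even before the DDT I/O that dominates in practice.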
Re: [zfs-discuss] A few questions
On 06/01/2011 00:14, Edward Ned Harvey wrote: solaris engineers don't use? Non-sun hardware. Pretty safe bet you won't find any Dell servers in the server room where solaris developers do their thing. You would lose that bet: not only would you find Dell, you would find many other big names, as well as white-box hand-built systems too. Solaris developers use a lot of different hardware - Sun never made laptops so many of us have Apple (running Solaris on the metal and/or under virtualisation) or Toshiba or Fujitsu etc laptops. There are also many workstations around the company that aren't Sun hardware, as well as servers. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] stupid ZFS question - floating point operations
On 22/12/2010 20:27, Garrett D'Amore wrote: That said, some operations -- and cryptographic ones in particular -- may use floating point registers and operations because for some architectures (sun4u rings a bell) this can make certain expensive Well remembered! There are sun4u optimisations that use the floating point unit but those only apply to the bignum code which in kernel is only used by RSA. operations go faster. I don't think this is the case for secure hash/message digest algorithms, but if you use ZFS encryption as found in Solaris 11 Express you might find that on certain systems these registers are used for performance reasons, either on the bulk crypto or on the keying operations. (More likely the latter, but my memory of these optimizations is still hazy.) RSA isn't used at all by ZFS encryption, everything is AES (including key wrapping) and SHA256. So those optimisations for floating point don't come into play for ZFS encryption. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] stupid ZFS question - floating point operations
On 23/12/2010 15:18, Garrett D'Amore wrote: Thanks for the clarification. I guess I need to go back and figure out how ZFS crypto keying is performed. I guess most likely the key is generated from some sort of one-way hash from a passphrase? See http://blogs.sun.com/darren/entry/zfs_encryption_what_is_on where I explain all the types of keys used and how they are generated, as well as how passphrases are turned into AES wrapping keys (using PKCS#5 PBE). -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
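The passphrase-to-wrapping-key step (PKCS#5 PBE, i.e. PBKDF2) can be sketched with the stdlib primitive. The salt, iteration count and PRF below are illustrative assumptions for the example, not the values the ZFS on-disk format actually uses:

```python
import hashlib
import os

# Illustrative parameters only; NOT the on-disk values ZFS uses.
salt = os.urandom(8)
iterations = 1000

def wrapping_key_from_passphrase(passphrase: bytes, salt: bytes) -> bytes:
    # PKCS#5 PBKDF2: stretch a passphrase into a 128-bit AES wrapping key.
    return hashlib.pbkdf2_hmac("sha1", passphrase, salt, iterations, dklen=16)

wk = wrapping_key_from_passphrase(b"correct horse battery", salt)
```

The important property is that the derived key wraps (encrypts) the randomly generated data-encryption keys, so changing the passphrase only means re-wrapping, never re-encrypting the data itself.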
Re: [zfs-discuss] stupid ZFS question - floating point operations
On 23/12/2010 17:09, joerg.schill...@fokus.fraunhofer.de wrote: Darren J Moffat darren.mof...@oracle.com wrote: On 22/12/2010 20:27, Garrett D'Amore wrote: That said, some operations -- and cryptographic ones in particular -- may use floating point registers and operations because for some architectures (sun4u rings a bell) this can make certain expensive Well remembered! There are sun4u optimisations that use the floating point unit but those only apply to the bignum code which in kernel is only used by RSA. It may be a guess caused by the fact that integer division and multiplication is inside the FPU on SPARC processors. Not a guess; it is code to do big-number integer arithmetic that is optimised for sun4u by explicitly (ab)using the FPU. This isn't guessing; it was a deliberate design choice. Specifically this code here: http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/common/bignum/sun4u/ Note that there are separate kernel and user land variants of that. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] a single nfs file system shared out twice with different permissions
On 20/12/2010 19:26, Geoff Nordli wrote: I guess he has some application he can imprison into a specific read-only subdirectory, while some other application should be able to read/write or something like that, using the same username, on the same machine. It is the same application, but for some functions it needs to use read-only access or it will modify the files when I don't want it to. Another alternative, if the application is running on Solaris, is to run it with the basic file_write privilege removed. This basic privilege was added for exactly this type of use case. $ ppriv -e -s EPIL=basic,!file_write myapp If it is being started by an SMF service you can remove file_write in the method_credential section - see smf_method(5). -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] relationship between ARC and page cache
On 21/12/2010 14:25, Phil Harman wrote: Hi Jeff, ZFS support for mmap() was something of an afterthought. The current Solaris virtual memory infrastructure didn't have the features or performance required, which is why ZFS ended up with the ARC. Yes, you've got it. When we mmap() a ZFS file, there are two main caches involved: the ZFS ARC and the good old Solaris page cache. The reason for poor performance is the overhead of keeping the two caches in sync, but contention for RAM is also an issue. Clamping the ARC is probably a good thing in your case, but it only addresses part of the problem. Another alternative to try would be setting primarycache=metadata on the ZFS dataset that contains the mmap files. That way you are only turning off the ZFS ARC cache of the file content for that one dataset rather than clamping the ARC. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] a single nfs file system shared out twice with different permissions
On 18/12/2010 07:09, Geoff Nordli wrote: I am trying to configure a system where I have two different NFS shares which point to the same directory. The idea is if you come in via one path, you will have read-only access and can't delete any files, if you come in the 2nd path, then you will have read/write access. That sounds very similar to what you would do with Trusted Extensions. The read/write label would be a higher classification than the read-only one - since you can read down, can't see higher and need to be equal to modify. For more information on Trusted Extensions start with these resources: Oracle Solaris 11 Express Trusted Extensions Collection http://docs.sun.com/app/docs/coll/2580.1?l=en OpenSolaris Security Community pages on TX: http://hub.opensolaris.org/bin/view/Community+Group+security/tx -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS ... open source moving forward?
On 12/13/10 05:55 PM, Miles Nordin wrote: + Oracle publishes the promised yet-to-be-delivered zfs-crypto paper that's thorough enough to write a compatible implementation It isn't yet the full paper but a lot of the on disk details are in my latest blog entry and all of the structs necessary for the on disk format are in the CTF data of the binaries. http://blogs.sun.com/darren/entry/zfs_encryption_what_is_on -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Crypto in Oracle Solaris 11 Express
On 17/11/2010 21:58, Bill Sommerfeld wrote: In particular, the mechanism by which dedup-friendly block IV's are chosen based on the plaintext needs public scrutiny. Knowing Darren, it's very likely that he got it right, but in crypto, all the details matter and if a spec detailed enough to allow for interoperability isn't available, it's safest to assume that some of the details are wrong. That is described here: http://blogs.sun.com/darren/entry/zfs_encryption_what_is_on If dedup=on for the dataset the per block IVs are generated differently. They are generated by taking an HMAC-SHA256 of the plaintext and using the left most 96 bits of that as the IV. The key used for the HMAC-SHA256 is different to the one used by AES for the data encryption, but is stored (wrapped) in the same keychain entry; just like the data encryption key, a new one is generated when doing a 'zfs key -K dataset'. Obviously we couldn't calculate this IV when doing a read so it has to be stored. This was also suggested independently by other well known people involved in encrypted filesystems while it was discussed on a public forum (most of that thread was cross posted to zfs-crypto-discuss). -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
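The IV derivation described above can be sketched in a few lines of stdlib Python; `iv_key` here stands in for the separate wrapped HMAC-SHA256 key held in the keychain entry:

```python
import hashlib
import hmac

def dedup_iv(iv_key: bytes, plaintext: bytes) -> bytes:
    """Leftmost 96 bits of HMAC-SHA256 over the plaintext, per the scheme
    described above. iv_key is the HMAC key kept alongside (but distinct
    from) the AES data-encryption key."""
    return hmac.new(iv_key, plaintext, hashlib.sha256).digest()[:12]

key = b"\x01" * 32
block = b"identical plaintext block"
# Same key + same plaintext always yields the same IV, so identical blocks
# produce identical ciphertext and stay dedup-friendly.
iv1 = dedup_iv(key, block)
iv2 = dedup_iv(key, block)
```

Because the IV is a keyed function of the plaintext, equal blocks dedup against each other without leaking plaintext equality to anyone who lacks the HMAC key, and the IV must be stored on disk since it cannot be recomputed at read time.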
Re: [zfs-discuss] ZFS snapshot limit?
On 01/12/2010 13:36, f...@ll wrote: I must send a zfs snapshot from one server to another. The snapshot is 130GB in size. Now I have a question: does zfs have any limit on the size of a send? No. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs on a removable device
On 26/11/2010 13:16, Pavel Heimlich wrote: I tried to transfer some data between two S11 machines via a usb harddrive with zfs on it, but importing the zpool failed (with some assertion error I did not write down) because I did not export it first (on the first machine). I had to go back to the first machine, plug the drive in again and export the fs. Are there some zfs / OS parameters I could set so that my usb drive with zfs on it would meet the expectations one has from a removable drive? (i.e. safe to remove +-anytime) No, you run 'zpool export' first; that is the OS parameter, and this is no different to any other filesystem on any other operating system. If you don't export it first how is Solaris or ZFS supposed to know the difference between you yanking it out because you are purposely moving it and the drive accidentally falling out or some other error that causes it to become unavailable? Hint: the answer is you can't, unless you administratively tell ZFS that the pool is supposed to be going away; the way you do that is by 'zpool export'. Unlike other filesystems though ZFS will be consistent on disk. You didn't have to plug it back into the original system; you could have just forced the import. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Crypto in Oracle Solaris 11 Express
On 23/11/2010 21:01, StorageConcepts wrote: r...@solaris11:~# zfs list mypool/secret_received cannot open 'mypool/secret_received': dataset does not exist r...@solaris11:~# zfs send mypool/plaint...@test | zfs receive -o encryption=on mypool/secret_received cannot receive: cannot override received encryption --- Is there an implementation/technical reason for not allowing this? Yes there is, this is because of how the ZPL metadata is written to disk - it is slightly different between encrypted and non-encrypted cases and unfortunately that difference shows up even in the ZFS send stream. It is a known (and documented in the Admin guide) restriction. If we allowed the receive to proceed the result would be that some ZPL metadata (including filenames) for some files may end up on disk in the clear. There are various cases where this could happen but it is most likely to happen when the filesystem is being used by Windows clients because of the combination of things that happen - but it can equally well happen with only local ZPL usage too, particularly if there are large ACLs in use. In the meantime the best workaround I can offer is to use tar/cpio/rsync, but obviously you lose your snapshot history that way. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Crypto in Oracle Solaris 11 Express
On 19/11/2010 00:39, David Magda wrote: On Nov 16, 2010, at 05:09, Darren J Moffat wrote: Both CCM[1] and GCM[2] are provided so that if one turns out to have flaws hopefully the other will still be available for use safely even though they are roughly similar styles of modes. On systems without hardware/cpu support for Galois multiplication (Intel Westmere and later and SPARC T3 and later) GCM will be slower because the Galois field multiplication has to happen in software without any hardware/cpu assist. However depending on your workload you might not even notice the difference. Both modes of operation are authenticating. At one point the design of ZFS crypto had the checksum automatically go to SHA-256 when it was enabled. [1] Is SHA activation still the case, or are the two modes of operations simply used in themselves to verify data integrity? That is still the case; the blockpointer contains the IV, the SHA256 checksum (truncated) and the MAC from CCM and GCM. Also, are slog and cache devices encrypted at this time? Given a pool, and the fact that only particular data sets on it could be encrypted, would these special devices be entirely encrypted, or only data from the particular encrypted data set/s? I would also assume the in-memory ARC would be clear-text. The ZIL, whether it is in-pool or on a slog, is always encrypted for an encrypted dataset; it is encrypted in exactly the same way. Data from encrypted datasets does not currently go to the L2ARC cache devices. The in-memory ARC is in the clear and it has to be because those buffers can be shared via zero-copy means with other parts of the system including other filesystems like NFS and CIFS. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Crypto in Oracle Solaris 11 Express
The design for ZFS crypto was done in the open via opensolaris.org and versions of the source (though not the final version at this time) are available on opensolaris.org. It was reviewed by people internal and external to Sun/Oracle who have considerable crypto experience. Important parts of the cryptography design were also discussed on other archived public forums as well as zfs-crypto-discuss. The design was also presented at IEEE 1619 SISWG and at SNIA. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Crypto in Oracle Solaris 11 Express
On 17/11/2010 20:04, Miles Nordin wrote: djm == Darren J Moffatdarr...@opensolaris.org writes: djm http://blogs.sun.com/darren/entry/introducing_zfs_crypto_in_oracle djm http://blogs.sun.com/darren/entry/assued_delete_with_zfs_dataset djm http://blogs.sun.com/darren/entry/compress_encrypt_checksum_deduplicate_with Is there a URL describing the on-disk format and implementation details? It is a work in progress. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Crypto in Oracle Solaris 11 Express
On 18/11/2010 03:55, grarpamp wrote: One reason you may want to select aes-128-gcm rather than aes-128-ccm is that GCM is one of the modes for AES in NSA Suite B[3], but CCM is not. Are there symmetric algorithms other than AES that are of interest? How might AES-XTS [1] be able to fit into the ZFS picture? It doesn't. We don't need it because we don't need to have the ciphertext the same size as the plaintext because we have space to store a sufficiently large MAC (and store an IV as well). This is why CCM and GCM were chosen rather than XTS or EME2. Additionally given the user may wish to trade off compression, dedup, the number of encryptable blocks [2], etc for any particular selectable algorithm. We don't need to make those compromises in ZFS, you can compress and encrypt and dedup (it happens in that order). http://blogs.sun.com/darren/entry/compress_encrypt_checksum_deduplicate_with For changing the encryption key see the discussion of 'zfs key -K' in the zfs(1M) man page: http://docs.sun.com/app/docs/doc/821-1462/zfs-1m?l=ena=view -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Crypto in Oracle Solaris 11 Express
On 17/11/2010 10:17, Richard Elling wrote: I know there are far more apps without support for encryption than with it. And given the ever more stringent government regulations in the US, there are plenty of customers chomping at the bit for encryption at the storage array. I do not disagree. There are many products in the market that seamlessly encrypt data. But, vi has had encryption for almost 30 years, so there is clearly no barrier to app writers. As more development moves to the cloud, encryption comes almost free at the app layer. The only thing left is the legacy apps... Encryption at the application layer solves a different set of problems to encryption at the storage layer. Just like the encryption in ZFS solves a different set of problems to full disk encryption in the drive firmware. These sets have overlapping regions and depending on security policies one or more may be the best solution. As always, encryption is the easy part; it is key management that is hard, because key management enters the realm of policy and key management can be hard to scale out to large numbers of apps. There is on one correct solution for where to do encryption just like there is on one correct way to write files onto persistent media. Choice is important and sometimes choosing more than one is the correct thing to do. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Crypto in Oracle Solaris 11 Express
On 17/11/2010 11:41, Erik Trimble wrote: There is on one correct solution for where to do encryption just like there is on one correct way to write files onto persistent media. Choice is important and sometimes choosing more than one is the correct thing to do. I'm assuming you meant no the two times you wrote on in that second-to-last sentence. :-) Yes thanks, it should have read: There is no one correct solution for where to do encryption just like there is no one correct way to write files onto persistent media. Choice is important and sometimes choosing more than one is the correct thing to do. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Crypto in Oracle Solaris 11 Express
On 17/11/2010 14:18, Bob Friesenhahn wrote: On Wed, 17 Nov 2010, Markus Kovero wrote: Does Oracle support Solaris 11 Express in production systems? -- richard Yes, you need a Premier support plan from Oracle for that. AFAIK, sol11 express is production ready, and is going to be updated to the real Solaris 11, and is supported even on non-Oracle hardware if you have the money (and a certified system). Solaris 11 Express may be production ready but is Oracle Premier Support prepared to support it in production? That seems like the vital question to me. As for myself, I will wait a while and observe before assigning my trust. From the FAQ[1] linked from here: http://www.oracle.com/technetwork/server-storage/solaris11/overview/index.html Licensing and Support for Oracle Solaris 11 Express 11. Can I get support for Oracle Solaris 11 Express? Yes. Oracle Solaris 11 Express is covered under the Oracle Premier Support for Operating Systems or Oracle Premier Support for Systems support option for Oracle hardware, and Oracle Solaris Premier Subscription for non-Oracle hardware. Customers must choose either of these support options should they wish to deploy Oracle Solaris 11 Express into a production environment. [1] http://www.oracle.com/technetwork/server-storage/solaris11/overview/faqs-oraclesolaris11express-185609.pdf -- Darren J Moffat
Re: [zfs-discuss] ZFS Crypto in Oracle Solaris 11 Express
On 11/15/10 19:36, David Magda wrote: On Mon, November 15, 2010 14:14, Darren J Moffat wrote: Today Oracle Solaris 11 Express was released and is available for download[1], this release includes on disk encryption support for ZFS. Using ZFS encryption support can be as easy as this: # zfs create -o encryption=on tank/darren Enter passphrase for 'tank/darren': Enter again: Looking forward to playing with it. Some questions: 1. Is it possible to do a 'zfs create -o encryption=off tank/darren/music' after the above command? I don't much care if my MP3s are encrypted. :) No, all child filesystems must be encrypted as well. This is to avoid problems with mounting during boot / pool import. It is possible this could be relaxed in the future but it is highly dependent on some other things that may not work out. 2. Both CCM and GCM modes of operation are supported: can you recommend which mode should be used when? I'm guessing it's best to accept the default if you're not sure, but what if we want to expand our knowledge? You've preempted my next planned posting ;-) But I'll attempt to give an answer here: 'on' maps to aes-128-ccm, because it is the fastest of the 6 available modes of encryption currently provided. Also I believe it is the current wisdom of cryptographers (which I do not claim to be) that AES 128 is the preferred key length due to recent discoveries about AES 256 that are not known to impact AES 128. Both CCM[1] and GCM[2] are provided so that if one turns out to have flaws hopefully the other will still be available for use safely even though they are roughly similar styles of modes. On systems without hardware/cpu support for Galois multiplication (Intel Westmere and later and SPARC T3 and later) GCM will be slower because the Galois field multiplication has to happen in software without any hardware/cpu assist. However depending on your workload you might not even notice the difference.
One reason you may want to select aes-128-gcm rather than aes-128-ccm is that GCM is one of the modes for AES in NSA Suite B[3], but CCM is not. Are there symmetric algorithms other than AES that are of interest? The wrapping key algorithm currently matches the data encryption key algorithm; is there interest in providing different wrapping key algorithms and configuration properties for selecting which one? For example doing key wrapping with an RSA keypair/certificate? [1] http://en.wikipedia.org/wiki/CCM_mode [2] http://en.wikipedia.org/wiki/Galois/Counter_Mode [3] http://en.wikipedia.org/wiki/NSA_Suite_B_Cryptography -- Darren J Moffat
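As a concrete illustration of the modes discussed above, the encryption property can also be set explicitly at create time. A sketch only; tank/fast and tank/suiteb are hypothetical dataset names:

```shell
# 'on' is equivalent to aes-128-ccm, the fastest of the six modes
zfs create -o encryption=aes-128-ccm tank/fast
# GCM is one of the NSA Suite B modes; may be slower without hardware
# support for Galois field multiplication
zfs create -o encryption=aes-128-gcm tank/suiteb
zfs get encryption tank/fast tank/suiteb
```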
[zfs-discuss] ZFS Crypto in Oracle Solaris 11 Express
Today Oracle Solaris 11 Express was released and is available for download[1], this release includes on disk encryption support for ZFS. Using ZFS encryption support can be as easy as this: # zfs create -o encryption=on tank/darren Enter passphrase for 'tank/darren': Enter again: # Continued at: http://blogs.sun.com/darren/entry/introducing_zfs_crypto_in_oracle http://blogs.sun.com/darren/entry/assued_delete_with_zfs_dataset http://blogs.sun.com/darren/entry/compress_encrypt_checksum_deduplicate_with [1] http://www.oracle.com/technetwork/server-storage/solaris11/downloads/index.html -- Darren J Moffat
Re: [zfs-discuss] how to quiesce and unquiesc zfs and zpool for array/hardware snapshots ?
On 12/11/2010 13:01, sridhar surampudi wrote: How can I quiesce / freeze all writes to zfs and zpool if I want to take hardware level snapshots or array snapshots of all devices under a pool? Are there any commands or ioctls or apis available? zpool export pool zpool import pool That is the only documented and supported way to do it that I'm aware of, and yes that does take the pool off line but that way you can be sure it isn't changing. The only other way I know of to freeze a pool is for testing purposes only and if you want to learn about that you need to read the code because I'm not going to disclose it here in case it is misused. -- Darren J Moffat
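The export/import sequence described above can be sketched as follows, with "pool" standing in for the actual pool name:

```shell
zpool export pool    # flush all pending writes and take the pool offline
# ... take the array / hardware snapshot of every LUN in the pool here ...
zpool import pool    # bring the pool back online
```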
Re: [zfs-discuss] zpool split how it works?
On 10/11/2010 11:18, sridhar surampudi wrote: I was wondering how zpool split works or implemented. If a pool pool1 is on a mirror having two devices dev1 and dev2 then using zpool split I can split with the new pool name say pool-mirror on dev2. How split can change metadata on dev2 and rename/replace and associate with new name i.e. pool-mirror ?? Exactly what isn't clear from the description in the man page ? zpool split [-R altroot] [-n] [-o mntopts] [-o property=value] pool newpool [device ...] Splits off one disk from each mirrored top-level vdev in a pool and creates a new pool from the split-off disks. The original pool must be made up of one or more mirrors and must not be in the process of resilvering. The split subcommand chooses the last device in each mirror vdev unless overridden by a device specification on the command line. When using a device argument, split includes the specified device(s) in a new pool and, should any devices remain unspecified, assigns the last device in each mirror vdev to that pool, as it does normally. If you are uncertain about the outcome of a split command, use the -n (dry-run) option to ensure your command will have the effect you intend. Or are you really asking about the implementation details ? If you want to know how it is implemented then you need to read the source code. Here would be a good starting point: http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libzfs/common/libzfs_pool.c#zpool_vdev_split Which ends up in kernel here: http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/zfs_ioctl.c#zfs_ioc_vdev_split -- Darren J Moffat
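A sketch of the man-page behaviour quoted above, using the pool and device names from the question (pool1, dev2); the -n dry run shows the outcome without committing to it:

```shell
# Preview the split without performing it
zpool split -n pool1 pool-mirror
# Split, explicitly naming dev2 for the new pool (otherwise the last
# device in each mirror vdev is chosen)
zpool split pool1 pool-mirror dev2
# The new pool is left exported; import it under its new name
zpool import pool-mirror
```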
Re: [zfs-discuss] Does a zvol use the zil?
Yes, ZVOLs do use the ZIL, if the write cache has been disabled on the zvol by the DKIOCSETWCE ioctl or the sync property is set to always. -- Darren J Moffat
Re: [zfs-discuss] Does a zvol use the zil?
On 21/10/2010 18:59, Maurice Volaski wrote: Does the write cache referred to above refer to the Writeback Cache property listed by stmfadm list-lu -v (when a zvol is a target) or is that some other cache and if it is, how does it interact with the first one? Yes it does; that basically results in the DKIOCGETWCE ioctl being called on the ZVOL (though you won't see that in truss because it is called from the comstar kernel modules not directly from stmfadm in userland). -- Darren J Moffat
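For completeness, the other ZIL trigger mentioned earlier, the sync property, can be set from the CLI. A sketch, with tank/myvol as a hypothetical zvol:

```shell
# sync=always forces every write through the ZIL regardless of the
# zvol's write-cache setting
zfs set sync=always tank/myvol
zfs get sync tank/myvol
```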
Re: [zfs-discuss] Finding corrupted files
On 20/10/2010 12:20, Edward Ned Harvey wrote: It's one of the big selling points, reasons for ZFS to exist. You should always give ZFS JBOD devices to work on, so ZFS is able to scrub both of the redundant sides of the data, and when a checksum error occurs, ZFS is able to detect *and* correct it. Don't use hardware raid. That isn't the recommended best practice, you are stating it far too strongly. The recommended best practice is to always create ZFS pools with redundancy in the control of ZFS. That doesn't require that the back end storage be JBOD or full disks nor does it require you not to use hardware raid. Some or all of which are impossible if you are using SAN or other remote block storage devices in many cases - and certainly the case if the SAN is provided by a Sun ZFS Storage appliance. -- Darren J Moffat
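The recommended practice above - redundancy under ZFS's control, even over SAN LUNs - can be as simple as mirroring two LUNs from different controllers (hypothetical device names):

```shell
# The backing LUNs may themselves sit on hardware RAID; ZFS still gets
# a redundant copy to repair checksum errors from
zpool create tank mirror c2t0d0 c3t0d0
zpool status tank
```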
Re: [zfs-discuss] Myth? 21 disk raidz3: Don't put more than ___ disks in a vdev
On 20/10/2010 14:03, Edward Ned Harvey wrote: In a discussion a few weeks back, it was mentioned that the Best Practices Guide says something like Don't put more than ___ disks into a single vdev. At first, I challenged this idea, because I see no reason why a 21-disk raidz3 would be bad. It seems like a good thing. If you have those 21 disks spread across 3 top level vdevs each of raidz3 with 7 disks then ZFS will stripe across 3 vdevs rather than 1. Here is an example from the Sun ZFS Storage Appliance GUI, where each O is a score out of 5:
--
                                     AVAIL   PERF    CAPACITY
Double parity RAID                   OOO__   _       1.45T
Mirrored                             O       O       808G
Single parity RAID, narrow stripes   OOO__   OO___   1.18T
Striped                              O       O       1.84T
Triple mirrored                      O       _       538G
Triple parity RAID, wide stripes     OO___   O       1.31T
--
-- Darren J Moffat
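The 3 x 7-disk raidz3 layout described above would be created along these lines (hypothetical c#t#d# device names):

```shell
# Three raidz3 top-level vdevs of 7 disks each; ZFS stripes writes
# across the three vdevs
zpool create tank \
    raidz3 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 \
    raidz3 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 \
    raidz3 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0 c3t6d0
```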
Re: [zfs-discuss] How to avoid striping ?
On 18/10/2010 07:44, Habony, Zsolt wrote: I have seen a similar question on this list in the archive but haven’t seen the answer. Can I avoid striping across top level vdevs? I use a zpool which is one LUN from the SAN, and when it becomes full I add a new LUN to it. But I cannot guarantee that the LUN will not come from the same spindles on the SAN. That sounds like a problem with your SAN config if that matters to you. Can I force zpool to not to stripe the data? You can't, but why do you care? -- Darren J Moffat