[zfs-discuss] impressive
Boldly plowing forwards, I request a few disks/vdevs to be mirrored all at the same time:

bash-3.2# zpool status zfs0
  pool: zfs0
 state: ONLINE
 scrub: resilver completed with 0 errors on Thu Feb  1 04:17:58 2007
config:

        NAME         STATE     READ WRITE CKSUM
        zfs0         ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t9d0   ONLINE       0     0     0
            c0t9d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t10d0  ONLINE       0     0     0
            c0t10d0  ONLINE       0     0     0
          c1t11d0    ONLINE       0     0     0
          c1t12d0    ONLINE       0     0     0
          c1t13d0    ONLINE       0     0     0
          c1t14d0    ONLINE       0     0     0

errors: No known data errors

bash-3.2# zpool attach -f zfs0 c1t11d0 c0t11d0
bash-3.2# zpool attach -f zfs0 c1t12d0 c0t12d0
bash-3.2# zpool attach -f zfs0 c1t13d0 c0t13d0
bash-3.2# zpool attach -f zfs0 c1t14d0 c0t14d0

Needless to say, there is some thrashing going on:

bash-3.2# zpool status zfs0
  pool: zfs0
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 0.00% done, 45h14m to go
config:

        NAME         STATE     READ WRITE CKSUM
        zfs0         ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t9d0   ONLINE       0     0     0
            c0t9d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t10d0  ONLINE       0     0     0
            c0t10d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t11d0  ONLINE       0     0     0
            c0t11d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t12d0  ONLINE       0     0     0
            c0t12d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t13d0  ONLINE       0     0     0
            c0t13d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t14d0  ONLINE       0     0     0
            c0t14d0  ONLINE       0     0     0

errors: No known data errors
bash-3.2#

Moments later I see:

bash-3.2# zpool status zfs0
  pool: zfs0
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 1.59% done, 2h19m to go
config:

        (same six-mirror layout as above, all devices ONLINE, 0 READ/WRITE/CKSUM errors)

errors: No known data errors
bash-3.2#

bash-3.2# mdb -k
Loading modules: [ unix krtld genunix specfs dtrace ufs scsi_vhci sd ip hook neti sctp arp usba nca zfs random audiosup sppp crypto ptm md logindmux cpc wrsmd fcip fctl fcp nfs ]
> ::memstat
Page Summary                Pages                MB  %Tot
Kernel                      79986               624   71%
Anon                        16131               126   14%
Exec and libs                1830                14    2%
Page cache                    533                 4    0%
Free (cachelist)              934                 7    1%
Free (freelist)             13662               106   12%

Total                      113076               883
Physical                   111514               871
bash-3.2#

So in a few hours I will have decent redundancy, all on snv_55b ... looking very, very fine.

-- Dennis
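If you don't want to babysit the console while the resilver runs, a crude watch loop is enough to track progress (a sketch; the pool name and polling interval are whatever suits your setup):

while true; do
        zpool status zfs0 | grep 'resilver in progress'
        sleep 300
done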
Re: [zfs-discuss] ZFS inode equivalent
Neil Perrin wrote:
> No, it's not the final version or even the latest! The current on-disk format version is 3. However, it hasn't diverged much, and the znode/ACL stuff hasn't changed.

And it will get updated as part of zfs-crypto; I just haven't done so yet because I'm not finished designing yet.

-- Darren J Moffat
[zfs-discuss] ZFS checksums - block or file level
I am trying to understand whether ZFS checksums apply at a file or a block level. We know that ZFS provides end-to-end checksum integrity, and I assumed that when I write a file to a ZFS filesystem, the checksum was calculated at the file level, as opposed to, say, the block level. However, I have noticed that when I create an emulated volume, that volume has a checksum property, set to the same default as a normal ZFS filesystem. I can even change the checksum value as normal; see below:

# /usr/sbin/zfs create -V 50GB -b 128KB mypool/myvol
# /usr/sbin/zfs set checksum=sha256 mypool/myvol

Now on this emulated volume, I could place any number of structures that are not ZFS filesystems, say raw database volumes, or UFS, QFS, etc. Since these do not perform end-to-end checksums, can someone explain to me what the ZFS checksum would be doing at this point?
[zfs-discuss] Re: UFS on zvol: volblocksize and maxcontig
I hope there will be consideration given to providing compatibility with UFS quotas (except that inode limits would be ignored), at least to the point of having edquota(1M), quot(1M), quota(1M), repquota(1M), rquotad(1M), and possibly quotactl(7I) work with ZFS (with the exception previously mentioned). OTOH, quotaon(1M)/quotaoff(1M)/quotacheck(1M) may not be needed for support of per-user quotas in ZFS (since it will presumably have its own ways of enabling these, and will simply never mess up?). None of which need preclude new interfaces with greater functionality (like both user and group quotas), but where there is similar functionality, IMO it would be easier for a lot of folks if quota maintenance (especially edquota and reporting) could be done the same way for UFS and ZFS.
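For anyone who hasn't lived with UFS quotas, the workflow being asked for looks roughly like this today (a sketch; the mount point and the user names "jsmith" and "protouser" are made up):

# enable quotas on a mounted UFS filesystem
touch /export/home/quotas
quotaon /export/home

# set limits for one user (opens an editor), or clone them from a prototype user
edquota jsmith
edquota -p protouser jsmith

# report usage and limits
repquota -v /export/home
quota -v jsmith

The request above is essentially that the edquota/repquota/quota steps keep working unchanged on ZFS, while the enable/check steps become unnecessary.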
Re: [zfs-discuss] ZFS checksums - block or file level
On 2/1/07, Nathan Essex [EMAIL PROTECTED] wrote:
> I am trying to understand if zfs checksums apply at a file or a block level. We know that zfs provides end to end checksum integrity, and I assumed that when I write a file to a zfs filesystem, the checksum was calculated at a file level, as opposed to say, a block level.

ZFS checksums are done at the block level. End-to-end checksum integrity means that when the actual data reaches the application from the platter, we can guarantee with very high certainty that the data is uncorrupted. Either a block-level checksum or a file-level checksum will suffice.

-- Regards, Jeremy
Re: [zfs-discuss] ZFS checksums - block or file level
ZFS checksums are at the block level.

Nathan Essex wrote on 02/01/07 08:27:
> [original question snipped]
Re: [zfs-discuss] Re: ZFS checksums - block or file level
Nathan Essex wrote:
> Thank you, so that means that even if I use something that writes raw I/O to a ZFS emulated volume, I still get the checksum protection, and hence data corruption protection.

Yes, it does. Also consider how bad performance could be if the checksum were actually calculated on a per-file basis. For example, a 1-bit write to the end of a 5GB file would require reading and checksumming 5GB of data to calculate the new checksum. Block level is the only sensible way to do this, IMO.

-- Darren J Moffat
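To put a rough number on that example (my own back-of-the-envelope figure, assuming a disk that reads sequentially at about 50 MB/s): appending one bit to a 5 GB file under a whole-file checksum would force roughly 5120 MB / 50 MB/s, i.e. around 100 seconds of re-reading just to recompute the checksum, whereas a block-level checksum only touches the single 128 KB block being modified.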
Re: [zfs-discuss] ZFS checksums - block or file level
Neil Perrin wrote:
> ZFS checksums are at the block level.

This has been causing some confusion lately, so perhaps we could say: ZFS checksums are at the file system block level, not to be confused with the disk block level or transport block level.

-- richard
Re: [zfs-discuss] Re: ZFS or UFS - what to do?
[EMAIL PROTECTED] said:
> That is the part of your setup that puzzled me. You took the same 7-disk RAID5 set and split it into 9 LUNs. The Hitachi likely splits the virtual disk into 9 contiguous partitions, so each LUN maps back to different parts of the 7 disks. I speculate that ZFS thinks it is talking to 9 different disks and so spreads out the writes accordingly. What ZFS thinks are sequential writes become widely spaced writes across the entire disk, which sends your seek times through the roof.

That's what I thought might happen before I even tried this, although it's also possible the Hitachi stripes each LUN across all 7 disks. Either way, one could be getting too many seeks. Note that I'm just trying to see if it was so bad that the self-healing capability wasn't worth the cost. I do realize these are 7200rpm SATA disks, so seeking isn't what they do best.

> I'm interested in how it looks from the Hitachi end. If you can, repeat the test with the Hitachi presenting all 7 disks directly to ZFS as LUNs?

The array doesn't give us that capability.

> Interesting... what you are suggesting is that %b is 100% when w/s and r/s are 0?

Correct. Sometimes all iostat -xn columns are 0 except %b; sometimes the asvc_t column stays at 4.0 for the duration of the quiet period. I've also observed times where all columns were 0, including %b. Sure is puzzling.

[EMAIL PROTECTED] said:
> IIRC, the calculation for %busy is the amount of time that an I/O is on the device. These symptoms would occur if an I/O is dropped somewhere along the way or at the array. Eventually we'll time out and retry, though by default that should be after 60 seconds. I think we need to figure out what is going on here before accepting the results. It could be that we're overrunning the queue on the Hitachi. By default, ZFS will send 35 concurrent commands per vdev, and the ssd driver will send up to 256 to a target. IIRC, Hitachi has a formula for calculating ssd_max_throttle to avoid such overruns, but I'm not sure if that applies to this specific array.

Hmm, it's true that I have made no tuning changes on the T2000 side. It would make sense if the array just stopped responding. I'll have to poke at the array and see if it has any diagnostics logged somewhere. I recall that the Hitachi docs do have some recommendations on max-throttle settings, so I'll go dig those up and see what I can find out.

Thanks for the comments,
Marion
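For reference, host-side throttle tuning on Solaris is normally done in /etc/system, along the lines of the sketch below. The value 8 is purely illustrative; the real number should come from Hitachi's sizing formula for this particular array and LUN count, and a reboot is needed before /etc/system changes take effect.

* /etc/system: illustrative values only; consult the array vendor's guidance
set ssd:ssd_max_throttle = 8
* or, on configurations using the sd driver rather than ssd:
set sd:sd_max_throttle = 8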
Re: [zfs-discuss] hot spares - in standby?
On Wed, 31 Jan 2007 [EMAIL PROTECTED] wrote:
> I understand all the math involved with RAID 5/6 and failure rates, but it's wise to remember that even if the probabilities are small they aren't zero. :)

Agreed. Another thing I've seen is that if you have an A/C (air conditioning) event in the data center or lab, you will usually see a cluster of failures over the next 2 to 3 weeks. Effectively, all your disk drives have been thermally stressed and are likely to exhibit a spike in failure rates in the near term. Often, in a larger environment, the facilities personnel don't understand the correlation between an A/C event and disk drive failure rates. And major A/C upgrade work is often scheduled over a (long) weekend when most of the technical talent won't be present. After the work is completed, everyone is told that it went very well, because the organization does not do bad news, and then you lose two drives in a RAID5 array.

> And after 3-5 years of continuous operation, you better decommission the whole thing or you will have many disk failures.

Agreed. We took an 11-disk FC hardware RAID box offline recently because all the drives were 5 years old. It's tough to hit those power-off switches and scrap working disk drives, but much better than the business disruption and professional embarrassment caused by data loss. And much better to be in control of, and experience, *scheduled* downtime. BTW: don't forget that if you plan to continue to use the disk enclosure hardware, you need to replace _all_ the fans first.

Regards, Al Hopper Logical Approach Inc, Plano, TX. [EMAIL PROTECTED] Voice: 972.379.2133 Fax: 972.379.2134 Timezone: US CDT OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005 OpenSolaris Governing Board (OGB) Member - Feb 2006
Re: [zfs-discuss] Re: What SATA controllers are people using for ZFS?
On Thu, 1 Feb 2007, Tom Buskey wrote:
> I got an Addonics eSATA card. SATA 3.0. PCI *or* PCI-X. Works right off the bat w/ 10u3. No firmware update needed. It was $130. But I don't pull out my hair and I can use it if I upgrade my server for PCI-X.
> And I'm finding the throughput isn't there. 2MB/s in ZFS RAIDZ and worse with UFS. *sigh*

I think that there are big issues with the 3124 driver. I saw unexplained pauses that lasted from 30 to 80+ seconds during a tar from a single SATA disk drive that I was migrating data from (using a Syba SD-SATA2-2E2I card). I fully expected the kernel to crash while observing this transfer (it didn't). It happened periodically, each time a certain amount of data had been transferred (just by observation, not measurement). And this was a UFS filesystem, and the drive is a Sun original drive from an Ultra 20 box. I need to do some follow-up experiments, as Mike Riley (Sun) has kindly offered to take my results to the people working on this driver.

> So, anyone know an inexpensive 4-port SATA card for PCI that'll work with 10u3 and I don't need to reflash the BIOS on? (I bricked a Syba...)

Honestly, you're much better off with the $125 8-port SuperMicro board that I have been unable to break to date. Details: SuperMicro AOC-SAT2-MV8 8-port, which uses the Rev C0 (Hercules-2) chip: http://www.supermicro.com/products/accessories/addon/AoC-SAT2-MV8.cfm Kudos to the Sun developers working on the Marvell driver! :) In the meantime I hope to find time to test a SAS2041E-R (initially the PCI Express version of this card). Keep posting to zfs-discuss! :)

Al Hopper Logical Approach Inc, Plano, TX. [EMAIL PROTECTED] Voice: 972.379.2133 Fax: 972.379.2134 Timezone: US CDT OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005 OpenSolaris Governing Board (OGB) Member - Feb 2006
Re: [zfs-discuss] FYI: ZFS on USB sticks (from Germany)
On Feb 1, 2007, at 10:51 AM, Richard Elling wrote:
> FYI, here is an interesting blog on using ZFS with a dozen USB drives from Constantin. http://blogs.sun.com/solarium/entry/solaris_zfs_auf_12_usb My German is somewhat rusty, but I see that Google Translate does a respectable job. Thanks Constantin! -- richard

This is the best line: "Hier ist die offizielle Dokumentation, echte Systemhelden jedoch kommen mit nur zwei man-Pages aus: zpool und zfs." Roughly: "Here [link] is the official documentation; real system heroes, however, get by with only two man pages: zpool and zfs."

Chad
---
Chad Leigh -- Shire.Net LLC Your Web App and Email hosting provider chad at shire.net
[zfs-discuss] ZFS limits on zpool snapshots
The ZFS On-Disk Specification and other ZFS documentation describe the labeling scheme used for the vdevs that comprise a ZFS pool. A label contains, among other things, an array of uberblocks, one of which points to the active object set of the pool at a given instant (according to the documentation, the active uberblock for a given pool could be located in the uberblock array of any vdev participating in the pool at a given instant, and is subject to relocation from vdev to vdev as the uberblock for the pool is recreated on an update). Recreation of the active uberblock would occur, for example, if we took a snapshot of the pool and changes were then made anywhere in the pool. Since a new uberblock is required in this snapshot scenario, and since it appears that the uberblocks are treated as a kind of circular list across vdevs, it seems to me that the number of available snapshots we could have of a pool at any given instant would be strictly limited to the number of available uberblocks in the vdevs of the pool (128 uberblocks per vdev, if I have that straight). Is this truly the case, or am I missing something here?
Re: [zfs-discuss] ZFS limits on zpool snapshots
[EMAIL PROTECTED] wrote on 02/01/2007 01:17:15 PM:
> [original question snipped]

It is my understanding that when a snapshot (or any other pool change) causes a new uberblock to be written, the old uberblock is treated as a separate entity and is not tied into the new uberblock list. I am sure I will be corrected if I am reading the flow wrong.

Thanks, -Wade
Re: [zfs-discuss] ZFS limits on zpool snapshots
> Recreation of the active uberblock would occur, for example, if we took a snapshot of the pool and changes were then made anywhere in the pool.

The uberblock is updated quite often, not just on snapshots.

> Since a new uberblock is required in this snapshot scenario, and since it appears that the uberblocks are treated as a kind of circular list across vdevs, it seems to me that the number of available snapshots we could have of a pool at any given instant would be strictly limited to the number of available uberblocks in the vdevs of the pool (128 uberblocks per vdev, if I have that straight). Is this truly the case or am I missing something here?

Are you talking about normal ZFS filesystem snapshots or something else? The new uberblock will point to all filesystem snapshots. The old copies would never normally be referenced.

-- Darren Dunham [EMAIL PROTECTED] Senior Technical Consultant TAOS http://www.taos.com/ Got some Dr Pepper? San Francisco, CA bay area This line left intentionally blank to confuse you.
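If you want to poke at this yourself, zdb can show both a vdev's labels (including the uberblock array) and the pool's currently active uberblock; a rough sketch, with placeholder pool and device names:

# dump the labels (with their uberblock arrays) from one vdev
zdb -l /dev/dsk/c1t0d0s0

# display the active uberblock for a pool
zdb -u mypool

Watching the active uberblock's txg advance while snapshots accumulate is a quick way to convince yourself that the old uberblock slots are not what limits the snapshot count.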
[zfs-discuss] ZFS and thin provisioning
I found this article (http://www.cuddletech.com/blog/pivot/entry.php?id=729) but I have 2 questions. I am trying the steps on OpenSolaris build 54. Since you create the filesystem with newfs, isn't that really a UFS filesystem running on top of ZFS? Also, I haven't been able to do anything in the normal fashion (i.e. zfs and zpool commands) with the thin-provisioned filesystem I created; I can't even mount it or online it. Was this just a demo of things to come, or is thin provisioning ready for testing in ZFS?
[zfs-discuss] Re: ZFS limits on zpool snapshots
As far as I recall, the on-paper number of snapshots you can have in a filesystem is 2^48.
Re: [zfs-discuss] ZFS and thin provisioning
> I found this article (http://www.cuddletech.com/blog/pivot/entry.php?id=729) but I have 2 questions. I am trying the steps on OpenSolaris build 54. Since you create the filesystem with newfs, isn't that really a UFS filesystem running on top of ZFS?

In this case, yes. I wonder if you could create a second ZFS pool on the volume (starting such pools at boot time might be problematic, though!). The idea is that you have sparse raw storage available to you. The example placed a UFS filesystem on it, but you could do otherwise.

> Also I haven't been able to do anything in the normal fashion (i.e. zfs and zpool commands) with the thin-provisioned filesystem I created. Can't even mount it or online it.

No. It's not a filesystem. It's a zvol, a raw volume of blocks.

> Was this just a demo of things to come or is thin provisioning ready for testing in ZFS?

It's in there. Did you create the volumes as the article shows (with the -s flag)?

-- Darren Dunham [EMAIL PROTECTED] Senior Technical Consultant TAOS http://www.taos.com/ Got some Dr Pepper? San Francisco, CA bay area This line left intentionally blank to confuse you.
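For completeness, the sparse ("thin") volume in the article is created with the -s flag, and UFS goes on top of the zvol device node. A minimal sketch, with made-up pool and volume names:

# create a sparse (thin-provisioned) 50 GB volume; no space is reserved up front
zfs create -s -V 50g tank/thinvol

# put UFS on it via the zvol device node, then mount it as usual
newfs /dev/zvol/rdsk/tank/thinvol
mount /dev/zvol/dsk/tank/thinvol /mnt

Without -s the volume gets a full reservation in the pool, which is exactly what thin provisioning avoids.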
Re: [zfs-discuss] ZFS and thin provisioning
> In this case, yes. I wonder if you could create a second zfs pool on the volume. (Starting such pools at boot time might be problematic though!) The idea is that you have sparse raw storage available to you. The example placed a UFS filesystem on it, but you could do otherwise.

Followup to myself. Okay, it's just wrong, but it appears to work :-)

# zfs list
NAME         USED  AVAIL  REFER  MOUNTPOINT
tank        1.40M  8.24G  25.5K  /tank
tank/test   24.5K  8.24G  24.5K  /tank/test
tank/zvol1  1.28M  8.24G  1.28M  -
vtank         76K  1.91T  24.5K  /vtank

vtank is a zpool on top of tank/zvol1:

        NAME                        STATE     READ WRITE CKSUM
        vtank                       ONLINE       0     0     0
          /dev/zvol/dsk/tank/zvol1  ONLINE       0     0     0

# cp /usr/sbin/xntpdc /vtank
# zfs list vtank
NAME    USED  AVAIL  REFER  MOUNTPOINT
vtank   178K  1.91T   126K  /vtank

I don't think I want to reboot it in this state, though. :-)

-- Darren Dunham [EMAIL PROTECTED] Senior Technical Consultant TAOS http://www.taos.com/ Got some Dr Pepper? San Francisco, CA bay area This line left intentionally blank to confuse you.
Re: [zfs-discuss] Re: What SATA controllers are people using for ZFS?
On 2/1/07, Al Hopper [EMAIL PROTECTED] wrote:
> [previous message snipped]

We switched away from those same Marvell cards because of unexplained disconnects/reconnects that ZFS/Solaris would not survive. Stability for us came from embracing the Sil3124-2's (Tekram). We had two Marvell-based systems, and the most stable have been the now-discontinued SATA-I Adaptec 16-port cards and the Sil3124s. It may be redundant to say it, but the state of SATA support here is still the most glaring weakness. Isolating all of this to a SCSI-to-SATA external chassis is the surest route to bliss.
[zfs-discuss] ZFS vs NFS vs array caches, revisited
I had followed with interest the "turn off NV cache flushing" thread, in regard to doing ZFS-backed NFS on our low-end Hitachi array:

http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg05000.html

In short, if you have non-volatile cache, you can configure the array to ignore the ZFS cache-flush requests. This is reported to improve the really terrible performance of ZFS-backed NFS systems. Feel free to correct me if I'm misremembering.

Anyway, I've also read that if ZFS notices it's using slices instead of whole disks, it will not enable/use the write cache. So I thought I'd be clever and configure a ZFS pool on our array with a slice of a LUN instead of the whole LUN, and fool ZFS into not issuing cache flushes, rather than having to change the config of the array itself.

Unfortunately, it didn't make a bit of difference in my little NFS benchmark, namely extracting a small 7.6MB tar file (C++ source code, 500 files/dirs). I used three test zpools and a UFS filesystem (not all were in play at the same time):

  pool: bulk_sp1
 state: ONLINE
 scrub: none requested
config:

        NAME                                             STATE     READ WRITE CKSUM
        bulk_sp1                                         ONLINE       0     0     0
          c6t4849544143484920443630303133323230303230d0  ONLINE       0     0     0

errors: No known data errors

  pool: bulk_sp1s
 state: ONLINE
 scrub: none requested
config:

        NAME                                               STATE     READ WRITE CKSUM
        bulk_sp1s                                          ONLINE       0     0     0
          c6t4849544143484920443630303133323230303230d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: int01
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        int01         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t0d0s5  ONLINE       0     0     0
            c0t1d0s5  ONLINE       0     0     0

errors: No known data errors

# prtvtoc -s /dev/rdsk/c6t4849544143484920443630303133323230303230d0
*                           First       Sector        Last
* Partition  Tag  Flags     Sector      Count         Sector    Mount Directory
       0      4    00           34   4294879232   4294879265
       1      4    00   4294879266        67517   4294946782
       8     11    00   4294946783        16384   4294963166
#

Both NFS client and server are Sun T2000s, 16GB RAM, switched gigabit ethernet, Solaris 10U3 patched as of 12-Jan-2007, doing nothing else at the time of the tests. The bulk_sp1* pools were both on the same Hitachi 9520V RAID-5 SATA group that I ran my bonnie++ tests on yesterday. The int01 pool is mirrored on two slice-5s of the server T2000's internal 2.5" SAS 73GB drives.

ZFS on whole-disk FC-SATA LUN via NFS:
    real  968.13   user  0.33   sys  0.04        7.9 KB/sec overall

ZFS on partial slice-0 of FC-SATA LUN via NFS:
    real  950.77   user  0.33   sys  0.04        8.0 KB/sec overall

ZFS on slice-5 mirror of internal SAS drives via NFS:
    real   17.48   user  0.32   sys  0.03      438.8 KB/sec overall

UFS on partial slice-0 of FC-SATA LUN via NFS:
    real    6.13   user  0.32   sys  0.03     1251.4 KB/sec overall

I'm not willing to disable the ZIL. I think I'd settle for the 400KB/sec range in this test from NFS on ZFS, if I could get that on our FC-SATA Hitachi array. As things are now, ZFS just won't work for us, and I'm not sure how to make it go faster. Thoughts and suggestions are welcome.

Marion
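For anyone wanting to reproduce the whole-LUN versus slice comparison, the two bulk pools above would have been created along these lines (a sketch; only the long c6t...d0 device name comes from the output above, and the behavioral note reflects the write-cache point mentioned earlier in this message):

# whole-LUN pool: ZFS labels the device itself and manages its write cache
zpool create bulk_sp1 c6t4849544143484920443630303133323230303230d0

# slice-based pool: ZFS sees only a slice, so it leaves the write cache alone
zpool create bulk_sp1s c6t4849544143484920443630303133323230303230d0s0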
Re: [zfs-discuss] ZFS checksums - block or file level
Richard Elling wrote:
> Neil Perrin wrote:
>> ZFS checksums are at the block level.
> This has been causing some confusion lately, so perhaps we could say: ZFS checksums are at the file system block level, not to be confused with the disk block level or transport block level.

Saying that ZFS checksums are at the file system block level is also confusing, since zvols have checksums too. Maybe it is better to say that ZFS checksums are at the zpool block level, because the zpool is where all blocks, whether from a filesystem or a zvol, are stored.

Victor