Re: [zfs-discuss] New ZFS Intent Log (ZIL) device available - Beta program now open!
> I'm not sure about others on the list, but I have a dislike of AC power bricks in my racks.

I definitely empathize with your position concerning AC power bricks, but until the perfect battery is created, and we are far from it, it comes down to tradeoffs. I personally believe the ignition risk, thermal wear-out, and the inflexible proprietary nature of Li-Ion solutions simply outweigh the benefits of internal or all-inclusive mounting for enterprise-bound NVRAM.

> Is the state of the power input exposed to software in some way? In other terms, can I have a nagios check running on my server that triggers an alert if the power cable accidentally gets pulled out?

Absolutely, the X1 monitors the external supply and can detect not only a disconnect but any loss of power. In all cases, the card throws an interrupt so that the device driver (and ultimately user space) can be immediately notified. The X1 does not rely on external power until the host power drops below a certain threshold, so attaching/detaching the external power cable has no effect on data integrity as long as the host is powered on.

> OK, which means that the UPS must be separate to the UPS powering the server then.

Correct, a dedicated (in this case redundant) UPS is expected.

> Any plans on a pci-e multi-lane version then?

Not at this time. In addition to the reduced power and thermal output, the PCIe x1 connector has the added benefit of not competing with other HBAs, which do require an x4 or x8 PCIe connection.

Very appreciative of the feedback!

Christopher George Founder/CTO www.ddrdrive.com -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How do separate ZFS filesystems affect performance?
Gary Mills writes: On Tue, Jan 12, 2010 at 01:56:57PM -0800, Richard Elling wrote: On Jan 12, 2010, at 12:37 PM, Gary Mills wrote: On Tue, Jan 12, 2010 at 11:11:36AM -0600, Bob Friesenhahn wrote: On Tue, 12 Jan 2010, Gary Mills wrote: Is moving the databases (IMAP metadata) to a separate ZFS filesystem likely to improve performance? I've heard that this is important, but I'm not clear why this is. I found a couple of references that suggest just putting the databases on their own ZFS filesystem has a great benefit. One is an e-mail message to a mailing list from Vincent Fox at UC Davis. They run a similar system to ours at that site. He says: Particularly the database is important to get it's own filesystem so that it's queue/cache are separated. Another policy you might consider is the recordsize for the database vs the message store. In general, databases like the recordsize to match. Of course, recordsize is a per-dataset parameter. Unfortunately, it's not a single database. There are many of them, of different types. One is a Berkeley DB, others are something specific to the IMAP server (called skiplist), and some are small flat files that are just rewritten. All they have in common is activity and frequent locking. They can be relocated as a whole. The second one is from: http://blogs.sun.com/roch/entry/the_dynamics_of_zfs He says: For file modification that come with some immediate data integrity constraint (O_DSYNC, fsync etc.) ZFS manages a per-filesystem intent log or ZIL. This sounds like the ZIL queue mentioned above. Is I/O for each of those handled separately? ZIL is for the pool. Yes, I understand that, but do filesystems have separate queues of any sort within the ZIL? If not, would it help to put the database filesystems into a separate zpool? The slog device is for the pool but the ZIL is per filesystem/dataset. The logbias property can be used on a dataset to prevent that set from consuming the slog device resource : http://blogs.sun.com/roch/entry/synchronous_write_bias_property -r We did some experiments with the messaging server and a RAID array with separate logs. As expected, it didn't make much difference because of the nice, large nonvolatile write cache on the array. This reinforces the notion that Dan Carosone also recently noted: performance gains for separate logs are possible when the latency of the separate log device is much lower than the latency of the devices in the main pool, and, of course, the workload uses sync writes. It certainly sounds as if latency is the key for synchronous writes. -- -Gary Mills--Unix Group--Computer and Network Services- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
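Since recordsize is a per-dataset property, splitting the databases out also lets them carry their own tuning. A minimal sketch of what that looks like (the pool/dataset names and the 8k figure are purely illustrative and would have to match the real database I/O size):

    zfs create space/imap-db
    zfs set recordsize=8k space/imap-db    # match the database's typical record/page size
    zfs create space/imap-mail             # message store keeps the default 128k recordsize

Note that recordsize only affects files written after the property is set, so existing databases would have to be copied into the new dataset for the new recordsize to apply.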
Re: [zfs-discuss] adpu320 scsi timeouts only with ZFS
Any news regarding this issue? I'm having the same problems. I'm using an external Axus SCSI enclosure (Yotta with 16 drives) and it timed out on scanning LUNs (16 of them, because the Yotta is configured as JBOD). I've performed a firmware upgrade on the Yotta system and now the scanning works, the pool build works, and I'm even able to perform some transfers, but after some time everything hangs: iostat shows 100% busy on the drives and there are timeouts in dmesg. Still, if I keep the LUN count to <=8 everything works correctly. I think that eliminates cabling or any other hardware as the cause. All the problems start when using >8 LUNs (sd.conf is configured correctly, adpu320.conf is configured with a maximum of 16 LUNs). Linux kernel 2.6.18 had the same problems in the past, but the current kernel works without any problem, so they've changed something in the driver which eliminated the issue. If any extra info is needed I'm ready to provide it. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How do separate ZFS filesystems affect performance?
On Thu, Jan 14, 2010 at 10:58:48AM +1100, Daniel Carosone wrote: On Wed, Jan 13, 2010 at 08:21:13AM -0600, Gary Mills wrote: Yes, I understand that, but do filesystems have separate queues of any sort within the ZIL? I'm not sure. If you can experiment and measure a benefit, understanding the reasons is helpful but secondary. If you can't experiment so easily, you're stuck asking questions, as now, to see whether the effort of experimenting is potentially worthwhile. Yes, we're stuck asking questions. I appreciate your responses. Some other things to note (not necessarily arguments for or against): * you can have multiple slog devices, in case you're creating so much ZIL traffic that ZIL queueing is a real problem, however shared or structured between filesystems. For the time being, I'd like to stay with the ZIL that's internal to the zpool. * separate filesystems can have different properties which might help tuning and experiments (logbias, copies, compress, *cache), as well the recordsize. Maybe you will find that compress on mailboxes helps, as long as you're not also compressing the db's? Yes, that's a good point in favour of a separate filesystem. * separate filesystems may have different recovery requirements (snapshot cycles). Note that taking snapshots is ~free, but keeping them and deleting them have costs over time. Perhaps you can save some of these costs if the db's are throwaway/rebuildable. Also a good point. If not, would it help to put the database filesystems into a separate zpool? Maybe, if you have the extra devices - but you need to compare with the potential benefit of adding those devices (and their IOPS) to benefit all users of the existing pool. For example, if the databases are a distinctly different enough load, you could compare putting them on a dedicated pool on ssd, vs using those ssd's as additional slog/l2arc. Unless you can make quite categorical separations between the workloads, such that an unbalanced configuration matches an unbalanced workload, you may still be better with consolidated IO capacity in the one pool. As well, I'd like to keep all of the ZFS pools on the same external storage device. This makes migrating to a different server quite easy. Note, also, you can only take recursive atomic snapshots within the one pool - this might be important if the db's have to match the mailbox state exactly, for recovery. That's another good point. It's certainly better to have synchronized snapshots. -- -Gary Mills--Unix Group--Computer and Network Services- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
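To make the property and snapshot points above concrete, the commands involved would look roughly like this (pool and dataset names are hypothetical):

    zfs set compression=on imap/mailboxes    # compress the message store, but not the db datasets
    zfs snapshot -r imap@2010-01-14          # one atomic, recursive snapshot of every dataset in the pool
    zfs destroy -r imap@2010-01-14           # recursive destroy of that snapshot when the cycle expires

The recursive snapshot is what keeps the databases and the mailbox state consistent with each other, and it is only available while they live in the same pool.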
[zfs-discuss] 4 Internal Disk Configuration
Hello, I have played with ZFS but not deployed any production systems using ZFS and would like some opinions. I have a T-series box with 4 internal drives and would like to deploy ZFS with availability and performance in mind ;) What would some recommended configurations be? Example: use internal RAID controller to mirror boot drives, and ZFS the other 2? Can I create one pool with the 3 or 4 drives, install Solaris, and use this pool for other apps? Also, what happens if a drive fails? Thanks for any tips and gotchas. Paul ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] 4 Internal Disk Configuration
On Thu, January 14, 2010 09:44, Mr. T Doodle wrote: I have played with ZFS but not deployed any production systems using ZFS and would like some opinions Opinions I've got :-). Nor am I at all unusual in that regard, on this list :-) :-). I have a T-series box with 4 internal drives and would like to deploy ZFS with availability and performance in mind ;) What would some recommended configurations be? Example: use internal RAID controller to mirror boot drives, and ZFS the other 2? We haven't discussed configurations this small much lately, but I'm sure people will have ideas. And there isn't enough there to really give you many options, unfortunately. Lots of people think that ZFS does better than hardware controllers (at keeping the data valid). Modern OpenSolaris will install a ZFS pool and put the system filesystems in it (and use snapshots and such to manage upgrades, too). And you can then manually attach a second disk for redundancy, for that availability goal. However, you can't use a RAIDZ vdev, only a single device or a mirror. So you can't put all the disks into one ZFS pool and boot off it and serve the rest of the space out for other uses. (Boot code has to support the parts of ZFS that you can configure it to boot from, so it's a quite restricted subset). You could eat two disks for redundant boot pool, and then have the other two left to share out (presumably as a mirror vdev in a zpool), but that wastes a high percentage of your disk (1 drive usable out of 4 physical). You could have a non-redundant boot disk and then make a three-disk RAIDZ pool to share out, but of course that takes the server down if the one boot disk fails. Can I create one pool with the 3 or 4 drives, install Solaris, and use this pool for other apps? Nope, that's the thing you can't do. Also, what happens if a drive fails? Depends on the kinds of vdevs; if there's redundancy (mirror or RAIDZ[123]), you can replace the bad drive, resilver, and keep running. If these aren't hot-swap drives, you'll have to shut down to make the physical switch. If you want availability, you should choose vdev types with redundancy of course. Thanks for any tips and gotchas. One interesting possibility, which I haven't worked with but which sounds good, for this kind of low-end server, is to set the system up to boot from a USB key rather than the disks. This is slower, but system disk access isn't very frequent. And instead of real redundancy on the boot drive (with auto-failover), just keep another copy of the key, and plug that one in instead if the first one fails. Then you could put all four disks in a RAIDZ pool and share them out for use, with redundancy c. I'm used to the x86 side, not sure if the boot-from-USB thing is supported on a T-series box, either. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
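For reference, the manual attach described above is a one-liner plus a boot block install on the new disk; a rough sketch with hypothetical device names (on SPARC, e.g. a T-series, the boot block goes on with installboot rather than installgrub):

    zpool attach rpool c1t0d0s0 c1t1d0s0
    installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c1t1d0s0
    zpool create tank mirror c1t2d0 c1t3d0    # the remaining two disks as a shared, mirrored data pool

Resilvering of the root pool starts automatically after the attach; zpool status rpool shows when it has finished.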
[zfs-discuss] Lycom has lots of hardware that looks interesting, is it supported
I've been building a few 6disk boxes for VirtualBox servers, and I am also surveying how I will add more disks as these boxes need it. Looking around on the HCL, I see the Lycom PE-103 is supported. That's just 2 more disks, I'm typically going to want to add a raid-z w/spare to my zpools, so I need at least 4 disks, and I'd prefer to build boxes with multi-lane esata expansion and put either 5 or 10 disks in them for expansion. There are lots of devices on the lycom web site at http://www.lycom.com.tw. The device at http://www.lycom.com.tw/st126rm.htm looks very attractive for bolting on to computer cases that are housing additional drives. That device says that the PE-102 can be used for multi-lane connectivity. Is multi-lane working in solaris, and since the PE-102 seems to have the same chipset as the PE-103, would it work on OpenSolaris? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] New ZFS Intent Log (ZIL) device available - Beta program now open!
On Thu, Jan 14, 2010 at 12:35:32AM -0800, Christopher George wrote: I'm not sure about others on the list, but I have a dislike of AC power bricks in my racks. I definitely empathize with your position concerning AC power bricks, but until the perfect battery is created, and we are far from it, it comes down to tradeoffs. I personally believe the ignition risk, thermal wear-out, and the inflexible proprietary nature of Li-Ion solutions simply outweigh the benefits of internal or all inclusive mounting for enterprise bound NVRAM. Is there any data out there that have tracked these sort of ignition incidents? I have to admit I'd never heard of this. We have quite a few BBU backed RAID controllers in our servers and I've never had anything remotely like this occur. I know anecdotal evidence is meaningless, but this definitely surprised me a little. My gut tells me the risk of this is pretty low and most are going to prefer the convenience of an onboard BBU to installing UPS'es in all their racks (as good a practice as that may be). Gut isn't the best to go on of course, which is why I'm interested in seeing some statistics on this sorta thing... Interesting product though! Ray ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] 4 Internal Disk Configuration
On Thu, Jan 14, 2010 at 3:44 PM, Mr. T Doodle tpsdoo...@gmail.com wrote:

> Hello, I have played with ZFS but not deployed any production systems using ZFS and would like some opinions. I have a T-series box with 4 internal drives and would like to deploy ZFS with availability and performance in mind ;) What would some recommended configurations be?

How long's a piece of string? I can tell you what my production systems look like: there's a small (24G or so) partition on s0, some swap, and then the rest of the space on s7. Then mirror the first 2 disks' slice 0 using SVM (this configuration was devised before ZFS boot) for the OS; mirror slice 0 on the other two disks for an alternate root for Live Upgrade. Then create a couple of mirror vdevs using the remaining space. So SVM looks like:

    d10 -m d11 d12 1
    d11 1 1 c1t2d0s0
    d12 1 1 c1t3d0s0
    d0 -m d1 d2 1
    d1 1 1 c1t0d0s0
    d2 1 1 c1t1d0s0

and ZFS looks like:

    NAME          STATE   READ WRITE CKSUM
    storage       ONLINE     0     0     0
      mirror      ONLINE     0     0     0
        c1t0d0s7  ONLINE     0     0     0
        c1t1d0s7  ONLINE     0     0     0
      mirror      ONLINE     0     0     0
        c1t2d0s7  ONLINE     0     0     0
        c1t3d0s7  ONLINE     0     0     0

> Example: use internal RAID controller to mirror boot drives, and ZFS the other 2? Can I create one pool with the 3 or 4 drives, install Solaris, and use this pool for other apps? Also, what happens if a drive fails?

Swap it for a new one ;-) (somewhat more complex with the dual layout as I described it). -- -Peter Tribble http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
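Going by that status output, the data pool in this layout would have been created with something along these lines (a sketch, not the actual command history):

    zpool create storage mirror c1t0d0s7 c1t1d0s7 mirror c1t2d0s7 c1t3d0s7

A failed disk is then handled with zpool replace storage c1t2d0s7 for the ZFS half and the corresponding metareplace for the SVM half of the same spindle.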
Re: [zfs-discuss] New ZFS Intent Log (ZIL) device available - Beta program now open!
On Thu, Jan 14, 2010 at 11:35 AM, Christopher George cgeo...@ddrdrive.com wrote: I'm not sure about others on the list, but I have a dislike of AC power bricks in my racks. I definitely empathize with your position concerning AC power bricks, but until the perfect battery is created, and we are far from it, it comes down to tradeoffs. I personally believe the ignition risk, thermal wear-out, and the inflexible proprietary nature of Li-Ion solutions simply outweigh the benefits of internal or all inclusive mounting for enterprise bound NVRAM. That's kind of an overstatement. NVRAM backed by on-board LI-Ion batteries has been used in storage industry for years; I can easily point out a company that has shipped tens of thousands of such boards over last 10 years. Regards, Andrey Is the state of the power input exposed to software in some way? In other terms, can I have a nagios check running on my server that triggers an alert if the power cable accidentally gets pulled out? Absolutely, the X1 monitors the external supply and can detect not only a disconnect but any loss of power. In all cases, the card throws an interrupt so that the device driver (and ultimately user space) can be immediately notified. The X1 does not rely on external power until the host power drops below a certain threshold, so attaching/detaching the external power cable has no effect on data integrity as long as the host is powered on. OK, which means that the UPS must be separate to the UPS powering the server then. Correct, a dedicated (in this case redundant) UPS is expected. Any plans on a pci-e multi-lane version then? Not at this time. In addition to the reduced power and thermal output, the PCIe x1 connector has the added benefit of not competing with other HBA's which do require a x4 or x8 PCIe connection. Very appreciative of the feedback! Christopher George Founder/CTO www.ddrdrive.com -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] adpu320 scsi timeouts only with ZFS
Any news regarding this issue? I'm having the same problems. Me too. My v40z with U320 drives in the internal bay will lock up partway through a scrub. I backed the whole SCSI chain down to U160, but it seems a shame that U320 speeds can't be used. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] adpu320 scsi timeouts only with ZFS
I was frustrated with this problem for months. I've tried different disks, cables, even disk cabinets. The driver hasn't been updated in a long time. When the timeouts occurred, they would freeze for about a minute or two (showing the 100% busy). I even had the problem with less than 8 LUNs (and I'm seeing similar symptoms on another box connected to a hardware RAID). I even swapped out controllers (for the same type - 3 of them, all Adaptec). To fix it, I swapped out the Adaptec controller and put in LSI Logic and all the problems went away. On Jan 14, 2010, at 8:57 AM, Alexandru Pirvulescu wrote: Any news regarding this issue? I'm having the same problems. I'm using an external Axus SCSI enclosure (Yotta with 16 drives) and it timed out on scanning LUNs (16 of them b/c Yotta is configured as JBOD). I've performed firmware upgrade to the Yotta system and now the scanning works, the pool build works, I'm even able to perform some transfers but after some time everything hangs, iostat shows 100% busy on drives and timeouts in dmesg. Still, if I keep the LUN number to =8 everything works correctly. I think that eliminates the problems of cabling or any other hardware involved. All the problems start when using 8 LUNs (sd.conf is configured correctly, adpu320.conf is configured with maximum 16 LUNs). Linux kernel 2.6.18 had the same problems in the past, but current kernel works without any problem, so they've changed something in the driver which eliminated the issue. If any extra info is needed I'm ready to provide it. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How do separate ZFS filesystems affect performance?
On Jan 14, 2010, at 6:41 AM, Gary Mills wrote: On Thu, Jan 14, 2010 at 01:47:46AM -0800, Roch wrote: Gary Mills writes: Yes, I understand that, but do filesystems have separate queues of any sort within the ZIL? If not, would it help to put the database filesystems into a separate zpool? The slog device is for the pool but the ZIL is per filesystem/dataset. The logbias property can be used on a dataset to prevent that set from consuming the slog device resource : http://blogs.sun.com/roch/entry/synchronous_write_bias_property Ah, that's what I wanted to know. Thanks for the response. Roch, I think this can be misinterpreted, so perhaps more clarity is needed. If you have sync writes, they will be written to persistent storage before they are acknowledged. The only question is where they will be written: to the ZIL or pool? By default, this preference is based on the size of each I/O, with small I/Os written to the ZIL and large I/Os written to the pool. The dataset parameter logbias is used to set the ZIL vs pool preference. Thus, one could force all datasets, save one, to use the pool and permit the one, lucky dataset to use the ZIL (or vice versa) Separate log devices is an orthogonal issue. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
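In practice that preference is a per-dataset property, so the split Richard describes looks something like this (pool and dataset names are hypothetical; latency is the default):

    zfs set logbias=latency tank/lucky-db    # sync writes favor the ZIL (and slog, if present) for low latency
    zfs set logbias=throughput tank/bulk     # data blocks go straight to the pool; a slog is not used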
Re: [zfs-discuss] New ZFS Intent Log (ZIL) device available - Beta program now open!
On Thu, 14 Jan 2010, Ray Van Dolson wrote: My gut tells me the risk of this is pretty low and most are going to prefer the convenience of an onboard BBU to installing UPS'es in all their racks (as good a practice as that may be). Other than the spontaneous combustion issue (which was heavily covered by the press a couple of years ago, and resulted in many product recalls), it seems that the main advantage of using external power is to avoid needing to shut down the server and open the chassis if a battery goes bad. Most servers are not designed to be serviced in this way while they remain running. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] 4 Internal Disk Configuration
On Thu, 14 Jan 2010, Mr. T Doodle wrote: I have a T-series box with 4 internal drives and would like to deploy ZFS with availability and performance in mind ;) What would some recommended configurations be? Example: use internal RAID controller to mirror boot drives, and ZFS the other 2? Can I create one pool with the 3 or 4 drives, install Solaris, and use this pool for other apps? Also, what happens if a drive fails? Peter Tribble's approach is nice, but old fashioned. By partitioning the first two drives, you can arrange to have a small zfs-boot mirrored pool on the first two drives, and then create a second pool as two mirror pairs, or four drives in a raidz to support your data. The root pool needs to be large enough to deal with whatever you plan to throw at it, such as multiple boot environments via live upgrade and backout patches. It will steal a bit of space from the other drives but this is not usually much cost given today's large drives. You can use zfs for all of this. It is not necessary to use something antique like SVM. The other approach (already suggested by someone else) is to figure out how to add another device just for the root pool. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] opensolaris-vmware
I have been recommended by several other users on this mailing list to use inside the vm snapshots, vmware snapshots, and then use zfs snapshots. I believe I understand the difference between filesystem snapshots vs block level snapshots, however since I cannot use vmware snapshots (all LUNs on the SAN are mapped to ESXi using RAW disk in physical compatibility mode, which then disables vmware snapshots) does this cause me to have a weaker backup strategy? What else can I do? Should I convert the virtual machines from physical compatibility to virtual compatibility in order to get snapshotting on the ESXi server? Thanks for all the helpful information! Greg On Wed, Jan 13, 2010 at 9:12 PM, Gregory Durham gregory.dur...@gmail.comwrote: Haha, Yeah that's tomorrow, I have a test vm I will be testing on. I shall report back! Thank you all! On Wed, Jan 13, 2010 at 8:26 PM, Fajar A. Nugraha fa...@fajar.net wrote: On Thu, Jan 14, 2010 at 6:40 AM, Gregory Durham gregory.dur...@gmail.com wrote: Arnaud, The virtual machines coming up as if they were on is the least of my worries, my biggest worry is keeping the filesystems of the vms alive i.e. not corrupt. As Tim said, The snapshot disk are in the same state they would be in if you pulled the power plug. This is also the same thing you got BTW if you use LVM snapshot (on Linux) or SAN/NAS based snapshots (like NetApp) In the case of exchange, I have exchange itself on a raw lun in physical compatibility mode, and I have 2 LUNs mounted with the Server 2008 iSCSI initiator for logs and the exchange DB. Most modern filesystem and database have journaling that can recover from power failure scenarios, so they should be able to use the snapshot and provide consistent, non-corrupt information. So the question now is, have you tried restoring from snapshot? -- Fajar ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
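One thing that does remain available on the storage side, even with the raw-device mapping, is a recursive snapshot of the pool (or dataset tree) holding the exported LUNs, which gives a crash-consistent point in time across all of them at once (names are hypothetical):

    zfs snapshot -r sanpool/esx@nightly
    zfs list -t snapshot -r sanpool/esx    # confirm the per-LUN snapshots exist

As noted in the quoted reply, a journaling filesystem or database presented with such a snapshot is in the same state as after a pulled power cord, and should recover accordingly.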
Re: [zfs-discuss] 4 Internal Disk Configuration
> By partitioning the first two drives, you can arrange to have a small zfs-boot mirrored pool on the first two drives, and then create a second pool as two mirror pairs, or four drives in a raidz to support your data.

agreed..

    % zpool iostat -v
                     capacity     operations    bandwidth
    pool           used  avail   read  write   read  write
    ------------  -----  -----  -----  -----  -----  -----
    r             8.34G  21.9G      0      5  1.62K  17.0K
      mirror      8.34G  21.9G      0      5  1.62K  17.0K
        c5t0d0s0      -      -      0      2  3.30K  17.2K
        c5t1d0s0      -      -      0      2  3.66K  17.2K
    ------------  -----  -----  -----  -----  -----  -----
    z              375G   355G      6     32  67.2K   202K
      mirror       133G   133G      2     14  24.7K  84.2K
        c5t0d0s7      -      -      0      3  53.3K  84.3K
        c5t1d0s7      -      -      0      3  53.2K  84.3K
      mirror       120G   112G      1      9  21.3K  59.6K
        c5t2d0        -      -      0      2  38.4K  59.7K
        c5t3d0        -      -      0      2  38.2K  59.7K
      mirror       123G   109G      1      8  21.3K  58.6K
        c5t4d0        -      -      0      2  36.4K  58.7K
        c5t5d0        -      -      0      2  37.2K  58.7K
    ------------  -----  -----  -----  -----  -----  -----

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] New ZFS Intent Log (ZIL) device available - Beta program now open!
Is there any data out there that have tracked these sort of ignition incidents? I have to admit I'd never heard of this. We have quite a few BBU backed RAID controllers in our servers and I've never had anything remotely like this occur. I know anecdotal evidence is meaningless, but this definitely surprised me a little. I agree, it would be very informative if RAID HBA vendors would publish failure statistics of their Li-Ion based BBU products. My gut tells me the risk of this is pretty low and most are going to prefer the convenience of an onboard BBU to installing UPS'es in all their racks (as good a practice as that may be). Again I agree, I am not recommending, nor did I mean to allude, that to be the proper and/or preferred solution for RAID controllers. To my knowledge, the mAh requirements of a DDRdrive X1 class product cannot be supported by any of the BBUs currently found on RAID controllers. It would require either a substantial increase in energy density or a decrease in packaging volume both of which incur additional risks. Interesting product though! Thanks, Christopher George Founder/CTO www.ddrdrive.com -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] PCI-E CF adapter?
I know this is slightly OT but folks discuss zfs compatible hardware here all the time. :) Has anyone used something like this combination? http://www.cdw.com/shop/products/default.aspx?EDC=1346664 http://www.cdw.com/shop/products/default.aspx?EDC=1854700 It'd be nice to have externally accessible CF slots for my NAS. I can't put them into a drive bay because it's Sun hardware and even if not, any hardware I'd use for a NAS would have hot plug drives -- the type of SATA adapter that would fit into a 5-1/4 bay won't fit properly in a hot plug type slot. I want to boot from CF and save all the drive slots for actual storage. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SMC for ZFS administration in OpenSolaris 2009.06?
For Sun's sake they cannot go browser only. Department of Defense does not allow users to install browsers. Ted Jordan - Funuation -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How do separate ZFS filesystems affect performance?
additional clarification ... On Jan 14, 2010, at 8:49 AM, Richard Elling wrote: On Jan 14, 2010, at 6:41 AM, Gary Mills wrote: On Thu, Jan 14, 2010 at 01:47:46AM -0800, Roch wrote: Gary Mills writes: Yes, I understand that, but do filesystems have separate queues of any sort within the ZIL? If not, would it help to put the database filesystems into a separate zpool? The slog device is for the pool but the ZIL is per filesystem/dataset. The logbias property can be used on a dataset to prevent that set from consuming the slog device resource : http://blogs.sun.com/roch/entry/synchronous_write_bias_property Ah, that's what I wanted to know. Thanks for the response. Roch, I think this can be misinterpreted, so perhaps more clarity is needed. If you have sync writes, they will be written to persistent storage before they are acknowledged. The only question is where they will be written: to the ZIL or pool? By default, this preference is based on the size of each I/O, with small I/Os written to the ZIL and large I/Os written to the pool. The dataset parameter logbias is used to set the ZIL vs pool preference. Thus, one could force all datasets, save one, to use the pool and permit the one, lucky dataset to use the ZIL (or vice versa) Should read: Thus, one could force all datasets, save one, to use the pool and permit the one, lucky dataset to use the ZIL (or vice versa) for large I/Os. -- richard Separate log devices is an orthogonal issue. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SMC for ZFS administration in OpenSolaris 2009.06?
ted jordan wrote: For Sun's sake they cannot go browser only. Department of Defense does not allow users to install browsers. They may not allow the end user to install a browser but there are lots of deployments of Solaris systems in the US DoD where the operating system provided browser (firefox) is used for administration and for access to web based applications. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SMC for ZFS administration in OpenSolaris 2009.06?
On Thu, Jan 14, 2010 at 12:11 PM, ted jordan t...@funutation.com wrote: For Sun's sake they cannot go browser only. Department of Defense does not allow users to install browsers. Ted Jordan - Funuation -- I don't follow. Why would adding a web based management utility force it to be web-only? The 7000 series has a VERY robust web GUI, but they also have a CLI that provides all of the same functions. Why would a web GUI cause sun to go browser-only? -- --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZIL to disk
Hi all, Are there any recommendations regarding the minimum IOPS the backing storage pool needs to have when flushing the SSD ZIL to the pool? Consider a pool of 3x 2TB SATA disks in RAIDZ1: you would roughly have 80 IOPS. Any info about the relation between ZIL and pool performance? Or will the ZIL simply fill up and performance drop to pool speed? BR, Jeffry ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] New ZFS Intent Log (ZIL) device available - Beta program now open!
That's kind of an overstatement. NVRAM backed by on-board LI-Ion batteries has been used in storage industry for years; Respectfully, I stand by my three points of Li-Ion batteries as they relate to enterprise class NVRAM: ignition risk, thermal wear-out, and proprietary design. As a prior post stated, there is a dearth of published failure statistics of Li-Ion based BBUs. I can easily point out a company that has shipped tens of thousands of such boards over last 10 years. No argument here, I would venture the risks for consumer based Li-Ion based products did not become apparent or commonly accepted until the user base grew several orders of magnitude greater than tens of thousands. For the record, I agree there is a marked convenience with an integrated high energy Li-Ion battery solution - but at what cost? We chose an external solution because it is a proven and industry standard method of enterprise class data backup. Thanks, Christopher George Founder/CTO www.ddrdrive.com -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZIL to disk
On Thu, 14 Jan 2010, Jeffry Molanus wrote: Are there any recommendations regarding min IOPS the backing storage pool needs to have when flushing the SSD ZIL to the pool? Consider a pool of 3x 2TB SATA disks in RAIZ1, you would roughly have 80 IOPS. Any info about the relation between ZIL pool performance? Or will the ZIL simply fill up and performance drops to pool speed? There are different kinds of IOPS. The expensive ones are random IOPS whereas sequential IOPS are much more efficient. The intention of the SSD-based ZIL is to defer the physical write so that would-be random IOPS can be converted to sequential scheduled IOPS like a normal write. ZFS coalesces multiple individual writes into larger sequential requests for the disk. Regardless, some random access to the underlying disks is still required. If the pool becomes close to full (or has become fragmented due to past activities) then there will be much more random access and the SSD-based ZIL will not be as effective. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] 2-way Mirror With Spare
On Jan 14, 2010, at 11:09 AM, Mr. T Doodle wrote: I am considering RAIDZ or a 2-way mirror with a spare. I have 6 disks and would like the best possible performance and reliability and not really concerned with disk space. My thought was a 2 disk 2-way mirror with a spare. Would this be the best in terms of performance? A six-way mirror will give you the best read performance and data retention :-) You can trade data retention for write performance by a stripe of 2, 3-way mirrors or 3, 2-way mirrors. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
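In zpool terms the two trade-offs look like this (device names are hypothetical, and only one of the two pools would actually be created):

    zpool create tank mirror c0t0d0 c0t1d0 c0t2d0 mirror c0t3d0 c0t4d0 c0t5d0          # 2 x 3-way mirrors
    zpool create tank mirror c0t0d0 c0t1d0 mirror c0t2d0 c0t3d0 mirror c0t4d0 c0t5d0   # 3 x 2-way mirrors: more write IOPS, less redundancy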
Re: [zfs-discuss] ZIL to disk
> There are different kinds of IOPS. The expensive ones are random IOPS whereas sequential IOPS are much more efficient. The intention of the SSD-based ZIL is to defer the physical write so that would-be random IOPS can be converted to sequential scheduled IOPS like a normal write. ZFS coalesces multiple individual writes into larger sequential requests for the disk.

Yes, I understand; but still, isn't there an upper bound? If I had a perfectly synchronous ZIL-bound load and only one large RAIDZ2 vdev in a single 10TB pool, how would the system behave when it flushes the ZIL content to disk?

> Regardless, some random access to the underlying disks is still required. If the pool becomes close to full (or has become fragmented due to past activities) then there will be much more random access and the SSD-based ZIL will not be as effective.

Yes, I understand what you are saying, but it's more out of general interest: what is the relation between the SSD devices and the required (sequential) write bandwidth/IOPS of the pool? I can hardly imagine that there isn't one.

Jeffry ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
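One way to get a feel for that relation on a live system is to watch how writes are spread between the log device and the main vdevs while a sync-heavy workload runs (pool name is hypothetical):

    zpool iostat -v tank 5    # per-vdev throughput every 5 seconds; log devices are listed separately

If the main vdevs cannot absorb the transaction group commits quickly enough, their write columns stay saturated and application-visible latency rises even though the slog itself has headroom.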
Re: [zfs-discuss] New ZFS Intent Log (ZIL) device available - Beta program now open!
cg == Christopher George cgeo...@ddrdrive.com writes: cg I agree, it would be very informative if RAID HBA vendors cg would publish failure statistics of their Li-Ion based BBU cg products. If they haven't, then on what are you basing your decision *not* to use one? Just the random thought that they might fail? cg inflexible proprietary nature of Li-Ion You can get complete systems with charging microcontroller and battery without any undue encumbrances I can detect on sparkfun.com. What's ``proprietary'' mean in this context? cg the ignition risk, thermal wear-out, and the inflexible cg proprietary nature of Li-Ion solutions simply outweigh the cg benefits of internal or all inclusive mounting for enterprise cg bound NVRAM. well...for *HOME* use based on the failure modes I've observed I'd prefer to keep the battery next to the SDRAM like ACARD and LSI do. for the enterprise, someone should warn netapp/hitachi/emc/storagetek who are presumably Lion based nvram users. One thing on which I can agree: if the vendor has used Lion it's hard to tell if the implementation is proper, ex whether it will warn of an aged battery without enough capacity. For slog, IMHO the ideal behavior would be: 1. weekly test-flushes to CF or USBstick or whatever is the NAND backing-store 2. the device should shut itself off, as if SATA cable were pulled, or in some other way ZFS detects instantly, if the battery's not got capacity left after the test flush completes. One way would be to require *two* consecutive successful test flushes each week. 3. there should be a button you can press to simulate the battery-failure-powerdown behavior, so you can test that ZFS and your controller respond properly. 4. ``redundant'' power should mean the device has (1) power from host, and (2) enough stored energy in the battery to do two consecutive flushes. Whenever the device does not have ``redundant'' power, it should: a. disable itself as in (3) b. flush SDRAM to NAND. This means, if the device's battery is exhausted, the system may boot with the device disconnected. The host will have to suport hotplug so the slog can come back after the battery charges. so, (2) is really a special case of (4). and AIUI Lion will last longer if you don't charge it to 100%. laptops usually want 100% because they compete on mAh/kg at initial purchase, but for this application charging to 70% should be fine which from what I heard will make them last a lot longer before crystalizing. cg can detect not only a disconnect but any loss of power. In cg all cases, the card throws an interrupt so that the device cg driver (and ultimately user space) can be immediately cg notified. We need to look at the overall system, though. Does a ZFS system using the card disable the slog when this happens? or does it just print a warning in dmesg and do nothing? When you're using a LSI BBU, the disks behind the controller have their write cahce disabled. so, if you evil-tune ZFS to skip issuing SYNC CACHE, but then the BBU dies and becomes write-through, the overall system is still safe (albeit slow). Also what you describe still doesn't seem to detect the failure case you brought up yourself, of a worn-out battery. UPS's do test their batteries, but ones with worn-out batteries enter bypass mode, they don't turn themselves off, which seems to be the only way your card would have to hear a warning. cg attaching/detaching the external power cable has no effect on cg data integrity as long as the host is powered on. 
In other words, as long as you don't trip over both cables at once. :( Does the device partially obey my (4) and immediately flush to NAND when the host is powered off? or does it keep the data in SDRAM only for as long as possible, until told to do otherwise by ``the user'' or something? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] New ZFS Intent Log (ZIL) device available - Beta program now open!
On Thu, Jan 14, 2010 at 10:02 PM, Christopher George cgeo...@ddrdrive.com wrote: That's kind of an overstatement. NVRAM backed by on-board LI-Ion batteries has been used in storage industry for years; Respectfully, I stand by my three points of Li-Ion batteries as they relate to enterprise class NVRAM: ignition risk, thermal wear-out, and proprietary design. As a prior post stated, there is a dearth of published failure statistics of Li-Ion based BBUs. Why not enlighten EMC/NTAP on this then? I can easily point out a company that has shipped tens of thousands of such boards over last 10 years. No argument here, I would venture the risks for consumer based Li-Ion based products did not become apparent or commonly accepted until the user base grew several orders of magnitude greater than tens of thousands. For the record, I agree there is a marked convenience with an integrated high energy Li-Ion battery solution - but at what cost? Um, with Li-Ion battery in each and every of a billions of cell phones out there ... We chose an external solution because it is a proven and industry standard method of enterprise class data backup. Could you please elaborate on the last statement, provided you meant anything beyond UPS is a power-backup standard? Regards, Andrey Thanks, Christopher George Founder/CTO www.ddrdrive.com -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Invalid zpool argument in Solaris 10 (10/09)
Hello List, I am porting a block device driver (for a PCIe NAND flash disk) from OpenSolaris to Solaris 10. On Solaris 10 (10/09) I'm having an issue creating a zpool with the disk. Apparently I have an 'invalid argument' somewhere:

    % pfexec zpool create mypool c4d0p0
    cannot create 'mypool': invalid argument for this pool operation

Is there any way to enable debug in zpool so I can discover what the invalid argument is? I do see the following IOCTLs issued to my device when trying to create the zpool:

    Jan 14 14:35:48 dlx-a36 pflash: [ID 896041 kern.notice] DEBUG (TMS: pflash.c pflash_ioctl ln:1254 ) DKIOCINFO supported
    Jan 14 14:35:48 dlx-a36 pflash: [ID 896041 kern.notice] DEBUG (TMS: pflash.c pflash_ioctl ln:1227 ) DKIOCGEXTVTOC supported by CMLB
    Jan 14 14:35:48 dlx-a36 pflash: [ID 896041 kern.notice] DEBUG (TMS: pflash.c pflash_ioctl ln:1254 ) DKIOCINFO supported
    Jan 14 14:35:48 dlx-a36 pflash: [ID 896041 kern.notice] DEBUG (TMS: pflash.c pflash_ioctl ln:1227 ) DKIOCGEXTVTOC supported by CMLB
    Jan 14 14:35:48 dlx-a36 pflash: [ID 896041 kern.notice] DEBUG (TMS: pflash.c pflash_ioctl ln:1254 ) DKIOCINFO supported

I've reviewed all the DKIO structures and I can't find anything that's wrong in my driver. This command does work with my OpenSolaris driver, and the only difference between the two drivers is the CMLB version. Any help would be greatly appreciated. Let me know if I can provide any more information. Thank you, Josh Morris josh.mor...@texmemsys.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs fast mirror resync?
On Wed, Jan 13, 2010 at 04:38:42PM +0200, Cyril Plisko wrote: On Wed, Jan 13, 2010 at 4:35 PM, Max Levine max...@gmail.com wrote: Veritas has this feature called fast mirror resync where they have ?a DRL on each side of the mirror and, detaching/re-attaching a mirror causes only the changed bits to be re-synced. That's actually a DCO. The DRL is for crash recovery. Is anything similar planned for ZFS? ZFS has that feature from moment zero. I was under the impression that the ZFS fast replay only held for mirrors where one part is missing and not where you actually administratively detach it or where the mirror is used on another host (a la 'zpool split'). -- Darren ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Invalid zpool argument in Solaris 10 (10/09)
On Thu, 14 Jan 2010, Josh Morris wrote: Hello List, I am porting a block device driver(for a PCIe NAND flash disk driver) from OpenSolaris to Solaris 10. On Solaris 10 (10/09) I'm having an issues creating a zpool with the disk. Apparently I have an 'invalid argument' somewhere: % pfexec zpool create mypool c4d0p0 Instead of p0, can you use format to create an s0 that takes up the whole partition? Regards, markm ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Invalid zpool argument in Solaris 10 (10/09)
Hello Mark, I created s0 as you suggested:

    partition> print
    Current partition table (unnamed):
    Total disk cylinders available: 54823 + 2 (reserved cylinders)

    Part      Tag    Flag     Cylinders        Size            Blocks
      0        usr    wm       3 - 54822      419.94GB    (54820/0/0) 880683300
      1 unassigned    wm       0                  0       (0/0/0)             0
      2     backup    wu       0 - 54822      419.97GB    (54823/0/0) 880731495
      3 unassigned    wm       0                  0       (0/0/0)             0
      4 unassigned    wm       0                  0       (0/0/0)             0
      5 unassigned    wm       0                  0       (0/0/0)             0
      6 unassigned    wm       0                  0       (0/0/0)             0
      7 unassigned    wm       0                  0       (0/0/0)             0
      8       boot    wu       0 - 0            7.84MB    (1/0/0)         16065
      9 alternates    wm       1 - 2           15.69MB    (2/0/0)         32130

I still get the 'invalid argument':

    % pfexec zpool create mypool c4d0s0
    cannot create 'mypool': invalid argument for this pool operation

There were a lot more IOCTLs issued this time and a few accesses with extremely large addresses:

    WARNING: ERROR (TMS: pflash.c pflash_strategy ln:996 ) Access past the end of device! (18014398509480448 >= 880791552)
    WARNING: ERROR (TMS: pflash.c pflash_strategy ln:996 ) Access past the end of device! (18014398509480960 >= 880791552)
    WARNING: ERROR (TMS: pflash.c pflash_strategy ln:996 ) Access past the end of device! (18014398509480448 >= 880791552)
    WARNING: ERROR (TMS: pflash.c pflash_strategy ln:996 ) Access past the end of device! (18014398509480960 >= 880791552)

Also a CDROMREADOFFSET IOCTL was issued to the device. This seems weird since my driver returns DKC_MD for the ctype when issued DKIOCINFO. Could my driver somehow be advertising the disk as a CDROM device? Thanks, Josh

On 01/14/2010 03:20 PM, Mark J Musante wrote:
> On Thu, 14 Jan 2010, Josh Morris wrote:
>> Hello List, I am porting a block device driver (for a PCIe NAND flash disk) from OpenSolaris to Solaris 10. On Solaris 10 (10/09) I'm having an issue creating a zpool with the disk. Apparently I have an 'invalid argument' somewhere: % pfexec zpool create mypool c4d0p0
> Instead of p0, can you use format to create an s0 that takes up the whole partition? Regards, markm

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] 2-way Mirror With Spare
On Thu, Jan 14, 2010 at 11:31:25AM -0800, Richard Elling wrote: On Jan 14, 2010, at 11:09 AM, Mr. T Doodle wrote: I am considering RAIDZ or a 2-way mirror with a spare. I have 6 disks and would like the best possible performance and reliability and not really concerned with disk space. My thought was a 2 disk 2-way mirror with a spare. Would this be the best in terms of performance? A six-way mirror will give you the best read performance and data retention :-) You can trade data retention for write performance by a stripe of 2, 3-way mirrors or 3, 2-way mirrors. -- richard I vote for 2 3-way mirrors. That way you can afford to lose two devices in each vdev before losing your data. Ray ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs fast mirror resync?
On January 14, 2010 8:58:51 PM + A Darren Dunham ddun...@taos.com wrote: I was under the impression that the ZFS fast replay only held for mirrors where one part is missing and not where you actually administratively detach it or where the mirror is used on another host (a la 'zpool split'). That's my experience. I wish zfs had that feature. Pretty sure (IIRC) SVM has it with offline/online. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] New ZFS Intent Log (ZIL) device available - Beta program now open!
On Thu, January 14, 2010 14:34, Miles Nordin wrote: cg == Christopher George cgeo...@ddrdrive.com writes: cg inflexible proprietary nature of Li-Ion You can get complete systems with charging microcontroller and battery without any undue encumbrances I can detect on sparkfun.com. What's ``proprietary'' mean in this context? I'm pretty sure the insides of my camera batteries and such are standard parts at that level; but it doesn't do me any good. I don't know how to find the sources for those parts, be sure I'm right, and fake up replacements. (Also there are serviceability issues if I mess about inside and later need manufacturer service.) So I'm stuck paying for expensive proprietary batteries. When I look at something I'm buying for serious use, I consider that sort of thing. If it doesn't say something like Takes a Foobar (or equivalent) mumble-76-x3 (or if I can't find sources for what it says it takes), that's a point against it. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] New ZFS Intent Log (ZIL) device available - Beta program now open!
Why not enlighten EMC/NTAP on this then? On the basic chemistry and possible failure characteristics of Li-Ion batteries? I will agree, if I had system level control as in either example, one could definitely help mitigate said risks compared to selling a card based product where I have very little control over the thermal envelopes I am subjected. Could you please elaborate on the last statement, provided you meant anything beyond UPS is a power-backup standard? Although, I do think the discourse is healthy and relevant. At this point, I am comfortable to agree to disagree. I respect your point of view, and do agree strongly that Li-Ion batteries play a critical and highly valued role in many industries. Thanks, Christopher George Founder/CTO www.ddrdrive.com -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs fast mirror resync?
fc == Frank Cusack fcus...@fcusack.com writes:

    fc> That's my experience. I wish zfs had that feature. Pretty
    fc> sure (IIRC) SVM has it with offline/online.

zpool offline / zpool online of a mirror component will indeed fast-resync, and I do it all the time. zpool detach / attach will not. no need to guess, just try it. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
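A quick sketch of the two paths (pool and device names are hypothetical):

    zpool offline tank c0t1d0         # device marked offline; changes are tracked while it is out
    zpool online tank c0t1d0          # only the blocks written in the meantime are resilvered

    zpool detach tank c0t1d0          # device removed from the mirror entirely
    zpool attach tank c0t0d0 c0t1d0   # re-attaching triggers a full resilver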
Re: [zfs-discuss] 4 Internal Disk Configuration
On Jan 14, 2010, at 10:44 AM, Mr. T Doodle tpsdoo...@gmail.com wrote: Hello, I have played with ZFS but not deployed any production systems using ZFS and would like some opinions I have a T-series box with 4 internal drives and would like to deploy ZFS with availability and performance in mind ;) What would some recommended configurations be? Example: use internal RAID controller to mirror boot drives, and ZFS the other 2? Can I create one pool with the 3 or 4 drives, install Solaris, and use this pool for other apps? Also, what happens if a drive fails? Thanks for any tips and gotchas. Here's my .02 Have two small disks for rpool mirror and 2 large disks for your data pool mirror. Raidz will only give you IOPS of a single disk, so why not mirror? You have lots of memory for ARC to read cache and you should get the same performance and redundancy as a raidz. -Ross ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] New ZFS Intent Log (ZIL) device available - Beta program now open!
On Jan 14, 2010, at 11:02 AM, Christopher George wrote: That's kind of an overstatement. NVRAM backed by on-board LI-Ion batteries has been used in storage industry for years; Respectfully, I stand by my three points of Li-Ion batteries as they relate to enterprise class NVRAM: ignition risk, thermal wear-out, and proprietary design. As a prior post stated, there is a dearth of published failure statistics of Li-Ion based BBUs. I can easily point out a company that has shipped tens of thousands of such boards over last 10 years. No argument here, I would venture the risks for consumer based Li-Ion based products did not become apparent or commonly accepted until the user base grew several orders of magnitude greater than tens of thousands. For the record, I agree there is a marked convenience with an integrated high energy Li-Ion battery solution - but at what cost? I see nothing in the design that precludes a customer from using a Li-Ion battery, if they so desire. Perhaps the collective has forgotten that DC power is one of the simplest and most widespread interfaces around? :-) So it boils down to packaging. I personally dislike having batteries all over the place, and I've seen dozens of customers who never pay attention to the battery status on their systems. However, for future design considerations, an optional internal energy mount can keep the wolves at bay. -- richard We chose an external solution because it is a proven and industry standard method of enterprise class data backup. Thanks, Christopher George Founder/CTO www.ddrdrive.com -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] New ZFS Intent Log (ZIL) device available - Beta program now open!
Very interesting product indeed! Given the volume one of these cards take up inside the server though, I couldn't help but think that 4GB is a bit on the low side. Alex. On Wed, Jan 13, 2010 at 5:51 PM, Christopher George cgeo...@ddrdrive.com wrote: The DDRdrive X1 OpenSolaris device driver is now complete, please join us in our first-ever ZFS Intent Log (ZIL) beta test program. A select number of X1s are available for loan, preferred candidates would have a validation background and/or a true passion for torturing new hardware/driver :-) We are singularly focused on the ZIL device market, so a test environment bound by synchronous writes is required. The beta program will provide extensive technical support and a unique opportunity to have direct interaction with the product designers. Would you like to take part in the advancement of Open Storage and explore the far-reaching potential of ZFS based Hybrid Storage Pools? If so, please send an inquiry to zfs at ddrdrive dot com. The drive for speed, Christopher George Founder/CTO www.ddrdrive.com *** Special thanks goes out to SUN employees Garrett D'Amore and James McPherson for their exemplary help and support. Well done! -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] New ZFS Intent Log (ZIL) device available - Beta program now open!
On Thu, Jan 14, 2010 at 5:22 PM, Richard Elling richard.ell...@gmail.com wrote: On Jan 14, 2010, at 11:02 AM, Christopher George wrote: That's kind of an overstatement. NVRAM backed by on-board Li-Ion batteries has been used in the storage industry for years; Respectfully, I stand by my three points regarding Li-Ion batteries as they relate to enterprise class NVRAM: ignition risk, thermal wear-out, and proprietary design. As a prior post stated, there is a dearth of published failure statistics for Li-Ion based BBUs. I can easily point out a company that has shipped tens of thousands of such boards over the last 10 years. No argument here; I would venture the risks of consumer Li-Ion based products did not become apparent or commonly accepted until the user base grew several orders of magnitude greater than tens of thousands. For the record, I agree there is a marked convenience with an integrated high-energy Li-Ion battery solution - but at what cost? I see nothing in the design that precludes a customer from using a Li-Ion battery, if they so desire. Perhaps the collective has forgotten that DC power is one of the simplest and most widespread interfaces around? :-) So it boils down to packaging. I personally dislike having batteries all over the place, and I've seen dozens of customers who never pay attention to the battery status on their systems. However, for future design considerations, an optional internal energy mount can keep the wolves at bay. -- richard Personally I'd say it's a must. Most DCs I operate in wouldn't tolerate having a card separately wired from the chassis power. It's far, far, far more likely to have a tech knock that power cord out and not have anyone notice than to have a battery spontaneously combust. My .02. -- --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Invalid zpool argument in Solaris 10 (10/09)
OK. Now I see the problem. ldi_get_size() uses a few DDI properties that are exported by the driver. Specifically, it looks for Nblocks, nblocks, Size, and size. CMLB version 1, which is implemented in OpenSolaris, handles creating and returning these DDI properties. Solaris 10, however, only supports CMLB version 0, which apparently does NOT create these properties for you. So ldi_get_size() is failing in my Solaris 10 driver because it can't find any of the size properties. I'll implement these in my driver, and hopefully that fixes the zpool create issue. Thanks for the help! Josh On 01/14/2010 04:14 PM, Mark J Musante wrote: On Thu, 14 Jan 2010, Josh Morris wrote: WARNING: ERROR (TMS: pflash.c pflash_strategy ln:996 ) Access past the end of device! (18014398509480448 >= 880791552) WARNING: ERROR (TMS: pflash.c pflash_strategy ln:996 ) Access past the end of device! (18014398509480960 >= 880791552) That's interesting. One of the things that zpool create does is set up the ZFS volume information. It writes two copies near the beginning of the disk and two copies at the end. The way it knows where the end of the disk is, is through the call to ldi_get_size(). Is there a mismatch in what the driver passes back for the size? The number 18014398509480960 is a bit suspicious, as it is 0x3FFFFFFFFFFC00 (just below 2^54). Likewise, 18014398509480448 is 0x3FFFFFFFFFFA00. Also, a CDROMREADOFFSET ioctl was issued to the device. This seems weird since my driver returns DKC_MD for the ctype when issued a DKIOCINFO. Yeah, that's strange too, but zfs never issues this ioctl. It goes through the ldi api to access devices, so that's a layer lower than I know about. Regards, markm ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZIL to disk
On Jan 14, 2010, at 10:58 AM, Jeffry Molanus wrote: Hi all, Are there any recommendations regarding the minimum IOPS the backing storage pool needs to have when flushing the SSD ZIL to the pool? Pedantically, as many as you can afford :-) The DDRdrive folks sell IOPS at 200 IOPS/$. Sometimes people get confused about the ZIL and separate logs. For sizing purposes, the ZIL is a write-only workload. Data which is written to the ZIL is later asynchronously written to the pool when the txg is committed. Consider a pool of 3x 2TB SATA disks in RAIDZ1: you would roughly have 80 IOPS. Any info about the relation between ZIL and pool performance? Or will the ZIL simply fill up and performance drop to pool speed? The ZFS write performance for this configuration should consistently be greater than 80 IOPS. We've seen measurements in the 600 write IOPS range. Why? Because ZFS writes tend to be contiguous. Also, with the SATA disk write cache enabled, bursts of writes are handled quite nicely. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
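For anyone wanting to experiment with the configuration discussed above, here is a hypothetical sketch (device names are placeholders) of a 3-disk RAIDZ1 pool with an SSD separate log, plus a way to watch how writes land on each vdev:
  # 3-disk RAIDZ1 with a separate log (slog) device for the ZIL
  zpool create tank raidz1 c2t0d0 c2t1d0 c2t2d0 log c3t0d0
  # watch per-vdev activity while a synchronous-write workload runs
  zpool iostat -v tank 5
Under a sync-heavy workload you should see the log device absorbing the small synchronous writes while the txg commits show up as larger, mostly contiguous writes to the raidz vdev, which is why the pool side can sustain more than the naive 80 IOPS estimate.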
Re: [zfs-discuss] ZIL to disk
On Thu, Jan 14, 2010 at 03:41:17PM -0800, Richard Elling wrote: Consider a pool of 3x 2TB SATA disks in RAIDZ1: you would roughly have 80 IOPS. Any info about the relation between ZIL and pool performance? Or will the ZIL simply fill up and performance drop to pool speed? The ZFS write performance for this configuration should consistently be greater than 80 IOPS. We've seen measurements in the 600 write IOPS range. Why? Because ZFS writes tend to be contiguous. Also, with the SATA disk write cache enabled, bursts of writes are handled quite nicely. -- richard That's interesting. I was under the impression that your IOPS for a zpool were limited to the IOPS of the slowest drive in a vdev -- times the number of vdevs. Ray ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZIL to disk
On Thu, Jan 14, 2010 at 03:55:20PM -0800, Ray Van Dolson wrote: On Thu, Jan 14, 2010 at 03:41:17PM -0800, Richard Elling wrote: Consider a pool of 3x 2TB SATA disks in RAIDZ1: you would roughly have 80 IOPS. Any info about the relation between ZIL and pool performance? Or will the ZIL simply fill up and performance drop to pool speed? The ZFS write performance for this configuration should consistently be greater than 80 IOPS. We've seen measurements in the 600 write IOPS range. Why? Because ZFS writes tend to be contiguous. Also, with the SATA disk write cache enabled, bursts of writes are handled quite nicely. -- richard That's interesting. I was under the impression that your IOPS for a zpool were limited to the IOPS of the slowest drive in a vdev -- times the number of vdevs. Qualification: For RAIDZ* ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZIL to disk
On Jan 14, 2010, at 3:59 PM, Ray Van Dolson wrote: On Thu, Jan 14, 2010 at 03:55:20PM -0800, Ray Van Dolson wrote: On Thu, Jan 14, 2010 at 03:41:17PM -0800, Richard Elling wrote: Consider a pool of 3x 2TB SATA disks in RAIDZ1: you would roughly have 80 IOPS. Any info about the relation between ZIL and pool performance? Or will the ZIL simply fill up and performance drop to pool speed? The ZFS write performance for this configuration should consistently be greater than 80 IOPS. We've seen measurements in the 600 write IOPS range. Why? Because ZFS writes tend to be contiguous. Also, with the SATA disk write cache enabled, bursts of writes are handled quite nicely. -- richard That's interesting. I was under the impression that your IOPS for a zpool were limited to the IOPS of the slowest drive in a vdev -- times the number of vdevs. Qualification: For RAIDZ* That is a simple performance model for small, random reads. The ZIL is a write-only workload, so the model will not apply. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
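To put rough numbers on the model being discussed (ballpark figures only, using the ~80 random-read IOPS per SATA disk quoted earlier in the thread): six such disks arranged as a single 6-disk RAIDZ1 vdev deliver about 1 x 80 = 80 small random read IOPS, while the same six disks arranged as three 2-way mirrors deliver about 3 x 80 = 240. As Richard notes, this model describes small random reads; it says nothing about the ZIL, which is a mostly sequential, write-only workload.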
Re: [zfs-discuss] ZIL to disk
On Jan 14, 2010, at 4:02 PM, Richard Elling wrote: That is a simple performance model for small, random reads. The ZIL is a write-only workload, so the model will not apply. BTW, it is a Good Thing (tm) the small, random read model does not apply to the ZIL. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs fast mirror resync?
On Thu, Jan 14, 2010 at 06:11:10PM -0500, Miles Nordin wrote: zpool offline / zpool online of a mirror component will indeed fast-resync, and I do it all the time. zpool detach / attach will not. Yes, but the offline device is still part of the pool. What are you doing with the device when you take it offline? (What's the reason you're offlining it?) -- Darren ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
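For reference, a minimal sketch of the offline/online cycle Miles describes (pool and device names are hypothetical):
  zpool offline tank c1t2d0   # take one half of the mirror out of service
  # ... do whatever required the device to be offline ...
  zpool online tank c1t2d0    # reattach it; only the txgs missed while offline are resilvered
  zpool status tank           # the resilver normally completes in seconds
By contrast, zpool detach removes the device from the pool entirely, so a subsequent zpool attach has to resilver the whole mirror from scratch.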
Re: [zfs-discuss] New ZFS Intent Log (ZIL) device available - Beta program now open!
I see nothing in the design that precludes a customer from using a Li-Ion battery, if they so desire. Perhaps the collective has forgotten that DC power is one of the simplest and most widespread interfaces around? :-) Richard, Very good point! We have already had a request for the DC jack to be unpopulated so that an internal power source could be utilized. We will make this modification available to any customer who asks. Thanks, Christopher George Founder/CTO www.ddrdrive.com -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] opensolaris-vmware
On Fri, Jan 15, 2010 at 12:33 AM, Gregory Durham gregory.dur...@gmail.com wrote: Several other users on this mailing list have recommended that I use inside-the-VM snapshots, VMware snapshots, and then ZFS snapshots. I believe I understand the difference between filesystem snapshots and block-level snapshots; however, since I cannot use VMware snapshots (all LUNs on the SAN are mapped to ESXi as raw disks in physical compatibility mode, which disables VMware snapshots), does this leave me with a weaker backup strategy? What else can I do? Should I convert the virtual machines from physical compatibility to virtual compatibility in order to get snapshotting on the ESXi server? IMHO using all three is too much. You can pick one, and combine it with another (non-snapshot) backup strategy. A VMware snapshot is good because it also stores memory state, but it also uses more space. What I recommend you do in your current setup: - check whether your application can survive an unclean shutdown/power outage (it should). If not, then you have to do application-specific backups. - do zfs snapshots plus send/receive (a minimal sketch follows below) - add regular tape backups if necessary, although they might not need to be as frequent (you already plan this) - regularly exercise restoring from backups, to make sure your backup system works. -- Fajar ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
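A minimal sketch of the zfs snapshot plus send/receive step mentioned in the list above, assuming the VM LUNs live under a dataset called tank/vmluns and the copies go to a pool named backup on a host named backuphost (all of these names are hypothetical):
  zfs snapshot tank/vmluns@2010-01-15
  zfs send tank/vmluns@2010-01-15 | ssh backuphost zfs receive -d backup       # first full copy
  zfs snapshot tank/vmluns@2010-01-16
  zfs send -i @2010-01-15 tank/vmluns@2010-01-16 | ssh backuphost zfs receive -d backup   # incremental
Since the LUNs are raw-mapped, these snapshots are crash-consistent only, which is exactly why the first bullet (verifying the application survives an unclean shutdown) matters.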
Re: [zfs-discuss] New ZFS Intent Log (ZIL) device available - Beta program now open!
Personally I'd say it's a must. Most DCs I operate in wouldn't tolerate having a card separately wired from the chassis power. May I ask the list if this is a hard requirement for anyone else? Please email me directly at cgeorge at ddrdrive dot com. Thank you, Christopher George Founder/CTO www.ddrdrive.com -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS ARC Cache and Solaris u8
Has the ARC cache in Solaris 10 u8 been improved? I have been reading some mixed messages. Also, should this parameter be tuned? Thanks ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS ARC Cache and Solaris u8
On Thu, 14 Jan 2010, Mr. T Doodle wrote: Has the ARC cache in Solaris 10 u8 been improved? I have been reading some mixed messages. Also, should this parameter be tuned? It is true that there does not seem to be a kernel patch out for the bug I reported, but the ARC cache in Solaris 10 U8 seems to work quite well. What have you heard is wrong with it? Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
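For what it's worth, if the ARC ever does need to be capped on a particular box (say, to leave headroom for an application that keeps its own large cache), the usual knob on Solaris 10 is zfs_arc_max in /etc/system; the 4 GB value below is only an example, not a recommendation:
  * /etc/system -- cap the ZFS ARC at 4 GB (takes effect at the next reboot)
  set zfs:zfs_arc_max = 0x100000000
The current ARC size can be observed with kstat -p zfs:0:arcstats:size before deciding whether a cap is needed at all.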
Re: [zfs-discuss] zpool fragmentation issues? (dovecot)
The best mailbox format to use under Dovecot for ZFS is Maildir; each email is stored as an individual file. Cannot agree with that. dbox is about 10x faster - at least if you have a large number of messages in one mailbox folder. That's not because of ZFS; dovecot just handles dbox files (one for each message, like maildir) better in terms of indexing. The CPU stats for importing 10 messages via imap copy are even worse for maildir: dbox is about 100x more efficient... But anyway: it's no problem to test different formats with imaptest or offlineimap, because each user's mailbox (and even each folder) can be stored in a different format... Just to clarify: I'm using dovecot 1.2.x -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool fragmentation issues? (dovecot) [SEC=UNCLASSIFIED]
On Thu, Jan 14, 2010 at 08:43:06PM -0800, Michael Keller wrote: The best mailbox format to use under Dovecot for ZFS is Maildir; each email is stored as an individual file. Cannot agree with that. dbox is about 10x faster - at least if you have a large number of messages in one mailbox folder. That's not because of ZFS; dovecot just handles dbox files (one for each message, like maildir) better in terms of indexing. Got a link to this magic dbox format? -Alex IMPORTANT: This email remains the property of the Australian Defence Organisation and is subject to the jurisdiction of section 70 of the CRIMES ACT 1914. If you have received this email in error, you are requested to contact the sender and delete the email. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss