Re: [zfs-discuss] 'zfs recv' is very slow
On Mon, Feb 2, 2009 at 6:55 AM, Robert Milkowski mi...@task.gda.pl wrote:

> It definitely does. I made some tests today comparing b101 with b105 while doing 'zfs send -R -I A B > /dev/null' with several dozen snapshots between A and B. Well, b105 is almost 5x faster in my case - that's pretty good.
> -- Robert Milkowski http://milek.blogspot.com
> -- This message posted from opensolaris.org

Sad to report that I am seeing the slow zfs recv issue cropping up again while running b105 :( Not sure what has triggered the change, but I am seeing the same behavior again: massive amounts of reads on the receiving side, while receiving only tiny bursts of data amounting to a mere megabyte a second.

It doesn't seem to happen every single time, which is odd, but I can provoke it by destroying a snapshot from the pool I am sending, then taking another snapshot and re-sending it. That seems to cause the receiving side to go into this read storm before any data is transferred.

I'm going to open a case in the morning, and see if I can't get an engineer to look at this.

-- Brent Jones br...@servuhome.net

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
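A quick way to narrow down whether the slowdown is on the sending or the receiving side is to time the send stream by itself, then the full pipeline. The pool and snapshot names below are placeholders, not from the thread:

```shell
# Hypothetical pool/snapshot names; substitute your own.
# 1. Time the recursive incremental send alone, discarding the stream.
#    This isolates send throughput from any recv behavior.
ptime zfs send -R -I tank@snapA tank@snapB > /dev/null

# 2. Time the full send|recv pipeline; a large difference from step 1
#    points at the receiving side (e.g. the read storm described above).
ptime sh -c 'zfs send -R -I tank@snapA tank@snapB | zfs recv -d backup'
```

Running `iostat -xn 1` on the receiving pool during step 2 should make the read storm visible if it occurs.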
Re: [zfs-discuss] Supermicro AOC-USAS-L8i
hi,

I have an AOC-USAS-L8i working in both a Gigabyte GA-P35-DS3P and a Gigabyte GA-EG45M-DS2H under OpenSolaris build 104+ (Nexenta Core 2.0 beta). The controller looks like this in lspci:

01:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 08)
        Subsystem: Super Micro Computer Inc Unknown device a380
        Flags: bus master, fast devsel, latency 0, IRQ 15
        I/O ports at a000
        Memory at f101 (64-bit, non-prefetchable)
        Memory at f100 (64-bit, non-prefetchable)
        Capabilities: [50] Power Management version 2
        Capabilities: [68] Express Endpoint IRQ 0
        Capabilities: [98] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
        Capabilities: [b0] MSI-X: Enable- Mask- TabSize=1

With 8x 300GB disks (older 7200rpm disks from 2004) as raid-z1 on a 3.2GHz Core Duo with 4GB RAM, this delivers the following figures:

1. Write performance (linear), ~98MBytes/s:

r...@marvin:/tank/storage# dd_rescue -b4M /dev/zero test
Summary for /dev/zero - test:
dd_rescue: (info): ipos: 10485760.0k, opos: 10485760.0k, xferd: 10485760.0k
                   errs: 0, errxfer: 0.0k, succxfer: 10485760.0k
                   +curr.rate: 106039kB/s, avg.rate: 104173kB/s, avg.load: 16.4%

r...@marvin:/tank/storage# dd_rescue -b4M /dev/zero test2
Summary for /dev/zero - test2:
dd_rescue: (info): ipos: 4194304.0k, opos: 4194304.0k, xferd: 4194304.0k
                   errs: 0, errxfer: 0.0k, succxfer: 4194304.0k
                   +curr.rate: 88486kB/s, avg.rate: 96142kB/s, avg.load: 14.1%

2. Read performance (linear), ~290MBytes/s:

Summary for test - /dev/null:
dd_rescue: (info): ipos: 10485760.0k, opos: 10485760.0k, xferd: 10485760.0k
                   errs: 0, errxfer: 0.0k, succxfer: 10485760.0k
                   +curr.rate: 0kB/s, avg.rate: 285824kB/s, avg.load: 40.4%

Summary for test2 - /dev/null:
dd_rescue: (info): ipos: 4194304.0k, opos: 4194304.0k, xferd: 4194304.0k
                   errs: 0, errxfer: 0.0k, succxfer: 4194304.0k
                   +curr.rate: 0kB/s, avg.rate: 308484kB/s, avg.load: 39.8%

regards,
nicola
[zfs-discuss] Strange performance loss
I'm moving some data off an old machine to something reasonably new. Normally the new machine performs better, but I have one case just now where the new system is terribly slow.

Old machine - V880 (Solaris 8) with SVM raid-5:

# ptime du -kds foo
15043722        foo
real      6.955
user      0.964
sys       5.492

And now the new machine - T5140 (latest Solaris 10) with ZFS striped atop a bunch of 2530 arrays:

# ptime du -kds foo
15343120        foo
real   2:55.210
user      2.559
sys    2:05.788

It's not just du; a find on that directory is similarly bad. I have other filesystems of similar size and number of files (there are only about 200K files) that perform well, so there must be something about this filesystem that is throwing zfs into a spin. Anybody else seen anything like this?

I'm suspicious of ACL handling, so for a quick test I took one directory with approx 5000 files in it and timed du (I'm running all this as root, btw):

1. Just the files, no ACLs:
real      0.238
user      0.050
sys       0.187

2. Files with ACLs:
real      0.467
user      0.055
sys       0.411

3. Files with ACLs, and an ACL on the directory:
real      0.610
user      0.058
sys       0.551

I don't know whether that explains all of the problem, but it's clear that having ACLs on files and directories has a definite cost.

-- -Peter Tribble http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
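A repeatable version of this ACL experiment might look like the sketch below. The directory names are hypothetical, and the ACL entry uses the Solaris NFSv4 ACL syntax of chmod; `seq` is assumed available (substitute a shell counter loop otherwise):

```shell
# Build two directories with the same number of files.
mkdir /tank/acltest-plain /tank/acltest-acl
for i in $(seq 1 5000); do
  touch /tank/acltest-plain/f$i /tank/acltest-acl/f$i
done

# Add an ACL entry to every file in one directory only
# (user name here is arbitrary for the test).
chmod -R A+user:webservd:read_data:allow /tank/acltest-acl

# Compare traversal cost: same file count, ACLs the only variable.
ptime du -kds /tank/acltest-plain    # baseline, no ACLs
ptime du -kds /tank/acltest-acl      # with per-file ACLs
```

Adding an ACL to the directories themselves as a third case would reproduce all three timings above.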
Re: [zfs-discuss] ZFS: unreliable for professional usage?
I am wondering: if the usb storage device is not reliable for ZFS usage, can the situation be improved if I put the intent log on an internal sata disk, to avoid corruption while still getting the convenience of usb storage?
Re: [zfs-discuss] ZFS: unreliable for professional usage?
huh? but that loses the convenience of USB. I've used USB drives without problems at all; just remember to zpool export them before you unplug.
Re: [zfs-discuss] Two zvol devices one volume?
I have seen this 'phantom dataset' with a pool on nv93. I created a zpool, created a dataset, then destroyed the zpool. When creating a new zpool on the same partitions/disks as the destroyed zpool, upon export I receive the same message as you describe above, even though I never created the dataset in the new pool. Creating a dataset of the same name and then destroying it doesn't seem to get rid of it, either. The solution for your case may be in this post; if not, file a bug: http://www.opensolaris.org/jive/thread.jspa?messageID=311573#311573
Re: [zfs-discuss] ZFS: unreliable for professional usage?
While mobility would be lost, usb storage still has the advantage of being cheap and easy to install compared to installing internal disks in a pc. So if I just want to use it to provide zfs storage space for a home file server, can a small intent log located on an internal sata disk prevent the pool corruption caused by a power cut?
[zfs-discuss] Re: unformatted partition
Hello,

thanks for the answer. The partition table shows that Windows and OpenSolaris run on:

1. c9d0 DEFAULT cyl 7830 alt 2 hd 255 sec 63 /p...@0,0/pci-...@1f,2/i...@0/c...@0,0

Partition  Status   Type       Start   End    Length  %
=========  ======   ====       =====   ===    ======  ===
1                   IFS: NTFS      0   5098     5099  26
2          Active   Solaris2    5099  12931     7833  40

The disk 0. c7t0d0 doesn't contain any disk type:

AVAILABLE DISK SELECTIONS:
0. c7t0d0 drive type unknown /p...@0,0/pci104d,8...@1d,7/stor...@4/d...@0,0
1. c9d0 DEFAULT cyl 7830 alt 2 hd 255 sec 63 /p...@0,0/pci-...@1f,2/i...@0/c...@0,0
Specify disk (enter its number): 0
Error occurred with device in use checking: Bad file number
Error: can't open disk '/dev/rdsk/c7t0d0p0'.
AVAILABLE DRIVE TYPES:
0. Auto configure
1. other
Specify disk type (enter its number): 0
Auto configuration via format.dat[no]? y
Auto configure failed
No Solaris fdisk partition found.

If I create some file system using GParted, my partition table will look like this (cylinders):

Partition  Status   Type       Start   End    Length  %
=========  ======   ====       =====   ===    ======  ===
1                   IFS: NTFS      0   5098     5099  26
2          Active   Solaris2    5099  12931     7833  40
3                   Solaris      xyz    xyz            34

but I still don't know how to import this partition (num. 3). If I run 'zpool create c9d0' I'll lose all my data, right?

Regards, Jan Hlodan

Will Murnane wrote:

On Thu, Feb 12, 2009 at 21:59, Jan Hlodan jh231...@mail-emea.sun.com wrote:

> I would like to import the 3rd partition as another pool but I can't see this partition.
> sh-3.2# format -e
> Searching for disks...done
> AVAILABLE DISK SELECTIONS:
> 0. c7t0d0 drive type unknown /p...@0,0/pci104d,8...@1d,7/stor...@4/d...@0,0
> 1. c9d0 DEFAULT cyl 7830 alt 2 hd 255 sec 63 /p...@0,0/pci-...@1f,2/i...@0/c...@0,0
> I guess that 0. is the windows partition and 1. is opensolaris

What you see there are whole disks, not partitions. Try zpool status, which will show you that rpool is on something like c9d0s0. Then go into format again, pick 1 (in my example), type fdisk to look at the DOS-style partition table, and verify that the partitioning of the disk matches what you thought it was. Then you can create a new zpool with something like 'zpool create data c9d0p3'.

Will
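Put together, the procedure Will describes might look like this. The device name c9d0p3 follows from the disk listing above, but verify it with format/fdisk first, since creating a pool on the wrong partition destroys its contents:

```shell
# Confirm where the root pool lives (expect something like c9d0s0).
zpool status rpool

# Inspect the DOS-style (fdisk) partition table on the internal disk:
# in format, select the c9d0 entry, then type: fdisk
format -e

# Create a pool directly on the third fdisk partition.
# This wipes partition 3 only; partitions 1 (NTFS) and 2 (Solaris2)
# are untouched.
zpool create data c9d0p3
zpool status data
```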
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On 2/13/2009 5:58 AM, Ross wrote:

> huh? but that loses the convenience of USB. I've used USB drives without problems at all, just remember to zpool export them before you unplug.

I think there is a subcommand of cfgadm you should run to notify Solaris that you intend to unplug the device. I don't use USB, and my familiarity with cfgadm (for FC and SCSI) is limited.

-Kyle
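For the archives, the safe-removal sequence would be roughly as follows. The pool name and the USB attachment-point Id are hypothetical; cfgadm -l shows the actual Ap_Ids on a given system:

```shell
# Quiesce ZFS first so all pending writes reach the device.
zpool export mypool

# List attachment points to find the USB device's Ap_Id.
cfgadm -l

# Unconfigure the device before pulling it (Ap_Id shown is an example).
cfgadm -c unconfigure usb0/4
```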
[zfs-discuss] ZFS vdev_cache
Hi All,

How would I obtain the current setting for the vdev_cache from a production system? We are looking at trying to tune ZFS for better performance with respect to Oracle databases; however, before we start changing settings via the /etc/system file we would like to confirm the setting on the running OS.

Thanks, Tony
Re: [zfs-discuss] ZFS vdev_cache
On Fri, 13 Feb 2009, Tony Marshall wrote:

> How would i obtain the current setting for the vdev_cache from a production system? We are looking at trying to tune ZFS for better performance with respect to oracle databases, however before we start changing settings via the /etc/system file we would like to confirm the setting from the running OS.

The kernel variable zfs_vdev_cache_size indicates the size of the cache per leaf vdev. By default it's set to 0xA00000, i.e. 10MB.

Regards, markm
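To read the live value from a running system, the kernel debugger can print the variable directly (needs root). A sketch:

```shell
# Print zfs_vdev_cache_size from the running kernel in decimal.
echo 'zfs_vdev_cache_size/D' | mdb -k

# To change it persistently, add a line like this to /etc/system
# and reboot (value shown is the 10MB default):
#   set zfs:zfs_vdev_cache_size = 0xA00000
```

This lets you confirm the current setting before touching /etc/system, as the original question asks.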
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Having a separate intent log on good hardware will not prevent corruption of a pool on bad hardware. By "good" I mean hardware that correctly flushes its write caches when requested. Note, a pool is always consistent (again, when using good hardware). The function of the intent log is not to provide consistency (like a journal), but to speed up synchronous requests like fsync and O_DSYNC.

Neil.

On 02/13/09 06:29, Jiawei Zhao wrote:

> While mobility could be lost, usb storage still has the advantage of being cheap and easy to install comparing to install internal disks on pc, so if I just want to use it to provide zfs storage space for home file server, can a small intent log located on internal sata disk prevent the pool corruption caused by a power cut?
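For reference, attaching a separate intent log (slog) is a one-liner; it just won't save a pool whose main devices lie about cache flushes. Pool and slice names below are hypothetical:

```shell
# Add a dedicated log device on an internal SATA slice to pool "tank".
# This accelerates synchronous writes (fsync, O_DSYNC); it does not
# protect against devices that ignore cache-flush commands.
zpool add tank log c1t0d0s0

# The device appears under a separate "logs" section.
zpool status tank
```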
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, Feb 13 at 9:14, Neil Perrin wrote:

> Having a separate intent log on good hardware will not prevent corruption on a pool with bad hardware. By good I mean hardware that correctly flush their write caches when requested.

Can someone please name a specific piece of bad hardware?

--eric

-- Eric D. Mudama edmud...@mail.bounceswoosh.org
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Thu, Feb 12 at 19:43, Toby Thain wrote:

> ^^ Spec compliance is what we're testing for... We wouldn't know if this special variant is working correctly either. :)

Time the difference between NCQ reads with and without FUA in the presence of overlapped cached write data. That should have a significant performance penalty, compared to a device servicing the reads from a volatile buffer cache.

FYI, there are semi-commonly-available power control units that take serial port or USB as an input, and have a whole bunch of SATA power connectors on them. These are the sorts of things that drive vendors use to bounce power unexpectedly in their testing; if you need to perform that same validation, it makes sense to invest in that bit of infrastructure. Something like this: http://www.ulinktech.com/products/hw_power_hub.html or just roll your own in a few days like this guy did for his printer: http://chezphil.org/slugpower/

It should be pretty trivial to perform a few thousand cached writes, issue a flush cache ext, and turn off power immediately after that command completes. Then go back and figure out how many of those writes were successfully written as the device claimed.

-- Eric D. Mudama edmud...@mail.bounceswoosh.org
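The flush-then-cut-power test described above could be driven by a script along these lines. `power_relay` stands in for whatever hypothetical command controls the serial/USB power hub; everything here is a sketch, not a validated harness:

```shell
# Write a few thousand small files to the device under test.
i=0
while [ $i -lt 1000 ]; do
  dd if=/dev/urandom of=/mnt/testdisk/f$i bs=8k count=1 2>/dev/null
  i=$((i+1))
done

# Request that cached data reach stable storage, then cut power the
# instant the flush returns (power_relay is a placeholder command).
sync
power_relay off port1

# After restoring power and remounting: count surviving files and
# compare against the 1000 the device claimed to have written.
```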
Re: [zfs-discuss] ZFS: unreliable for professional usage?
gm == Gary Mills mi...@cc.umanitoba.ca writes:

gm That implies that ZFS will have to detect removable devices
gm and treat them differently than fixed devices.

please, no more of this garbage, no more hidden unchangeable automatic condescending behavior. The whole format vs rmformat mess is just ridiculous. And software and hardware developers alike have both proven themselves incapable of settling on a definition of ``removeable'' that fits with actual use-cases like: FC/iSCSI; hot-swappable SATA; adapters that have removeable sockets on both ends like USB-to-SD, firewire CD-ROM's, SATA/SAS port multipliers, and so on.

As we've said many times, if the devices are working properly, then they can be unplugged uncleanly without corrupting the pool, and without corrupting any other non-Microsoft filesystem. This is an old, SOLVED, problem. It's ridiculous hypocrisy to make whole filesystems DSYNC, to even _invent the possibility for the filesystem to be DSYNC_, just because it is possible to remove something. Will you do the same thing because it is possible for your laptop's battery to run out? just, STOP! If the devices are broken, the problem is that they're broken, not that they're removeable.

personally, I think everything with a broken write cache should be black-listed in the kernel and attach read-only by default, whether it's a USB bridge or a SATA disk. This will not be perfect because USB bridges, RAID layers and iSCSI targets will often hide the identity of the SATA drive behind them, and of course people will demand a way to disable it. but if you want to be ``safe'', then for the sake of making the point, THIS is the right way to do it, not mucking around with these overloaded notions of ``removeable''.

Also, the so-far unacknowledged ``iSCSI/FC Write Hole'' should be fixed so that a copy of all written data is held in the initiator's buffer cache until it's verified as *on the physical platter/NVRAM* so that it can be replayed if necessary, and SYNC CACHE commands are allowed to fail far enough that even *things which USE the initiator, like ZFS* will understand what it means when SYNC CACHE fails, and bounced connections are handled correctly---otherwise, when connections bounce or SYNC CACHE returns failure, correctness requires that the initiator pretend like its plug was pulled and panic. Short of that, the initiator system must forcibly unmount all filesystems using that device and kill all processes that had files open on those filesystems. And sysadmins should have, and know how to cleverly use, a tool that tests for both functioning barriers and working SYNC CACHE, end-to-end.

NO more ``removeable'' attributes, please! You are just pretending to solve a much bigger problem, and making things clumsy and disgusting in the process.
Re: [zfs-discuss] ZFS: unreliable for professional usage?
fc == Frank Cusack fcus...@fcusack.com writes:

> Dropping a flush-cache command is just as bad as dropping a write.

fc Not that it matters, but it seems obvious that this is wrong
fc or anyway an exaggeration. Dropping a flush-cache just means
fc that you have to wait until the device is quiesced before the
fc data is consistent.
fc Dropping a write is much much worse.

backwards i think. Dropping a flush-cache is WORSE than dropping the flush-cache plus all writes after the flush-cache. The problem that causes loss of whole pools rather than loss of recently-written data isn't that you're writing too little. It's that you're dropping the barrier and misordering the writes. consequently you lose *everything you've ever written,* which is much worse than losing some recent writes, even a lot of them.
Re: [zfs-discuss] ZFS: unreliable for professional usage?
t == Tim t...@tcsac.net writes:

t I would like to believe it has more to do with Solaris's
t support of USB than ZFS, but the fact remains it's a pretty
t glaring deficiency in 2009, no matter which part of the stack
t is at fault.

maybe, but for this job I don't much mind glaring deficiencies, as long as it's possible to assemble a working system without resorting to trial-and-error, and possible to know it's working before loading data on it. Right now, by following the ``best practices'', you don't know what to buy, and after you receive the hardware you don't know if it works until you lose a pool, at which time someone will tell you ``i guess it wasn't ever working.'' Even if you order sun4v or an expensive FC disk shelf, you still don't know if it works. (though, I'm starting to suspect, in the case of FC or iSCSI the answer is always ``it does not work'') The only thing you know for sure is, if you lose a pool, someone will blame it on hardware bugs surrounding cache flushes, or else try to conflate the issue with a bunch of inapplicable garbage about checksums and wire corruption. This is unworkable.

I'm not saying glaring 2009 deficiencies are irrelevant---on my laptop I do mind, because I got out of a multi-year abusive relationship with NetBSD/hpcmips, and now want all parts of my laptop to have drivers. And I guess it applies to that neat timeslider / home-base--USB-disk case we were talking about a month ago. but for what I'm doing I will actually accept the advice ``do not ever put ZFS on USB because ZFS is a canary in the mine of USB bugs''---it's just, that advice is not really good enough to settle the whole issue.
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Miles Nordin wrote:

> gm That implies that ZFS will have to detect removable devices
> gm and treat them differently than fixed devices.
>
> please, no more of this garbage, no more hidden unchangeable automatic condescending behavior. The whole format vs rmformat mess is just ridiculous. And software and hardware developers alike have both proven themselves incapable of settling on a definition of ``removeable'' that fits with actual use-cases like: FC/iSCSI; hot-swappable SATA; adapters that have removeable sockets on both ends like USB-to-SD, firewire CD-ROM's, SATA/SAS port multipliers, and so on.

Since this discussion is taking place in the context of someone removing a USB stick, I think you're confusing the issue by dragging in other technologies. Let's keep this in the context of the posts preceding it, which is how USB devices are treated.

I would argue that one of the first design goals in an environment where you can expect people who are not computer professionals to be interfacing with computers is to make sure that the appropriate safeties are in place and that the system does not behave in a manner which a reasonable person might find unexpected. This is common practice for any sort of professional engineering effort. As an example, you aren't going to go out there and find yourself a chainsaw being sold new without a guard. It might be removable, but the default is to include it. Why? Because there is a considerable chance of damage to the user without it.

Likewise with a file system on a device which might cache a data write for as long as thirty seconds while being easily removable. In this case, the user may write the file and seconds later remove the device. Many folks out there behave in this manner. It really doesn't matter to them that they have a copy of the last save they did two hours ago; what they want and expect is that the most recent data they saved actually be on the USB stick for them to retrieve.

What you are suggesting is that it is better to lose that data when it could have been avoided. I would personally suggest that it is better to have default behavior which is not surprising, along with more advanced behavior for those who have bothered to read the manual. In Windows' case, the write cache can be turned on; it is not unchangeable, and those who have educated themselves use it. I seldom turn it on unless I'm doing heavy I/O to a USB hard drive; otherwise the performance difference is just not that great.

Regards, Greg
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On February 13, 2009 12:20:21 PM -0500 Miles Nordin car...@ivy.net wrote:

> fc == Frank Cusack fcus...@fcusack.com writes:
> Dropping a flush-cache command is just as bad as dropping a write.
> fc Not that it matters, but it seems obvious that this is wrong
> fc or anyway an exaggeration. Dropping a flush-cache just means
> fc that you have to wait until the device is quiesced before the
> fc data is consistent.
> fc Dropping a write is much much worse.
>
> backwards i think. Dropping a flush-cache is WORSE than dropping the flush-cache plus all writes after the flush-cache. The problem that causes loss of whole pools rather than loss of recently-written data isn't that you're writing too little. It's that you're dropping the barrier and misordering the writes. consequently you lose *everything you've ever written,* which is much worse than losing some recent writes, even a lot of them.

Who said dropping a flush-cache means dropping any subsequent writes, or misordering writes? If you're misordering writes, isn't that a completely different problem?

Even then, I don't see how it's worse than DROPPING a write. The data eventually gets to disk, and at that point in time, the disk is consistent. When dropping a write, the data never makes it to disk, ever. In the face of a power loss, of course these result in the same problem, but even without a power loss the drop of a write is catastrophic.

-frank
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On February 13, 2009 12:10:08 PM -0500 Miles Nordin car...@ivy.net wrote:

> please, no more of this garbage, no more hidden unchangeable automatic condescending behavior. The whole format vs rmformat mess is just ridiculous.

thank you.
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On February 13, 2009 12:41:12 PM -0500 Miles Nordin car...@ivy.net wrote:

> fc == Frank Cusack fcus...@fcusack.com writes:
> fc if you have 100TB of data, wouldn't you have a completely
> fc redundant storage network
>
> If you work for a ponderous leaf-eating brontosaurus maybe. If your company is modern I think having such an oddly large amount of data in one pool means you'd more likely have 70 whitebox peecees using motherboard ethernet/sata only, connected to a mesh of unmanaged L2 switches (of some peculiar brand that happens to work well.) There will always be one or two peecees switched off, and constantly something will be resilvering. The home user case is not really just for home users. I think a lot of people are tired of paying quadruple for stuff that still breaks, even serious people.

oh i dunno. i recently worked for a company that practically defines modern and we had multiples of 100TB of data. Like you said, not all in one place, but any given piece was fully redundant (well, if you count RAID-5 as fully ... but I'm really referring to the infrastructure). I can't imagine it any other way ... the cost of not having redundancy in the face of a failure is so much higher compared to the cost of building in that redundancy.

Also I'm not sure how you get 1 pool with more than 1 peecee, as zfs is not a cluster fs. So what you are talking about is multiple pools, and in that case if you do lose one (not redundant for whatever reason) you only have to restore a fraction of the 100TB from backup.

> fc Isn't this easily worked around by having UPS power in
> fc addition to whatever the data center supplies?
>
> In NYC over the last five years the power has been more reliable going into my UPS than coming out of it. The main reason for having a UPS is wiring maintenance. And the most important part of the UPS is the externally-mounted bypass switch, because the UPS also needs maintenance. UPS has never _solved_ anything, it always just helps. so in the end we have to count on the software's graceful behavior, not on absolutes.

I can't say I agree about the UPS; however, I've already been pretty forthright that UPS, etc. isn't the answer to the problem, just a mitigating factor to the root problem.

-frank
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, 13 Feb 2009 17:53:00 +0100, Eric D. Mudama edmud...@bounceswoosh.org wrote:

> On Fri, Feb 13 at 9:14, Neil Perrin wrote:
>> Having a separate intent log on good hardware will not prevent corruption on a pool with bad hardware. By good I mean hardware that correctly flush their write caches when requested.
>
> Can someone please name a specific piece of bad hardware?

Or better still, name a few -GOOD- ones.

-- Dick Hoogendijk -- PGP/GnuPG key: 01D2433D + http://nagual.nl/ | SunOS sxce snv107++ + All that's really worth doing is what we do for others (Lewis Carroll)
Re: [zfs-discuss] ZFS: unreliable for professional usage?
fc == Frank Cusack fcus...@fcusack.com writes:

fc If you're misordering writes
fc isn't that a completely different problem?

no. ignoring the flush cache command causes writes to be misordered.

fc Even then, I don't see how it's worse than DROPPING a write.
fc The data eventually gets to disk, and at that point in time,
fc the disk is consistent. When dropping a write, the data never
fc makes it to disk, ever.

If you drop the flush cache command and every write after the flush cache command, yeah yeah it's bad, but in THAT case, the disk is still always consistent because no writes have been misordered.

fc In the face of a power loss, of course these result in the
fc same problem,

no, it's completely different in a power loss, which is exactly the point. If you pull the cord while the disk is inconsistent, you may lose the entire pool. If the disk is never inconsistent because you've never misordered writes, you will only lose recent write activity. Losing everything you've ever written is usually much worse than losing what you've written recently.

yeah yeah, some devil's advocate will toss in, ``i *need* some consistency promises or else it's better that the pool throw up its hand and say `broken, restore backup please', even if the hand-raising comes in the form of losing the entire pool,'' well in that case neither one is acceptable. But if your requirements are looser, then dropping a flush cache command plus every write after the flush cache command is much better than just ignoring the flush cache command. of course, that is a weird kind of failure that never happens. I described it just to make a point, to argue against this overly-simple idea ``every write is precious. let's do them as soon as possible because there could be Valuable Business Data inside the writes! we don't want to lose anything Valuable!''

The part of SYNC CACHE that's causing people to lose entire pools isn't the ``hurry up! write faster!'' part of the command, such that without it you still get your precious writes, just a little slower. NO. It's the ``control the order of writes'' part that's important for integrity on a single-device vdev.
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On February 13, 2009 1:10:55 PM -0500 Miles Nordin car...@ivy.net wrote: fc == Frank Cusack fcus...@fcusack.com writes: fc If you're misordering writes fc isn't that a completely different problem? no. ignoring the flush cache command causes writes to be misordered. oh. can you supply a reference or if you have the time, some more explanation? (or can someone else confirm this.) my understanding (weak, admittedly) is that drives will reorder writes on their own, and this is generally considered normal behavior. so to guarantee consistency *in the face of some kind of failure like a power loss*, we have write barriers. flush-cache is a stronger kind of write barrier. now that i think more, i suppose yes if you ignore the flush cache, then writes before and after the flush cache could be misordered, however it's the same as if there were no flush cache at all, and again as long as the drive has power and you can quiesce it then the data makes it to disk, and all is consistent and well. yes? whereas if you drop a write, well it's gone off into a black hole. fc Even then, I don't see how it's worse than DROPPING a write. fc The data eventually gets to disk, and at that point in time, fc the disk is consistent. When dropping a write, the data never fc makes it to disk, ever. If you drop the flush cache command and every write after the flush cache command, yeah yeah it's bad, but in THAT case, the disk is still always consistent because no writes have been misordered. why would dropping a flush cache imply dropping every write after the flush cache? fc In the face of a power loss, of course these result in the fc same problem, no, it's completely different in a power loss, which is exactly the point. If you pull the cord while the disk is inconsistent, you may lose the entire pool. If the disk is never inconsistent because you've never misordered writes, you will only lose recent write activity. 
Losing everything you've ever written is usually much worse than losing what you've written recently. yeah, as soon as i wrote that i realized my error, so thank you and i agree on that point. *in the event of a power loss* being inconsistent is a worse problem. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
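The failure modes being debated here can be made concrete with a toy simulation (my own sketch, not ZFS code; the `Disk` model and the block names are invented): a drive that honors cache flushes never loses writes completed before the barrier, while a drive that ignores flushes stays consistent only until the power goes out.

```python
import random

class Disk:
    """Toy model: a volatile write cache in front of stable storage."""
    def __init__(self, honors_flush=True):
        self.honors_flush = honors_flush
        self.cache = []    # writes acknowledged but still volatile
        self.stable = []   # writes that actually reached the platter

    def write(self, block):
        self.cache.append(block)

    def flush(self):
        if self.honors_flush:
            self.stable.extend(self.cache)  # drain the cache, in order
            self.cache.clear()
        # a lying drive acks the flush and does nothing

    def power_loss(self):
        # the drive destages its cache in some arbitrary order and dies
        # partway through: a random subset of cached writes survives
        random.shuffle(self.cache)
        self.stable.extend(self.cache[:random.randint(0, len(self.cache))])
        self.cache.clear()

def crashed_disk(honors_flush, seed):
    """Write two tree blocks, issue a barrier, write the uberblock, crash."""
    random.seed(seed)
    d = Disk(honors_flush)
    d.write("data-A")
    d.write("data-B")
    d.flush()              # barrier: tree blocks must precede the uberblock
    d.write("uberblock")
    d.power_loss()
    return d.stable
```

With `honors_flush=True` the tree blocks are always on stable storage no matter when power dies; with `honors_flush=False` some crash timings leave the new uberblock on disk while a tree block it points at is gone, which is exactly the "misordering only matters at power loss" point above.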
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On February 13, 2009 10:29:05 AM -0800 Frank Cusack fcus...@fcusack.com wrote: On February 13, 2009 1:10:55 PM -0500 Miles Nordin car...@ivy.net wrote: fc == Frank Cusack fcus...@fcusack.com writes: fc If you're misordering writes fc isn't that a completely different problem? no. ignoring the flush cache command causes writes to be misordered. oh. can you supply a reference or if you have the time, some more explanation? (or can someone else confirm this.) uhh ... that question can be ignored as i answered it myself below. sorry if i'm just being noisy now. my understanding (weak, admittedly) is that drives will reorder writes on their own, and this is generally considered normal behavior. so to guarantee consistency *in the face of some kind of failure like a power loss*, we have write barriers. flush-cache is a stronger kind of write barrier. now that i think more, i suppose yes if you ignore the flush cache, then writes before and after the flush cache could be misordered, however it's the same as if there were no flush cache at all, and again as long as the drive has power and you can quiesce it then the data makes it to disk, and all is consistent and well. yes? -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
fc == Frank Cusack fcus...@fcusack.com writes: fc why would dropping a flush cache imply dropping every write fc after the flush cache? it wouldn't and probably never does. It was an imaginary scenario invented to argue with you and to agree with the guy in the USB bug who said ``dropping a cache flush command is as bad as dropping a write.'' fc oh. can you supply a reference or if you have the time, some fc more explanation? (or can someone else confirm this.) I posted something long a few days ago that I need to revisit. The problem is, I don't actually understand how the disk commands work, so I was talking out my ass. Although I kept saying, ``I'm not sure it actually works this way,'' my saying so doesn't help anyone who spends the time to read it and then gets a bunch of mistaken garbage stuck in his head, which the people who actually recognize it as garbage are too busy to correct. It'd be better for everyone if I didn't do that. On the other hand, I think there's some worth to dreaming up several possibilities of what I fantasize the various commands might mean or do, rather than simply reading one of the specs to get the one right answer, because from what people in here say it sounds as though implementors of actual systems based on the SCSI command set live in this same imaginary world of fantastic and multiple realities without any meaningful review or accountability that I do. (disks, bridges, iSCSI targets and initiators, VMWare/VBox storage, ...) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Superb news, thanks Jeff. Having that will really raise ZFS up a notch, and align it much better with people's expectations. I assume it'll work via zpool import, and let the user know what's gone wrong? If you think back to this case, imagine how different the user's response would have been if instead of being unable to mount the pool, ZFS had turned around and said: "This pool was not unmounted cleanly, and data has been lost. Do you want to restore your pool to the last viable state: (timestamp goes here)?" Something like that will have people praising ZFS' ability to safeguard their data, and the way it recovers even after system crashes or when hardware has gone wrong. You could even have a "common causes of this are..." message, or a link to an online help article if you wanted people to be really impressed. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, 13 Feb 2009, Ross wrote: Something like that will have people praising ZFS' ability to safeguard their data, and the way it recovers even after system crashes or when hardware has gone wrong. You could even have a common causes of this are... message, or a link to an online help article if you wanted people to be really impressed. I see a career in politics for you. Barring an operating system implementation bug, the type of problem you are talking about is due to improperly working hardware. Irreversibly reverting to a previous checkpoint may or may not obtain the correct data. Perhaps it will produce a bunch of checksum errors. There are already people praising ZFS' ability to safeguard their data, and the way it recovers even after system crashes or when hardware has gone wrong. Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Supermicro AOC-USAS-L8i
On Fri, Feb 13, 2009 at 04:51, Nicola Fankhauser nicola.fankhau...@variant.ch wrote: hi I have a AOC-USAS-L8i working in both a Gigabyte GA-P35-DS3P and Gigabyte GA-EG45M-DS2H under OpenSolaris build 104+ (Nexenta Core 2.0 beta). Very cool! It's good to see people having success with this card. How does mounting the card work? Can one reverse the slot cover and screw it in like that, or is the card hanging free? Can you provide pictures of the card mounted in the case? Thanks! Will ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, Feb 13, 2009 at 10:29:05AM -0800, Frank Cusack wrote: On February 13, 2009 1:10:55 PM -0500 Miles Nordin car...@ivy.net wrote: fc == Frank Cusack fcus...@fcusack.com writes: fc If you're misordering writes fc isn't that a completely different problem? no. ignoring the flush cache command causes writes to be misordered. oh. can you supply a reference or if you have the time, some more explanation? (or can someone else confirm this.) Ordering matters for atomic operations, and filesystems are full of those. Now, if ordering is broken but the writes all eventually hit the disk then no one will notice. But if a power failure and/or a partition intervenes (cables get pulled, network partitions cut off an iSCSI connection, ...) then bad things happen. For ZFS the easiest way to ameliorate this is the txg fallback fix that Jeff Bonwick has said is now a priority. And if ZFS guarantees no block re-use until N txgs pass after a block is freed, then the fallback can be of up to N txgs, which gives you a decent chance that you'll recover your pool in the face of buggy devices, but for each discarded txg you lose that transaction's writes; you lose data incrementally. (The larger N is, the better your chance that the oldest of the last N txg's writes will all hit the disk in spite of the disk's lousy cache behaviors.) The next question is how to do the fallback, UI-wise. Should it ever be automatic? A pool option for that would be nice (I'd use it on all-USB pools). If/when not automatic, how should the user/admin be informed of the failure to open the pool and the option to fall back on an older txg (with data loss)? (For non-removable pools imported at boot time the answer is that the service will fail, causing sulogin to be invoked so you can fix the problem on console. For removable pools there should be a GUI.) Nico -- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
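The fallback policy described above can be sketched in a few lines (an illustration only, nothing like the real SPA code; `tree_ok` is a hypothetical callback standing in for re-walking the block tree from a candidate root and verifying every checksum):

```python
def pick_fallback_txg(uberblocks, tree_ok, max_fallback):
    """Pick the newest of the last `max_fallback` txgs whose entire
    block tree still verifies; return None if no txg is usable.

    uberblocks: iterable of (txg, root_bp) pairs, any order.
    tree_ok:    callback that walks root_bp and checks every checksum.
    """
    # newest first, but never look back further than max_fallback txgs,
    # because blocks freed before that may have been reallocated
    candidates = sorted(uberblocks, reverse=True)[:max_fallback]
    for txg, root_bp in candidates:
        if tree_ok(root_bp):
            return txg
    return None
```

For each txg the policy skips, the writes of that transaction are lost, which is the "you lose data incrementally" trade-off: a larger `max_fallback` raises the chance of finding a fully intact tree but caps how much history can be discarded.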
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, Feb 13, 2009 at 7:41 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: On Fri, 13 Feb 2009, Ross wrote: Something like that will have people praising ZFS' ability to safeguard their data, and the way it recovers even after system crashes or when hardware has gone wrong. You could even have a common causes of this are... message, or a link to an online help article if you wanted people to be really impressed. I see a career in politics for you. Barring an operating system implementation bug, the type of problem you are talking about is due to improperly working hardware. Irreversibly reverting to a previous checkpoint may or may not obtain the correct data. Perhaps it will produce a bunch of checksum errors. Yes, the root cause is improperly working hardware (or an OS bug like 6424510), but with ZFS being a copy on write system, when errors occur with a recent write, for the vast majority of the pools out there you still have huge amounts of data that is still perfectly valid and should be accessible. Unless I'm misunderstanding something, reverting to a previous checkpoint gets you back to a state where ZFS knows it's good (or at least where ZFS can verify whether it's good or not). You have to consider that even with improperly working hardware, ZFS has been checksumming data, so if that hardware has been working for any length of time, you *know* that the data on it is good. Yes, if you have databases or files there that were mid-write, they will almost certainly be corrupted. But at least your filesystem is back, and it's in as good a state as it's going to be given that in order for your pool to be in this position, your hardware went wrong mid-write. And as an added bonus, if you're using ZFS snapshots, now your pool is accessible, you have a bunch of backups available so you can probably roll corrupted files back to working versions. For me, that is about as good as you can get in terms of handling a sudden hardware failure. 
Everything that is known to be saved to disk is there, you can verify (with absolute certainty) whether data is ok or not, and you have backup copies of damaged files. In the old days you'd need to be reverting to tape backups for both of these, with potentially hours of downtime before you even know where you are. Achieving that in a few seconds (or minutes) is a massive step forwards. There are already people praising ZFS' ability to safeguard their data, and the way it recovers even after system crashes or when hardware has gone wrong. Yes there are, but the majority of these are praising the ability of ZFS checksums to detect bad data, and to repair it when you have redundancy in your pool. I've not seen that many cases of people praising ZFS' recovery ability - uberblock problems seem to have a nasty habit of leaving you with tons of good, checksummed data on a pool that you can't get to, and while many hardware problems are dealt with, others can hang your entire pool. Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Bob Friesenhahn wrote: On Fri, 13 Feb 2009, Ross wrote: Something like that will have people praising ZFS' ability to safeguard their data, and the way it recovers even after system crashes or when hardware has gone wrong. You could even have a common causes of this are... message, or a link to an online help article if you wanted people to be really impressed. I see a career in politics for you. Barring an operating system implementation bug, the type of problem you are talking about is due to improperly working hardware. Irreversibly reverting to a previous checkpoint may or may not obtain the correct data. Perhaps it will produce a bunch of checksum errors. Actually that's a lot like FMA replies when it sees a problem, telling the person what happened and pointing them to a web page which can be updated with the newest information on the problem. That's a good spot for This pool was not unmounted cleanly due to a hardware fault and data has been lost. The name of timestamp line contains the date which can be recovered to. Use the command # zfs reframbulocate this that -t timestamp to revert to timestamp --dave -- David Collier-Brown| Always do right. This will gratify Sun Microsystems, Toronto | some people and astonish the rest dav...@sun.com | -- Mark Twain cell: (647) 833-9377, bridge: (877) 385-4099 code: 506 9191# ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, 13 Feb 2009, Ross Smith wrote: You have to consider that even with improperly working hardware, ZFS has been checksumming data, so if that hardware has been working for any length of time, you *know* that the data on it is good. You only know this if the data has previously been read. Assume that the device temporarily stops physically writing, but otherwise responds normally to ZFS. Then the device starts writing again (including a recent uberblock), but with a large gap in the writes. Then the system loses power, or crashes. What happens then? Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, Feb 13, 2009 at 8:24 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: On Fri, 13 Feb 2009, Ross Smith wrote: You have to consider that even with improperly working hardware, ZFS has been checksumming data, so if that hardware has been working for any length of time, you *know* that the data on it is good. You only know this if the data has previously been read. Assume that the device temporarily stops physically writing, but otherwise responds normally to ZFS. Then the device starts writing again (including a recent uberblock), but with a large gap in the writes. Then the system loses power, or crashes. What happens then? Well in that case you're screwed, but if ZFS is known to handle even corrupted pools automatically, when that happens the immediate response on the forums is going to be "something really bad has happened to your hardware", followed by troubleshooting to find out what. Instead of the response now, where we all know there's every chance the data is ok, and just can't be gotten to without zdb. Also, that's a pretty extreme situation since you'd need a device that is being written to but not read from to fail in this exact way. It also needs to have no scrubbing being run, so the problem has remained undetected. However, even in that situation, if we assume that it happened and that these recovery tools are available, ZFS will either report that your pool is seriously corrupted, indicating a major hardware problem (and ZFS can now state this with some confidence), or ZFS will be able to open a previous uberblock, mount your pool and begin a scrub, at which point all your missing writes will be found too and reported. And then you can go back to your snapshots. :-D Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Greg Palmer wrote: Miles Nordin wrote: gm That implies that ZFS will have to detect removable devices gm and treat them differently than fixed devices. please, no more of this garbage, no more hidden unchangeable automatic condescending behavior. The whole format vs rmformat mess is just ridiculous. And software and hardware developers alike have both proven themselves incapable of settling on a definition of ``removeable'' that fits with actual use-cases like: FC/iSCSI; hot-swappable SATA; adapters that have removeable sockets on both ends like USB-to-SD, firewire CD-ROM's, SATA/SAS port multipliers, and so on. Since this discussion is taking place in the context of someone removing a USB stick I think you're confusing the issue by dragging in other technologies. Let's keep this in the context of the posts preceding it which is how USB devices are treated. I would argue that one of the first design goals in an environment where you can expect people who are not computer professionals to be interfacing with computers is to make sure that the appropriate safeties are in place and that the system does not behave in a manner which a reasonable person might find unexpected. It has been my experience that USB sticks use FAT, which is an ancient file system which contains few of the features you expect from modern file systems. As such, it really doesn't do any write caching. Hence, it seems to work ok for casual users. I note that neither NTFS, ZFS, reiserfs, nor many of the other, high performance file systems are used by default for USB devices. Could it be that anyone not using FAT for USB devices is straining against architectural limits? -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, Feb 13, 2009 at 8:24 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: On Fri, 13 Feb 2009, Ross Smith wrote: You have to consider that even with improperly working hardware, ZFS has been checksumming data, so if that hardware has been working for any length of time, you *know* that the data on it is good. You only know this if the data has previously been read. Assume that the device temporarily stops physically writing, but otherwise responds normally to ZFS. Then the device starts writing again (including a recent uberblock), but with a large gap in the writes. Then the system loses power, or crashes. What happens then? Hey Bob, Thinking about this a bit more, you've given me an idea: Would it be worth ZFS occasionally reading previous uberblocks from the pool, just to check they are there and working ok? I wonder if you could do this after a few uberblocks have been written. It would seem to be a good way of catching devices that aren't writing correctly early on, as well as a way of guaranteeing that previous uberblocks are available to roll back to should a write go wrong. I wonder what the upper limit for this kind of write failure is going to be. I've seen 30 second delays mentioned in this thread. How often are uberblocks written? Is there any guarantee that we'll always have more than 30 seconds' worth of uberblocks on a drive? Should ZFS be set so that it keeps either a given number of uberblocks, or 5 minutes' worth of uberblocks, whichever is the larger? Ross ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
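The read-back idea above, sketched as a toy model (my own code, not ZFS internals; if I remember the on-disk layout right, ZFS keeps an array of 128 uberblocks in each vdev label, but the slot and checksum handling here are simplified):

```python
import hashlib

def checksum(payload):
    return hashlib.sha256(payload).hexdigest()

class UberblockRing:
    SLOTS = 128  # mirrors the per-label uberblock array size

    def __init__(self):
        self.slots = [None] * self.SLOTS

    def write(self, txg, payload):
        # each txg overwrites one slot, round-robin
        self.slots[txg % self.SLOTS] = (txg, payload, checksum(payload))

    def verify_recent(self, current_txg, n):
        """Read back the n most recent uberblocks and recompute their
        checksums; return the txgs that are missing or corrupt."""
        bad = []
        for txg in range(max(0, current_txg - n + 1), current_txg + 1):
            entry = self.slots[txg % self.SLOTS]
            if entry is None or entry[0] != txg or checksum(entry[1]) != entry[2]:
                bad.append(txg)
        return bad
```

A `verify_recent` pass after every few txgs would catch a drive that silently stopped committing writes much earlier than a full scrub would, though (per Bob's objection in the reply below this message in the original thread) a read served from the drive's own volatile cache proves nothing.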
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, 13 Feb 2009, Ross Smith wrote: Also, that's a pretty extreme situation since you'd need a device that is being written to but not read from to fail in this exact way. It also needs to have no scrubbing being run, so the problem has remained undetected. On systems with a lot of RAM, 100% write is a pretty common situation since reads are often against data which are already cached in RAM. This is common when doing bulk data copies from one device to another (e.g. a backup from an internal pool to a USB-based pool) since the necessary filesystem information for the destination filesystem can be cached in memory for quick access rather than going to disk. Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zpool create from spare partition
Hello, I formatted an unallocated partition using GParted and now my table looks like this:

sh-3.2# format -e
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c9d0 DEFAULT cyl 7830 alt 2 hd 255 sec 63
          /p...@0,0/pci-...@1f,2/i...@0/c...@0,0
Specify disk (enter its number): 0
selecting c9d0
NO Alt slice
No defect list found
Total disk size is 19457 cylinders
Cylinder size is 16065 (512 byte) blocks

                                     Cylinders
Partition   Status    Type          Start    End    Length    %
=========   ======    ============  =====   =====   ======   ===
    1                 IFS: NTFS         0    5098     5099    26
    2       Active    Solaris2       5099   12931     7833    40
    3                 Linux native  12932   19456     6525    34

Can you give me advice on how to choose the 3rd (Linux native) partition? If I know where this partition is, then I can run: zpool create trunk c9d0XYZ, right? Thanks for the answer. Regards, Jan Hlodan

Jan Hlodan wrote: Hello, thanks for the answer. The partition table shows that Windows and OpenSolaris run on:

       1. c9d0 DEFAULT cyl 7830 alt 2 hd 255 sec 63
          /p...@0,0/pci-...@1f,2/i...@0/c...@0,0

Partition   Status    Type          Start    End    Length    %
=========   ======    ============  =====   =====   ======   ===
    1                 IFS: NTFS         0    5098     5099    26
    2       Active    Solaris2       5099   12931     7833    40

The disk 0. c7t0d0 doesn't contain any disk type:

AVAILABLE DISK SELECTIONS:
       0. c7t0d0 drive type unknown
          /p...@0,0/pci104d,8...@1d,7/stor...@4/d...@0,0
       1. c9d0 DEFAULT cyl 7830 alt 2 hd 255 sec 63
          /p...@0,0/pci-...@1f,2/i...@0/c...@0,0
Specify disk (enter its number): 0
Error occurred with device in use checking: Bad file number
Error: can't open disk '/dev/rdsk/c7t0d0p0'.
AVAILABLE DRIVE TYPES:
       0. Auto configure
       1. other
Specify disk type (enter its number): 0
Auto configuration via format.dat[no]? y
Auto configure failed
No Solaris fdisk partition found.

If I create some file system using GParted, my partition table will look like this:

Partition   Status    Type          Start    End    Length    %
=========   ======    ============  =====   =====   ======   ===
    1                 IFS: NTFS         0    5098     5099    26
    2       Active    Solaris2       5099   12931     7833    40
    3                 Solaris         xyz     xyz              34

but I still don't know how to import this partition (num. 3). If I run: zpool create c9d0, I'll lose all my data, right?
Regards, Jan Hlodan Will Murnane wrote: On Thu, Feb 12, 2009 at 21:59, Jan Hlodan jh231...@mail-emea.sun.com wrote: I would like to import the 3rd partition as another pool but I can't see this partition. sh-3.2# format -e Searching for disks...done AVAILABLE DISK SELECTIONS: 0. c7t0d0 drive type unknown /p...@0,0/pci104d,8...@1d,7/stor...@4/d...@0,0 1. c9d0 DEFAULT cyl 7830 alt 2 hd 255 sec 63 /p...@0,0/pci-...@1f,2/i...@0/c...@0,0 I guess that 0. is the Windows partition and 1. is OpenSolaris. What you see there are whole disks, not partitions. Try zpool status, which will show you that rpool is on something like c9d0s0. Then go into format again, pick 1 (in my example), type fdisk to look at the DOS-style partition table and verify that the partitioning of the disk matches what you thought it was. Then you can create a new zpool with something like zpool create data c9t0p3. Will ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, 13 Feb 2009, Ross Smith wrote: Thinking about this a bit more, you've given me an idea: Would it be worth ZFS occasionally reading previous uberblocks from the pool, just to check they are there and working ok? That sounds like a good idea. However, how do you know for sure that the data returned is not returned from a volatile cache? If the hardware is ignoring cache flush requests, then any data returned may be from a volatile cache. Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zfs destroy hanging
This shouldn't be taking anywhere *near* half an hour. The snapshots differ trivially, by one or two files and less than 10k of data (they're test results from working on my backup script). But so far, it's still sitting there after more than half an hour.

local...@fsfs:~/src/bup2# zfs destroy ruin/export
cannot destroy 'ruin/export': filesystem has children
use '-r' to destroy the following datasets:
ruin/export/h...@bup-20090210-202557utc
ruin/export/h...@20090210-213902utc
ruin/export/home/local...@first
ruin/export/home/local...@second
ruin/export/home/local...@bup-20090210-202557utc
ruin/export/home/local...@20090210-213902utc
ruin/export/home/localddb
ruin/export/home
local...@fsfs:~/src/bup2# zfs destroy -r ruin/export

It's still hung. Ah, here's zfs list output from shortly before I started the destroy:

ruin                         474G   440G   431G  /backups/ruin
ruin/export                 35.0M   440G    18K  /backups/ruin/export
ruin/export/home            35.0M   440G    19K  /export/home
ruin/export/home/localddb     35M   440G  27.8M  /export/home/localddb

As you can see, the ruin/export/home filesystem (and subs) is NOT large. iostat shows no activity on pool ruin over a minute.

local...@fsfs:~$ pfexec zpool iostat ruin 10
          capacity     operations    bandwidth
pool    used  avail   read  write   read  write
----   -----  -----  -----  -----  -----  -----
ruin    474G   454G     10      0  1.13M    840
ruin    474G   454G      0      0      0      0
ruin    474G   454G      0      0      0      0
ruin    474G   454G      0      0      0      0
ruin    474G   454G      0      0      0      0
ruin    474G   454G      0      0      0      0
ruin    474G   454G      0      0      0      0
ruin    474G   454G      0      0      0      0
ruin    474G   454G      0      0      0      0

The pool still thinks it is healthy.

local...@fsfs:~$ zpool status -v ruin
pool: ruin
state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the pool will no longer be accessible on older software versions.
scrub: scrub completed after 4h42m with 0 errors on Mon Feb 9 19:10:49 2009
config:

        NAME      STATE   READ WRITE CKSUM
        ruin      ONLINE     0     0     0
          c7t0d0  ONLINE     0     0     0

errors: No known data errors

There is still a process out there trying to run that destroy. It doesn't appear to be using much cpu time.

local...@fsfs:~$ ps -ef | grep zfs
localddb  7291  7228  0 15:10:56 pts/4  0:00 grep zfs
root      7223  7101  0 14:18:27 pts/3  0:00 zfs destroy -r ruin/export

Running 2008.11.

local...@fsfs:~$ uname -a
SunOS fsfs 5.11 snv_101b i86pc i386 i86pc Solaris

Any suggestions? Eventually I'll kill the process by the gentlest way that works, I suppose (if it doesn't complete). -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, Feb 13, 2009 at 02:00:28PM -0600, Nicolas Williams wrote: Ordering matters for atomic operations, and filesystems are full of those. Also, note that ignoring barriers is effectively as bad as dropping writes if there's any chance that some writes will never hit the disk because of, say, power failures. Imagine 100 txgs, but some writes from the first txg never hitting the disk because the drive keeps them in the cache without flushing them for too long, then you pull out the disk, or power fails -- in that case not even fallback to older txgs will help you, there'd be nothing that ZFS could do to help you. Of course, presumably even with most lousy drives you'd still have to be quite unlucky to lose writes written more than N txgs ago, for some value of N. But the point stands; what you lose will be a matter of chance (and it could well be whole datasets) given the kinds of devices we've been discussing. Nico -- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Supermicro AOC-USAS-L8i
How does mounting the card work? Can one reverse the slot cover and screw it in like that, or is the card hanging free? unfortunately, the cover does not fit in the case, so I fixed it with a dab of hot glue; the same as I used to fix the Intel gig-e PCIe card (which is a low-profile version). not optimal, I know, but it works. nicola -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] set mountpoint but don't mount?
On January 30, 2009 1:09:49 PM -0500 Mark J Musante mmusante at east.sun.com wrote: On Fri, 30 Jan 2009, Frank Cusack wrote: so, is there a way to tell zfs not to perform the mounts for data2? or another way i can replicate the pool on the same host, without exporting the original pool? There is not a way to do that currently, but I know it's coming down the road. just for closure, a likely solution (seems correct, but i am unable to test just now) was presented in another thread. i note the answer here so that a search which finds this thread has both the question and answer in the same place. On January 31, 2009 10:57:11 AM +0100 Kees Nuyt k.nuyt at zonnet.nl wrote: That property is called canmount. man zfs /canmount i didn't test, but it seems that setting canmount to noauto, replicating, then changing canmount back to on, would do the trick. It turns out this doesn't work for datasets that are mounted in the global zone that you can't unmount. Setting the canmount property to 'noauto' has the side effect (why?) of immediately unmounting, and failing if it can't do so. For datasets which are zoned, if you are running the 'zfs set' in the global zone, the dataset remains mounted in the zone. But for datasets mounted in the global zone, e.g. being served via NFS, the 'zfs set' fails. Funny though, I wrote the above and tested a few more times, and now I do have one of my home directories' canmount property set to 'noauto', and I can no longer change it back to 'on'. How it got set to 'noauto' is a mystery as it was never unmounted during the brief time I have been composing this email, and I was consistently getting an error message from zfs about it being in use. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Richard Elling wrote: Greg Palmer wrote: Miles Nordin wrote: gm That implies that ZFS will have to detect removable devices gm and treat them differently than fixed devices. please, no more of this garbage, no more hidden unchangeable automatic condescending behavior. The whole format vs rmformat mess is just ridiculous. And software and hardware developers alike have both proven themselves incapable of settling on a definition of ``removeable'' that fits with actual use-cases like: FC/iSCSI; hot-swappable SATA; adapters that have removeable sockets on both ends like USB-to-SD, firewire CD-ROM's, SATA/SAS port multipliers, and so on. Since this discussion is taking place in the context of someone removing a USB stick I think you're confusing the issue by dragging in other technologies. Let's keep this in the context of the posts preceding it which is how USB devices are treated. I would argue that one of the first design goals in an environment where you can expect people who are not computer professionals to be interfacing with computers is to make sure that the appropriate safeties are in place and that the system does not behave in a manner which a reasonable person might find unexpected. It has been my experience that USB sticks use FAT, which is an ancient file system which contains few of the features you expect from modern file systems. As such, it really doesn't do any write caching. Hence, it seems to work ok for casual users. I note that neither NTFS, ZFS, reiserfs, nor many of the other, high performance file systems are used by default for USB devices. Could it be that anyone not using FAT for USB devices is straining against architectural limits? I'd follow that up by saying that those of us who do use something other than FAT with USB devices have a reasonable understanding of the limitations of those devices. Using ZFS is non-trivial from a typical user's perspective. The device has to be identified and the pool created.
When a USB device is connected, the pool has to be manually imported before it can be used. Import/export could be fully integrated with GNOME; once that is in place, using a ZFS-formatted USB stick should be just as safe as a FAT-formatted one.

-- Ian.

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
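For reference, the manual dance Ian describes looks something like this from the shell. This is a sketch only: the pool name "usbpool" and the idea that the stick holds a single pool are assumptions for illustration.

```shell
# Cleanly detach a ZFS-formatted USB stick before unplugging it.
# "usbpool" is a hypothetical pool name.
zpool export usbpool

# After reinserting the stick, list pools available for import...
zpool import

# ...then import the pool by name so its datasets mount and it can be used.
zpool import usbpool
```

Pulling the stick without the export step is exactly the scenario that started this thread, which is why desktop integration that runs these steps automatically would matter.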
Re: [zfs-discuss] ZFS: unreliable for professional usage?
You don't, but that's why I was wondering about time limits. You have to have a cut-off somewhere, but if you're checking the last few minutes of uberblocks, that really should cope with a lot. It seems like a simple enough thing to implement, and if a pool still gets corrupted with these checks in place, you can absolutely, positively blame it on the hardware. :D

However, I've just had another idea. Since the uberblocks are pretty vital to recovering a pool, and I believe it's a fair bit of work to search the disk to find them, might it be a good idea to allow ZFS to store uberblock locations elsewhere for recovery purposes? This could be as simple as a USB stick plugged into the server, a separate drive, or a network server. I guess even the ZIL device would work if it's separate hardware. Knowing the locations of the uberblocks would save yet more time should recovery be needed.

On Fri, Feb 13, 2009 at 8:59 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: On Fri, 13 Feb 2009, Ross Smith wrote: Thinking about this a bit more, you've given me an idea: would it be worth ZFS occasionally reading previous uberblocks from the pool, just to check they are there and working OK?

That sounds like a good idea. However, how do you know for sure that the data returned is not coming from a volatile cache? If the hardware is ignoring cache flush requests, then any data returned may be from a volatile cache.

Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Tim wrote: On Fri, Feb 13, 2009 at 4:21 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: On Fri, 13 Feb 2009, Ross Smith wrote: However, I've just had another idea. Since the uberblocks are pretty vital to recovering a pool, and I believe it's a fair bit of work to search the disk to find them, might it be a good idea to allow ZFS to store uberblock locations elsewhere for recovery purposes?

Perhaps it is best to leave decisions on these issues to the ZFS designers who know how things work.

Previous descriptions from people who do know how things work didn't make it sound very difficult to find the last 20 uberblocks. It sounded like they were at known points for any given pool.

Those folks have surely tired of this discussion by now and are working on actual code rather than reading idle discussion between several people who don't know the details of how things work.

People who don't know how things work often aren't tied down by the baggage of knowing how things work, which leads to creative solutions that those who are weighed down didn't think of. I don't think it hurts in the least to throw out some ideas. If they aren't valid, it's not hard to ignore them and move on. It surely isn't a waste of anyone's time to spend five minutes reading a response and weighing whether the idea is valid or not.

OTOH, anyone who followed this discussion the last few times, has looked at the on-disk format documents, or has reviewed the source code would know that the uberblocks are kept in a 128-entry circular queue which is 4x redundant, with two copies each at the beginning and end of the vdev. Other metadata is, by default, 2x redundant and spatially diverse. Clearly, the failure mode being hashed out here has resulted in the defeat of those protections. The only real question is how fast Jeff can roll out the feature to allow reverting to previous uberblocks.
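The layout Richard describes can be inspected with zdb. A minimal sketch, assuming a pool named "tank" and a device path that are purely illustrative; exact flag behavior varies between builds of zdb:

```shell
# Print the currently active uberblock of the pool.
zdb -u tank

# Dump a vdev's four labels -- two copies at the front of the device
# and two at the back -- each of which carries a copy of the
# 128-entry uberblock ring described above.
zdb -l /dev/dsk/c0t0d0s0
```

So the uberblocks are already at well-known offsets relative to the vdev; the open problem in this thread is not finding them, but getting the pool to fall back to an older one automatically.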
The procedure for doing this by hand has long been known, and was posted on this forum -- though it is tedious. -- richard
[zfs-discuss] ZFS on SAN?
Hi,

When I read the ZFS manual, it usually recommends configuring redundancy at the ZFS layer, mainly because some features (like correction of corrupted data) only work with a redundant configuration; it also implies that overall robustness will improve. My question is simple: what is the recommended configuration on a SAN (on high-end EMC, like the Symmetrix DMX series, for example) where redundancy is usually configured at the array level, so most likely we would use a simple ZFS layout without redundancy? Is it worth moving the redundancy from the SAN array layer to the ZFS layer? (Configuring redundancy on both layers sounds like a waste to me.) There are certain advantages to having redundancy configured on the array (beyond the protection against simple disk failure). Can we compare the advantages of having (for example) RAID5 configured on a high-end SAN with no redundancy at the ZFS layer, versus no redundant RAID configuration on the high-end SAN but raidz or raidz2 at the ZFS layer? Any tests, experience, or best practices regarding this topic? How does ZFS perform on high-end SANs, from a performance and robustness (or availability, if you like) point of view, compared to VxFS for example? If you could share your experience with me, I would really appreciate it.

Regards, sendai

-- This message posted from opensolaris.org
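As a concrete illustration of the trade-off being asked about, the two layouts might be sketched as follows. The pool names and LUN device names are placeholders, and this is a sketch of the alternatives, not a recommendation:

```shell
# Option A: redundancy in the array only (single RAID5/RAID1 LUN).
# ZFS checksums still detect corruption, but only metadata (which is
# duplicated via ditto blocks) can self-heal.
zpool create tank c2t0d0

# Optionally keep two copies of user data as well, so ZFS has
# something to repair from, at the cost of half the usable space:
zfs set copies=2 tank

# Option B: redundancy at the ZFS layer across several array LUNs,
# so corrupted user data can be reconstructed from parity:
zpool create tank2 raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0
```

Option B pays the capacity cost twice if the LUNs themselves are already RAID-protected, which is the "waste" sendai mentions; concat or RAID 0 LUNs underneath raidz avoid that double cost.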
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, 13 Feb 2009, Tim wrote: I don't think it hurts in the least to throw out some ideas. If they aren't valid, it's not hard to ignore them and move on. It surely isn't a waste of anyone's time to spend 5 minutes reading a response and weighing if the idea is valid or not.

Today I sat down at 9:00 AM to read the new mail for the day and did not catch up until five hours later. Quite a lot of the reading was this (now) useless discussion thread. It is now useless since, after five hours of reading, there were no ideas expressed that had not been expressed before. With this level of overhead, I am surprised that there is any remaining development motion on ZFS at all.

Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On February 13, 2009 7:58:51 PM -0600, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: With this level of overhead, I am surprised that there is any remaining development motion on ZFS at all.

come on now. with all due respect, you are attempting to stifle relevant discussion and that is, well, bordering on ridiculous. i sure have learned a lot from this thread. now of course that is meaningless because i don't and almost certainly never will contribute to zfs, but i assume there are others who have learned from this thread. that's definitely a good thing. this thread also appears to be the impetus to change priorities on zfs development.

Today I sat down at 9:00 AM to read the new mail for the day and did not catch up until five hours later. Quite a lot of the reading was this (now) useless discussion thread. It is now useless since after five hours of reading, there were no ideas expressed that had not been expressed before.

lastly, WOW! if this thread is worthless to you, learn to use the delete button. especially if you read that slowly. i know i certainly couldn't keep up with all my incoming mail if i read everything. i'm sorry to berate you, as you do make very valuable contributions to the discussion here, but i take offense at your attempts to limit discussion simply because you know everything there is to know about the subject. great, now i am guilty of being overhead. -frank
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Hi Bob,

On Fri, 13 Feb 2009 19:58:51 -0600 (CST), Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: On Fri, 13 Feb 2009, Tim wrote: I don't think it hurts in the least to throw out some ideas. If they aren't valid, it's not hard to ignore them and move on. It surely isn't a waste of anyone's time to spend 5 minutes reading a response and weighing if the idea is valid or not.

Today I sat down at 9:00 AM to read the new mail for the day and did not catch up until five hours later. Quite a lot of the reading was this (now) useless discussion thread. It is now useless since after five hours of reading, there were no ideas expressed that had not been expressed before.

I've found this thread to be like watching a car accident, and also really frustrating due to the inability of many posters to use search engines.

With this level of overhead, I am surprised that there is any remaining development motion on ZFS at all.

Good thing the ZFS developers have mail filters :-)

cheers, James -- Senior Kernel Software Engineer, Solaris, Sun Microsystems http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog
Re: [zfs-discuss] ZFS on SAN?
Damon,

Yes, we can provide a simple concat inside the array (even though today we provide RAID5 or RAID1 as our standard, using Veritas with concat); the question is more whether it's worth switching the redundancy from the array to the ZFS layer. The RAID5/1 features of the high-end EMC arrays also provide performance improvements, which is why I wonder what the pros and cons of such a switch would be (I mean the switch of the redundancy from the array to the ZFS layer). So, you're telling me that even if the SAN provides redundancy (HW RAID5 or RAID1), people still configure ZFS with either raidz or mirror?

Regards, sendai

On Sat, Feb 14, 2009 at 6:06 AM, Damon Atkins damon.atk...@_no_spam_yahoo.com.au wrote: Andras,

If you can get concat or RAID 0 disks inside the array, then use raidz (if the I/O volume is not large, or it is mostly sequential); if the I/O is very high, use a ZFS mirror. You cannot spread a zpool over multiple EMC arrays using SRDF if you are not using EMC PowerPath.

HDS, for example, does not support anything other than mirror or RAID5 configurations, so raidz or a ZFS mirror results in a lot of wasted disk space. However, people still use raidz on HDS RAID5, as the top-of-the-line HDS arrays are very fast and they want the features offered by ZFS. Cheers
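A sketch of the mirrored layout Damon suggests, with each mirror leg drawn from a different array so ZFS can self-heal corrupted blocks and the pool survives the loss of an entire array. The controller/LUN names are invented placeholders:

```shell
# c2* LUNs come from array 1, c3* LUNs from array 2 (placeholders).
# Each top-level vdev is a mirror spanning both arrays.
zpool create tank \
  mirror c2t0d0 c3t0d0 \
  mirror c2t1d0 c3t1d0
```

With concat or RAID 0 LUNs underneath, this avoids paying for redundancy twice while keeping ZFS's ability to repair bad blocks from the surviving mirror side.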