Re: [zfs-discuss] Why RAID 5 stops working in 2009
I've read various articles along those lines. My understanding is that a 500GB odd raid-z / raid-5 array has around a 1 in 10 chance of losing at least some data during a rebuild.

I've had raid-5 arrays fail at least 4 times, twice during a rebuild. In most cases I've been able to recover the data (once by re-attaching the original failed drive, since it proved more reliable than the 2nd one that failed). However, on more than one occasion I've had to revert to backups. Raid-6 was something I was waiting a long time for. Now I use dual parity for everything I buy. At home I've a six drive raid-z2 box; at work the main server is a 16 drive 2 way mirror setup. When using SATA drives, capacity is cheap enough (that work server is still 2.5TB for around £2,500) and the peace of mind, particularly on the company servers, is worth every penny.

If you're stuck with single parity raid-z, my advice would be to simply take a good set of backups and leave it at that until you can upgrade to dual parity. At the end of the day, the risk is relatively slight, and your data's probably at as much risk if you try to pro-actively replace a drive as if you just replace one when it fails. Just scrub every so often, and make sure you've got good backups. I don't expect you'll see too many problems.

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Why RAID 5 stops working in 2009
Just re-read that and it's badly phrased. What I meant to say is that a raid-z / raid-5 array based on 500GB drives seems to have around a 1 in 10 chance of losing some data during a full rebuild.
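For a rough sense of where a figure like "1 in 10" can come from, here is a minimal sketch in Python. It is not taken from the thread. It assumes an unrecoverable read error rate of 1 per 10^14 bits (a commonly quoted spec for consumer SATA drives), a four-drive array in which the three surviving 500GB drives must be read in full, and independent errors. The function and variable names are hypothetical.

```python
import math

def p_unrecoverable_read(bytes_read, ure_per_bit=1e-14):
    """Probability of at least one unrecoverable read error.

    ure_per_bit is an ASSUMED drive spec, not a figure from the thread.
    Computes 1 - (1 - p)^bits in a numerically stable way.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-ure_per_bit))

# ASSUMPTION: 4-drive single-parity array, so a rebuild reads 3 whole drives.
surviving_drives = 3
drive_bytes = 500e9  # 500GB drives, as in the message above

p = p_unrecoverable_read(surviving_drives * drive_bytes)
print(f"chance of hitting a read error during rebuild: {p:.2f}")
```

Under these assumptions the result is a little over 0.11, which is in the same range as the 1 in 10 quoted above. More drives or larger drives raise the figure; a better error-rate spec lowers it.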
Re: [zfs-discuss] Why RAID 5 stops working in 2009
Ross wrote:

Just re-read that and it's badly phrased. What I meant to say is that a raid-z / raid-5 array based on 500GB drives seems to have around a 1 in 10 chance of loosing some data during a full rebuild.

Actually, I think it's been explained already why this is one area where RAID-Z will really start to show some of the ways it's different from its RAID-5 ancestors.

For one, a RAID-5 controller has no idea of the filesystem, and therefore has to rebuild every bit on the disk, whether it's used or not, and if it can't, it will declare the whole array unusable. RAID-Z, on the other hand, since it is integrated with the filesystem, only needs to rebuild the *used* data, and won't care if unused parts of the disks can't be rebuilt.

Second, a factor that the author of that article leaves out is that decent RAID-5 and RAID-Z can do 'scrubs' of the data at regular intervals, and this will many times catch and deal with these read problems well before they have a chance to take all your data with them. The types of errors the author writes about are many times caused by how accurately the block was written and not by a defect of the media, so many times they can be fixed by just rewriting the data to the same block. On ZFS this will almost never happen: because of COW, it will always choose a new block to write to. I don't think many (if any) RAID-5 implementations can change the location of data on a drive.

-Kyle
[zfs-discuss] Recovering an array on Mac
Hi--

Here's the scoop, in probably too much detail:

I'm a sucker for new filesystems and new tech in general. For you old-time Mac people, I installed Sequoia when it was first seeded, and had to reformat my drive several times as it grew to the final release. I flipped the journaled flag before I even knew what it meant. I installed the pre-Leopard ZFS seed and have been using it for, what, a year?

So, I started with two 500 GB drives in a single pool, not mirrored. I bought a 1 TB drive and added it to the pool. I bought another 1 TB drive, and finally had enough storage (~1.5 TB) to mirror my disks and be all set for the foreseeable future.

In order to migrate my data from a single pool of 500 GB + 500 GB + 1 TB to a mirrored 500GB/500GB + 1TB/1TB pool, I was planning on doing this:

1) Copy everything to the New 1 TB drive (slopping what wouldn't fit onto another spare drive)
2) Upgrade to the latest ZFS for Mac release (117)
3) Destroy the existing pool
4) Create a pool with the two 500 GB drives
5) Copy everything from the New drive to the 500 GB x 2 pool
6) Create a mirrored pool with the two 1 TB drives
7) Copy everything from the 500 GB x 2 pool to the mirrored 1 TB pool
8) Destroy the 500 GB x 2 pool, and create it as a 500GB/500GB mirrored pair and add it to the 1TB/1TB pool

During step 7, while I was at work, the power failed at home, apparently long enough to drain my UPS. When I rebooted my machine, both pools refused to mount: the 500+500 pool and the 1TB/1TB mirrored pool. Just about all my data is lost. This was my media server containing my DVD rips, so everything is recoverable in that I can re-rip 1+TB, but I'd rather not.
diskutil list says this:

    /dev/disk1
       #:                     TYPE  NAME    SIZE        IDENTIFIER
       0:   FDisk_partition_scheme          *465.8 Gi   disk1
       1:                                    465.8 Gi   disk1s1
    /dev/disk2
       #:                     TYPE  NAME    SIZE        IDENTIFIER
       0:   FDisk_partition_scheme          *465.8 Gi   disk2
       1:                                    465.8 Gi   disk2s1
    /dev/disk3
       #:                     TYPE  NAME    SIZE        IDENTIFIER
       0:   FDisk_partition_scheme          *931.5 Gi   disk3
       1:                                    931.5 Gi   disk3s1
    /dev/disk4
       #:                     TYPE  NAME    SIZE        IDENTIFIER
       0:   FDisk_partition_scheme          *931.5 Gi   disk4
       1:                                    931.5 Gi   disk4s1

During step 2, I created the pools using

    zpool create media mirror /dev/disk3 /dev/disk4

then zpool upgrade, since I got warnings that the filesystem version was out of date. Note that I created zpools referring to the entire disk, not just a slice. I had labelled the disks using

    diskutil partitiondisk /dev/disk2 GPTFormat ZFS %noformat% 100%

but now the disks indicate that they're FDisk_partition_scheme.

Googling for FDisk_partition_scheme yields http://lists.macosforge.org/pipermail/zfs-discuss/2008-March/000240.html , among other things, but no hint of where to go from here.

zpool import -D reports no pools available to import.

All of this is on a Mac Mini running Mac OS X 10.5.3, BTW. I own Parallels if using an OpenSolaris build would be of use.

So, is the data recoverable?

Thanks!

Lee
[zfs-discuss] iostat and monitoring
Hi gurus,

I like zpool iostat and I like system monitoring, so I set up a script within sma to let me get the zpool iostat figures through snmp.

The problem is that as zpool iostat is only run once for each snmp query, it always reports a static set of figures, like so:

    [EMAIL PROTECTED]:snmp # zpool iostat -v
                   capacity     operations    bandwidth
    pool         used  avail   read  write   read  write
    ----------  -----  -----  -----  -----  -----  -----
    tank         443G  1.60T      4      4   461K   467K
      raidz1     443G  1.60T      4      4   461K   467K
        c1t0d0      -      -      1      2   227K   234K
        c1t1d0      -      -      1      2   228K   234K
        c2t0d0      -      -      1      2   227K   234K
    ----------  -----  -----  -----  -----  -----  -----

Whereas if I run it with an interval, the figures even out after a few seconds. What I'm wondering is: is there any way to get iostat to report accurate figures from a one-time invocation? Alternatively, is there a better way to get read/write ops etc. from my pool for monitoring applications?

I would really love it if monitoring zfs pools from snmp was better all round, but I'm not going to reel off my wish list here at this point ;)

Thanks

Matt
Re: [zfs-discuss] bug id 6343667
About a month ago (Jun 2008), I received information indicating that a putback fixing this problem was in the works and might appear as soon as b92. Apparently this estimate was overly optimistic; Does anyone know anything about progress on this issue or have a revised estimate for the putback? Thanks.
Re: [zfs-discuss] bug id 6343667
On Sat, Jul 5, 2008 at 9:34 PM, Robert Lawhead [EMAIL PROTECTED] wrote:

About a month ago (Jun 2008), I received information indicating that a putback fixing this problem was in the works and might appear as soon as b92. Apparently this estimate was overly optimistic; Does anyone know anything about progress on this issue or have a revised estimate for the putback? Thanks.

This page: http://bugs.opensolaris.org/view_bug.do?bug_id=6343667 says the putback will be in SNV 94.
Re: [zfs-discuss] iostat and monitoring
On Sat, Jul 5, 2008 at 2:33 PM, Matt Harrison [EMAIL PROTECTED] wrote:

Alternatively is there a better way to get read/write ops etc from my pool for monitoring applications? I would really love if monitoring zfs pools from snmp was better all round, but I'm not going to reel off my wish list here at this point ;)

You can access the kstats directly to get the counter values.

    $ kstat -p ::vopstats_zfs:{nread,read_bytes,nwrite,write_bytes}
    unix:0:vopstats_zfs:nread 418787
    unix:0:vopstats_zfs:read_bytes 612076305
    unix:0:vopstats_zfs:nwrite 163544
    unix:0:vopstats_zfs:write_bytes 255725992

These are the counters used by fsstat. In the case of a single pool, I would expect them (perhaps naively) to match up with zpool iostat numbers.

On my list of things to do when I get around to it is to enable parseable output in fsstat(1M). See http://mail.opensolaris.org/pipermail/on-discuss/2008-June/000127.html for details. Parseable is currently disabled for reasons that are discussed in the mail folder, linked at http://opensolaris.org/os/community/arc/caselog/2006/180/.

It is interesting to look at the numbers at this level compared to iostat. While iostat shows physical reads and writes only zpool iostat and fsstat show reads that are satisfied by a cache and never result in physical I/O activity. As such, a workload that looks write-intensive on UFS monitored via iostat may seem to have shifted to being very read intensive.

-- Mike Gerdts http://mgerdts.blogspot.com/
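Since these kstat values are cumulative counters, a monitoring script needs two samples and the time between them to get per-second rates. The following is a minimal sketch in Python of that delta calculation, not something posted in the thread. The first sample reuses the numbers above; the second sample and the 10-second interval are made up for illustration, and the function names are hypothetical. It only parses text in the `kstat -p` layout shown above and does not invoke kstat itself.

```python
def parse_kstat(text):
    """Parse `kstat -p` lines ('module:instance:name:statistic value')
    into {statistic: int}. Lines with non-integer values are skipped."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue
        name, value = fields
        if not value.isdigit():
            continue  # e.g. snaptime, which is a float
        counters[name.rsplit(":", 1)[-1]] = int(value)
    return counters

def rates(before, after, seconds):
    """Per-second rate for every counter present in both samples."""
    return {k: (after[k] - before[k]) / seconds for k in before if k in after}

# First sample: the values quoted in the message above.
FIRST = """\
unix:0:vopstats_zfs:nread 418787
unix:0:vopstats_zfs:read_bytes 612076305
unix:0:vopstats_zfs:nwrite 163544
unix:0:vopstats_zfs:write_bytes 255725992
"""
# Second sample: HYPOTHETICAL values, assumed to be taken 10 seconds later.
SECOND = """\
unix:0:vopstats_zfs:nread 418887
unix:0:vopstats_zfs:read_bytes 612176305
unix:0:vopstats_zfs:nwrite 163594
unix:0:vopstats_zfs:write_bytes 255775992
"""

per_second = rates(parse_kstat(FIRST), parse_kstat(SECOND), 10)
for stat, value in sorted(per_second.items()):
    print(stat, value)
```

For an SNMP agent, one option would be to store the previous sample and its timestamp between queries, so each query reports the rate since the last one.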
Re: [zfs-discuss] bug id 6343667
If it ever does get released I'd love to hear about it. That bug, and the fact it appears to have been outstanding for three years, was one of the major reasons behind us not purchasing a bunch of x4500's.
Re: [zfs-discuss] iostat and monitoring
Mike Gerdts wrote:

    $ kstat -p ::vopstats_zfs:{nread,read_bytes,nwrite,write_bytes}
    unix:0:vopstats_zfs:nread 418787
    unix:0:vopstats_zfs:read_bytes 612076305
    unix:0:vopstats_zfs:nwrite 163544
    unix:0:vopstats_zfs:write_bytes 255725992

Thanks Mike, that's exactly what I was looking for. I can work my way around the other snmp problems, like not reporting total space on a zfs :)

Thanks

Matt
Re: [zfs-discuss] zpool i/o error
Booted from 2008.05 and the error was the same as before: corrupted data for both last disks. zdb -l was the same as before: read label from disk 1 but not from disks 2 and 3.
Re: [zfs-discuss] bug id 6343667
FYI, we are literally just days from having this fixed.

Matt: after putback you really should blog about this one -- both to let people know that this long-standing bug has been fixed, and to describe your approach to it. It's a surprisingly tricky and interesting problem.

Jeff

On Sat, Jul 05, 2008 at 01:20:11PM -0700, Ross wrote:

If it ever does get released I'd love to hear about it. That bug, and the fact it appears to have been outstanding for three years, was one of the major reasons behind us not purchasing a bunch of x4500's.
Re: [zfs-discuss] iostat and monitoring
On Sat, Jul 05, 2008 at 03:03:34PM -0500, Mike Gerdts wrote:

You can access the kstats directly to get the counter values.

First off, let me say that: kstat++

That's too cool.

    $ kstat -p ::vopstats_zfs:{nread,read_bytes,nwrite,write_bytes}
    unix:0:vopstats_zfs:nread 418787
    unix:0:vopstats_zfs:read_bytes 612076305
    unix:0:vopstats_zfs:nwrite 163544
    unix:0:vopstats_zfs:write_bytes 255725992

    # kstat -p ::vopstats_zfs:{nread,read_bytes,nwrite,write_bytes}
    #

uhm, but:

    kstat -p ::vopstats_zfs
    [snip]
    unix:0:vopstats_zfs:nwrite 24201307
    unix:0:vopstats_zfs:read_bytes 1557032944566
    unix:0:vopstats_zfs:readdir_bytes 129267
    unix:0:vopstats_zfs:snaptime 3281423.01228961
    unix:0:vopstats_zfs:write_bytes 222641182203

what gives? This is:

    SunOS wiggum.4amlunch.net 5.11 snv_81 i86pc i386 i86pc

-brian
Re: [zfs-discuss] iostat and monitoring
On Sat, Jul 5, 2008 at 9:48 PM, Brian Hechinger [EMAIL PROTECTED] wrote:

On Sat, Jul 05, 2008 at 03:03:34PM -0500, Mike Gerdts wrote:

    $ kstat -p ::vopstats_zfs:{nread,read_bytes,nwrite,write_bytes}
    unix:0:vopstats_zfs:nread 418787
    unix:0:vopstats_zfs:read_bytes 612076305
    unix:0:vopstats_zfs:nwrite 163544
    unix:0:vopstats_zfs:write_bytes 255725992

This was on a virtual machine with a 12 GB zpool (one virtual disk) that had been up for a few days (but suspended most of the time). My guess is that most of the activity my zpool was seeing was from the swap device.

    # kstat -p ::vopstats_zfs:{nread,read_bytes,nwrite,write_bytes}
    #

uhm, but:

    kstat -p ::vopstats_zfs
    [snip]
    unix:0:vopstats_zfs:nwrite 24201307

24 million write operations.

    unix:0:vopstats_zfs:read_bytes 1557032944566

    $ perl -e 'print (1557032944566 >> 30)'
    1450

Looks like you've read about 1.4 TB since boot.

    unix:0:vopstats_zfs:readdir_bytes 129267

1.2 GB of readdir activity. Lots of files? Is someone doing find or du through the area with lots of files?

    unix:0:vopstats_zfs:snaptime 3281423.01228961
    unix:0:vopstats_zfs:write_bytes 222641182203

    $ perl -e 'print (222641182203 >> 30)'
    207

207 GB of writes.

    $ perl -e 'print 222641182203 / 24201307'
    9199.55199952631

Average write size was a bit over 9 KB.

what gives? This is:

    SunOS wiggum.4amlunch.net 5.11 snv_81 i86pc i386 i86pc

Do the numbers seem unreasonable for the size of the pool, the uptime of the system, etc.? Remember my comments earlier about how you can now see the reads (and readdirs) that came from cache and didn't do physical I/O.

-- Mike Gerdts http://mgerdts.blogspot.com/
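The arithmetic in this message can be reproduced with a short Python sketch. It uses only the counter values quoted above; the variable names are hypothetical. Shifting right by 30 bits divides by 2^30, so the shifted results are in GiB (1450 GiB is roughly 1.4 TiB).

```python
# Counter values quoted from the kstat output in the message above.
read_bytes = 1557032944566
write_bytes = 222641182203
nwrite = 24201307

# A right shift by 30 bits is integer division by 2**30 (bytes -> GiB).
read_gib = read_bytes >> 30
write_gib = write_bytes >> 30

# Mean bytes per write operation since boot.
avg_write = write_bytes / nwrite

print(read_gib, "GiB read")        # 1450
print(write_gib, "GiB written")    # 207
print(round(avg_write), "bytes per write on average")
```

The average works out to a bit under 9200 bytes per write, which matches the "bit over 9 KB" figure in the message.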