Re: [zfs-discuss] Why RAID 5 stops working in 2009

2008-07-05 Thread Ross
I've read various articles along those lines.  My understanding is that a 500GB-odd
raid-z / raid-5 array has around a 1 in 10 chance of losing at least some data
during a rebuild.

I've had raid-5 arrays fail at least 4 times, twice during a rebuild.  In most
cases I've been able to recover the data (once by re-attaching the original
failed drive, since it proved more reliable than the 2nd one that failed).
However, on more than one occasion I've had to revert to backups.  Raid-6 was
something I had been waiting a long time for.

Now I use dual parity for everything I buy.  At home I've a six-drive raid-z2
box; at work the main server is a 16-drive, 2-way mirror setup.  When using SATA
drives, capacity is cheap enough (that work server is still 2.5TB for around
£2,500) and the peace of mind, particularly on the company servers, is worth
every penny.
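
(For reference, setting up a six-drive raid-z2 pool is a one-liner -- the pool
and device names below are just made-up examples:

$ zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0

and a 2-way mirror layout is the same idea, built from a series of
"mirror diskA diskB" groups instead of the raidz2 vdev.)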

If you're stuck with single-parity raid-z, my advice would be simply to take a
good set of backups and leave it at that until you can upgrade to dual parity.
At the end of the day the risk is relatively slight, and your data's probably at
as much risk if you try to pro-actively replace a drive as if you just replace
one when it fails.

Just scrub every so often, and make sure you've got good backups.  I don't 
expect you'll see too many problems.
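
(If it helps, the scrubbing part is easy to automate from cron -- the pool name
here is just an example:

# root crontab entry: scrub the pool "tank" every Sunday at 03:00
0 3 * * 0 /usr/sbin/zpool scrub tank

The scrub runs in the background, and zpool status will show when the last one
completed and whether it found anything.)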
 
 


Re: [zfs-discuss] Why RAID 5 stops working in 2009

2008-07-05 Thread Ross
Just re-read that and it's badly phrased.  What I meant to say is that a raid-z
/ raid-5 array based on 500GB drives seems to have around a 1 in 10 chance of
losing some data during a full rebuild.
 
 


Re: [zfs-discuss] Why RAID 5 stops working in 2009

2008-07-05 Thread Kyle McDonald
Ross wrote:
 Just re-read that and it's badly phrased.  What I meant to say is that a
 raid-z / raid-5 array based on 500GB drives seems to have around a 1 in 10
 chance of losing some data during a full rebuild.
Actually, I think it's been explained already why this is one area where
RAID-Z will really start to show some of the ways it's different from its
RAID-5 ancestors. For one, a RAID-5 controller has no idea of the
filesystem, and therefore has to rebuild every bit on the disk, whether
it's used or not; if it can't, it will declare the whole array unusable.
RAID-Z, on the other hand, since it is integrated with the filesystem,
only needs to rebuild the *used* data, and won't care if unused parts of
the disks can't be rebuilt.

Second, a factor that the author of that article leaves out is that decent
RAID-5, and RAID-Z, can do 'scrubs' of the data at regular intervals, and
this will many times catch and deal with these read problems well before
they have a chance to take all your data with them. The types of errors
the author writes about are often caused by how accurately the block was
written rather than by a defect in the media, so many times they can be
fixed by just rewriting the data to the same block. On ZFS that rewrite to
the same block will almost never happen, because with COW it will always
choose a new block to write to. I don't think many (if any) RAID-5
implementations can change the location of data on a drive.
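
To put that in concrete terms, a scrub is a single command and checking the
outcome is another (pool name made up for the example):

$ zpool scrub tank
$ zpool status -v tank

The scrub line and the CKSUM column in the status output show whether anything
had to be repaired along the way.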

 -Kyle



[zfs-discuss] Recovering an array on Mac

2008-07-05 Thread Lee Fyock

Hi--

Here's the scoop, in probably too much detail:

I'm a sucker for new filesystems and new tech in general. For you old-time
Mac people, I installed Sequoia when it was first seeded, and had to
reformat my drive several times as it grew to the final release. I flipped
the journaled flag before I even knew what it meant. I installed the
pre-Leopard ZFS seed and have been using it for, what, a year?


So, I started with two 500 GB drives in a single pool, not mirrored. I  
bought a 1 TB drive and added it to the pool. I bought another 1 TB  
drive, and finally had enough storage (~1.5 TB) to mirror my disks and  
be all set for the foreseeable future.


In order to migrate my data from a single pool of 500 GB + 500 GB + 1 TB to a
mirrored 500GB/500GB + 1TB/1TB pool, I was planning on doing this (a rough
sketch of the equivalent zpool commands follows the list):


1) Copy everything to the New 1 TB drive (slopping what wouldn't fit  
onto another spare drive)

2) Upgrade to the latest ZFS for Mac release (117)
3) Destroy the existing pool
4) Create a pool with the two 500 GB drives
5) Copy everything from the New drive to the 500 GB x 2 pool
6) Create a mirrored pool with the two 1 TB drives
7) Copy everything from the 500 GB x 2 pool to the mirrored 1 TB pool
8) Destroy the 500 GB x 2 pool, and create it as a 500GB/500GB  
mirrored pair and add it to the 1TB/1TB pool
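
In zpool terms the plan was roughly the following (device names are
placeholders, and the copy steps are just plain file copies):

$ zpool destroy oldpool                    # step 3 (old pool name is a placeholder)
$ zpool create pool500 diskA diskB         # step 4: the two 500 GB drives
  (copy everything from the New drive onto pool500)      # step 5
$ zpool create media mirror diskC diskD    # step 6: the mirrored 1 TB pair
  (copy everything from pool500 onto media)              # step 7
$ zpool destroy pool500                    # step 8
$ zpool add media mirror diskA diskB       # step 8: 500 GB pair re-added as a mirror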


During step 7, while I was at work, the power failed at home,  
apparently long enough to drain my UPS.


When I rebooted my machine, both pools refused to mount: the 500+500  
pool and the 1TB/1TB mirrored pool. Just about all my data is lost.  
This was my media server containing my DVD rips, so everything is  
recoverable in that I can re-rip 1+TB, but I'd rather not.


diskutil list says this:
/dev/disk1
   #:                   TYPE NAME          SIZE       IDENTIFIER
   0: FDisk_partition_scheme              *465.8 Gi   disk1
   1:                                      465.8 Gi   disk1s1

/dev/disk2
   #:                   TYPE NAME          SIZE       IDENTIFIER
   0: FDisk_partition_scheme              *465.8 Gi   disk2
   1:                                      465.8 Gi   disk2s1

/dev/disk3
   #:                   TYPE NAME          SIZE       IDENTIFIER
   0: FDisk_partition_scheme              *931.5 Gi   disk3
   1:                                      931.5 Gi   disk3s1

/dev/disk4
   #:                   TYPE NAME          SIZE       IDENTIFIER
   0: FDisk_partition_scheme              *931.5 Gi   disk4
   1:                                      931.5 Gi   disk4s1


During step 2, I created the pools using "zpool create media mirror
/dev/disk3 /dev/disk4" and then "zpool upgrade", since I got warnings that
the filesystem version was out of date. Note that I created zpools
referring to the entire disk, not just a slice. I had labelled the disks
using

diskutil partitiondisk /dev/disk2 GPTFormat ZFS %noformat% 100%

but now the disks indicate that they're FDisk_partition_scheme.

Googling for FDisk_partition_scheme yields
http://lists.macosforge.org/pipermail/zfs-discuss/2008-March/000240.html,
among other things, but no hint of where to go from here.


zpool import -D reports no pools available to import.
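
The only other check I can think of is whether any ZFS labels survive on the
slices -- assuming the Mac port ships zdb, something like:

$ zdb -l /dev/disk3s1
$ zdb -l /dev/disk4s1

My understanding is that import works from those labels, so if zdb can't find
any of them I'm probably out of luck.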

All of this is on a Mac Mini running Mac OS X 10.5.3, BTW. I own  
Parallels if using an OpenSolaris build would be of use.


So, is the data recoverable?

Thanks!
Lee



[zfs-discuss] iostat and monitoring

2008-07-05 Thread Matt Harrison
Hi gurus,

I like zpool iostat and I like system monitoring, so I set up a script
within SMA to let me get the zpool iostat figures through SNMP.

The problem is that as zpool iostat is only run once for each SNMP
query, it always reports a static set of figures, like so:

[EMAIL PROTECTED]:snmp # zpool iostat -v
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank         443G  1.60T      4      4   461K   467K
  raidz1     443G  1.60T      4      4   461K   467K
    c1t0d0      -      -      1      2   227K   234K
    c1t1d0      -      -      1      2   228K   234K
    c2t0d0      -      -      1      2   227K   234K
----------  -----  -----  -----  -----  -----  -----

Whereas if I run it with an interval, the figures even out after a few
seconds. What I'm wondering is: is there any way to get zpool iostat to
report accurate figures from a one-time invocation?
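
For example, what I'd really like is for a single poll to be able to get
something like the second report here (pool name from my output above):

$ zpool iostat -v tank 5 2

i.e. a 5 second interval with a count of 2, where the first report is the same
since-boot average as above and only the second one reflects current activity.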

Alternatively, is there a better way to get read/write ops etc. from my
pool for monitoring applications?

I would really love it if monitoring ZFS pools from SNMP was better all
round, but I'm not going to reel off my wish list here at this point ;)

Thanks

Matt


Re: [zfs-discuss] bug id 6343667

2008-07-05 Thread Robert Lawhead
About a month ago (Jun 2008), I received information indicating that a putback
fixing this problem was in the works and might appear as soon as b92.
Apparently that estimate was overly optimistic; does anyone know anything about
progress on this issue, or have a revised estimate for the putback?
Thanks.
 
 


Re: [zfs-discuss] bug id 6343667

2008-07-05 Thread Johan Hartzenberg
On Sat, Jul 5, 2008 at 9:34 PM, Robert Lawhead 
[EMAIL PROTECTED] wrote:

 About a month ago (Jun 2008), I received information indicating that a
 putback fixing this problem was in the works and might appear as soon as
 b92.  Apparently this estimate was overly optimistic; Does anyone know
 anything about progress on this issue or have a revised estimate for the
 putback?
 Thanks.

This page:
http://bugs.opensolaris.org/view_bug.do?bug_id=6343667
says the putback will be in snv_94.


Re: [zfs-discuss] iostat and monitoring

2008-07-05 Thread Mike Gerdts
On Sat, Jul 5, 2008 at 2:33 PM, Matt Harrison
[EMAIL PROTECTED] wrote:
 Alternatively is there a better way to get read/write ops etc from my
 pool for monitoring applications?

 I would really love if monitoring zfs pools from snmp was better all
 round, but I'm not going to reel off my wish list here at this point ;)

You can access the kstats directly to get the counter values.

$ kstat -p ::vopstats_zfs:{nread,read_bytes,nwrite,write_bytes}
unix:0:vopstats_zfs:nread   418787
unix:0:vopstats_zfs:read_bytes  612076305
unix:0:vopstats_zfs:nwrite  163544
unix:0:vopstats_zfs:write_bytes 255725992

These are the counters used by fsstat.  In the case of a single pool, I
would expect them (perhaps naively) to match up with the zpool iostat
numbers.
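
Since those are cumulative counters, turning them into rates for SNMP is just a
matter of sampling twice and dividing by the interval.  A rough sketch (nothing
more than the basic idea):

#!/bin/sh
# crude read/write operations per second from the zfs vopstats counters
INTERVAL=5
r1=`kstat -p ::vopstats_zfs:nread  | awk '{print $2}'`
w1=`kstat -p ::vopstats_zfs:nwrite | awk '{print $2}'`
sleep $INTERVAL
r2=`kstat -p ::vopstats_zfs:nread  | awk '{print $2}'`
w2=`kstat -p ::vopstats_zfs:nwrite | awk '{print $2}'`
dr=`expr $r2 - $r1`
dw=`expr $w2 - $w1`
echo "read ops/sec:  `expr $dr / $INTERVAL`"
echo "write ops/sec: `expr $dw / $INTERVAL`"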

On my list of things to do when I get around to it is to enable
parseable output in fsstat(1M).  See
http://mail.opensolaris.org/pipermail/on-discuss/2008-June/000127.html
for details.  Parseable output is currently disabled for reasons that are
discussed in the mail folder linked at
http://opensolaris.org/os/community/arc/caselog/2006/180/.

It is interesting to look at the numbers at this level compared to
iostat.  While iostat shows physical reads and writes only, zpool iostat
and fsstat also show reads that are satisfied by a cache and never result
in physical I/O activity.  As such, a workload that looks write-intensive
on UFS when monitored via iostat may seem to have shifted to being very
read-intensive.
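
An easy way to see the difference is to watch the logical and physical views
side by side (two terminals):

$ fsstat zfs 5        # VFS-level ops, including reads served from cache
$ iostat -xn 5        # physical I/O actually reaching the disks

On a cache-friendly workload the fsstat read numbers can be far higher than
anything iostat shows.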

--
Mike Gerdts
http://mgerdts.blogspot.com/


Re: [zfs-discuss] bug id 6343667

2008-07-05 Thread Ross
If it ever does get released I'd love to hear about it.  That bug, and the fact
that it appears to have been outstanding for three years, was one of the major
reasons behind us not purchasing a bunch of x4500s.
 
 


Re: [zfs-discuss] iostat and monitoring

2008-07-05 Thread Matt Harrison
Mike Gerdts wrote:
 $ kstat -p ::vopstats_zfs:{nread,read_bytes,nwrite,write_bytes}
 unix:0:vopstats_zfs:nread       418787
 unix:0:vopstats_zfs:read_bytes  612076305
 unix:0:vopstats_zfs:nwrite      163544
 unix:0:vopstats_zfs:write_bytes 255725992

Thanks Mike, that's exactly what I was looking for. I can work my way
around the other SNMP problems, like not reporting total space on a ZFS
filesystem :)
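
(For the space figures I'll probably just hang something like the following off
an exec line in snmpd.conf -- the dataset name is only an example, and I
haven't wired it up yet:

$ zfs get -Hp -o value used,available tank

That should give raw byte counts in a script-friendly form; if -p isn't
available on this build I'll just parse the human-readable values.)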

Thanks

Matt


Re: [zfs-discuss] zpool i/o error

2008-07-05 Thread Victor Pajor
Booted from 2008.05, and the error was the same as before: corrupted data for
the last two disks.

zdb -l was the same as before: it read the label from disk 1 but not from
disks 2 and 3.
 
 


Re: [zfs-discuss] bug id 6343667

2008-07-05 Thread Jeff Bonwick
FYI, we are literally just days from having this fixed.

Matt: after putback you really should blog about this one --
both to let people know that this long-standing bug has been
fixed, and to describe your approach to it.

It's a surprisingly tricky and interesting problem.

Jeff

On Sat, Jul 05, 2008 at 01:20:11PM -0700, Ross wrote:
 If it ever does get released I'd love to hear about it.  That bug, and the 
 fact it appears to have been outstanding for three years, was one of the 
 major reasons behind us not purchasing a bunch of x4500's.
  
  


Re: [zfs-discuss] iostat and monitoring

2008-07-05 Thread Brian Hechinger
On Sat, Jul 05, 2008 at 03:03:34PM -0500, Mike Gerdts wrote:
 
 You can access the kstats directly to get the counter values.

First off, let me say that:  kstat++

That's too cool.

 $ kstat -p ::vopstats_zfs:{nread,read_bytes,nwrite,write_bytes}
 unix:0:vopstats_zfs:nread       418787
 unix:0:vopstats_zfs:read_bytes  612076305
 unix:0:vopstats_zfs:nwrite      163544
 unix:0:vopstats_zfs:write_bytes 255725992

# kstat -p ::vopstats_zfs:{nread,read_bytes,nwrite,write_bytes}
#

uhm, but:

kstat -p ::vopstats_zfs
[snip]
unix:0:vopstats_zfs:nwrite  24201307
unix:0:vopstats_zfs:read_bytes  1557032944566
unix:0:vopstats_zfs:readdir_bytes   129267
unix:0:vopstats_zfs:snaptime        3281423.01228961
unix:0:vopstats_zfs:write_bytes 222641182203

what gives? This is:

SunOS wiggum.4amlunch.net 5.11 snv_81 i86pc i386 i86pc

-brian


Re: [zfs-discuss] iostat and monitoring

2008-07-05 Thread Mike Gerdts
On Sat, Jul 5, 2008 at 9:48 PM, Brian Hechinger [EMAIL PROTECTED] wrote:
 On Sat, Jul 05, 2008 at 03:03:34PM -0500, Mike Gerdts wrote:
 $ kstat -p ::vopstats_zfs:{nread,read_bytes,nwrite,write_bytes}
 unix:0:vopstats_zfs:nread       418787
 unix:0:vopstats_zfs:read_bytes  612076305
 unix:0:vopstats_zfs:nwrite      163544
 unix:0:vopstats_zfs:write_bytes 255725992

This was on a virtual machine with a 12 GB zpool (one virtual disk)
that had been up for a few days (but suspended most of the time).  My
guess is that most of the activity my zpool was seeing was from the
swap device.

 # kstat -p ::vopstats_zfs:{nread,read_bytes,nwrite,write_bytes}
 #

 uhm, but:

 kstat -p ::vopstats_zfs
 [snip]
 unix:0:vopstats_zfs:nwrite  24201307

24 million write operations.

 unix:0:vopstats_zfs:read_bytes  1557032944566

$ perl -e 'print (1557032944566 >> 30)'
1450

Looks like you've read about 1.4 TB since boot.

 unix:0:vopstats_zfs:readdir_bytes   129267

1.2 GB of readdir activity.  Lots of files?  Is someone doing find or
du through the area with lots of files?

 unix:0:vopstats_zfs:snaptime        3281423.01228961
 unix:0:vopstats_zfs:write_bytes 222641182203

$ perl -e 'print (222641182203 >> 30)'
207

207 GB of writes.

$ perl -e 'print 222641182203 / 24201307'
9199.55199952631

Average write size was a bit over 9 KB.


 what gives? This is:

 SunOS wiggum.4amlunch.net 5.11 snv_81 i86pc i386 i86pc

Do the numbers seem unreasonable for the size of the pool, the uptime
of the system, etc.?  Remember my comments earlier about how you can
now see the reads (and readdirs) that came from cache and didn't do
physical I/O.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/