Re: [zfs-discuss] Re: Drive Failure w/o Redundancy

2007-06-28 Thread Erik Trimble

Richard Elling wrote:

Erik Trimble wrote:

If you had known about the drive sizes beforehand, then you could have
done something like this:

Partition the drives as follows:

A:  1 20GB partition
B:  1 20GB & 1 10GB partition
C:  1 40GB partition
D:  1 40GB partition & 2 10GB partitions

then you do:

zpool create tank mirror Ap0 Bp0 mirror Cp0 Dp0 mirror Bp1 Dp1

and you get a total of 70GB of space. However, the performance on this
is going to be bad (as you frequently need to write to both partitions
on B & D, causing head seeks), though you can still lose up to 2 drives
before experiencing data loss.


It is not clear to me that we can say performance will be bad
for stripes on single disks.  The reason is that ZFS dynamic
striping does not use a fixed interleave.  In other words, if
I write a block of N bytes to an M-way dynamic stripe, it is
not guaranteed that each device will get an I/O of N/M size.
I've only done a few measurements of this, and I've not completed
my analysis, but my data does not show the sort of thrashing one
might expect from a fixed stripe with small interleave.
 -- richard
That is correct, Richard.  However, it applies to relatively small 
reads/writes, which do not exceed the maximum stripe size.  That is 
probably the common case, but there is another issue here:  even though 
not every disk gets an I/O on a given stripe access, there is still a 
relatively good chance that both partitions on the same disk get an I/O 
request.  On average, I'd assume you don't improve much over a 
full-stripe I/O, and in either case it would be worse than a zpool 
which did not have multiple partitions on the same disk.   Also, for 
large file access - where the need for full-stripe access is guaranteed - 
you are certainly going to see disk thrashing.


Numbers would be nice, of course. :-)

--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA



Re: [zfs-discuss] Re: Drive Failure w/o Redundancy

2007-06-27 Thread Darren Dunham
 Perhaps I'm not asking my question clearly. I've already experimented
 a fair amount with zfs, including creating and destroying a number of
 pools with and without redundancy, replacing vdevs, etc. Maybe asking
 by example will clarify what I'm looking for or where I've missed the
 boat. The key is that I want a grow-as-you-go heterogeneous set of
 disks in my pool:

  Let's say I start with a 40g drive and a 60g drive. I create a
 non-redundant pool (which will be 100g). At some later point, I run
 across an unused 30g drive, which I add to the pool. Now my pool is
 130g. At some point after that, the 40g drive fails, either by
 producing read errors or by failing to spin up at all. What happens to
 my pool?

Since you have created a non-redundant pool (or more specifically, a
pool with non-redundant members), the pool will fail.

 The problem I've come across with using mirror or raidz for this setup
 is that (as far as I know) you can't add disks to mirror/raidz groups,
 and if you just add the disk to the pool, you end up in the same
 situation as above (with more space but no redundancy).

You can't add to an existing mirror, but you can add new mirror (or
raidz) vdevs to the pool.  If you do, there's no loss of redundancy.
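
For example (a sketch only; "tank" and the cXtYd0 names are hypothetical
placeholders for your own pool and disks):

# grow the pool by adding a second mirror as a new top-level vdev
zpool add tank mirror c1t2d0 c1t3d0
# the pool now stripes across both mirrors; capacity grows, redundancy is kept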

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
  This line left intentionally blank to confuse you. 


Re: [zfs-discuss] Re: Drive Failure w/o Redundancy

2007-06-27 Thread Neil Perrin



Darren Dunham wrote:

The problem I've come across with using mirror or raidz for this setup
is that (as far as I know) you can't add disks to mirror/raidz groups,
and if you just add the disk to the pool, you end up in the same
situation as above (with more space but no redundancy).


You can't add to an existing mirror, but you can add new mirror (or
raidz) vdevs to the pool.  If you do, there's no loss of redundancy.


Maybe I'm missing some context, but you can add to an existing mirror
- see zpool attach.
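
For instance (a sketch; the device names are hypothetical):

# attach c1t4d0 as another side of the mirror vdev that contains c1t0d0
zpool attach tank c1t0d0 c1t4d0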

Neil.


Re: [zfs-discuss] Re: Drive Failure w/o Redundancy

2007-06-27 Thread Darren Dunham
 Darren Dunham wrote:
  The problem I've come across with using mirror or raidz for this setup
  is that (as far as I know) you can't add disks to mirror/raidz groups,
  and if you just add the disk to the pool, you end up in the same
  situation as above (with more space but no redundancy).
  
  You can't add to an existing mirror, but you can add new mirror (or
  raidz) vdevs to the pool.  If you do, there's no loss of redundancy.
 
 Maybe I'm missing some context, but you can add to an existing mirror
 - see zpool attach.

It depends on what you mean by "add".  :-) 

The original message was about increasing storage allocation.  You can
add redundancy to an existing mirror with attach, but you cannot
increase the allocatable storage.


-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
  This line left intentionally blank to confuse you. 


Re: [zfs-discuss] Re: Drive Failure w/o Redundancy

2007-06-27 Thread Erik Trimble
On Wed, 2007-06-27 at 14:50 -0700, Darren Dunham wrote:
  Darren Dunham wrote:
   The problem I've come across with using mirror or raidz for this setup
   is that (as far as I know) you can't add disks to mirror/raidz groups,
   and if you just add the disk to the pool, you end up in the same
   situation as above (with more space but no redundancy).
   
   You can't add to an existing mirror, but you can add new mirror (or
   raidz) vdevs to the pool.  If you do, there's no loss of redundancy.
  
  Maybe I'm missing some context, but you can add to an existing mirror
  - see zpool attach.
 
 It depends on what you mean by "add".  :-) 
 
 The original message was about increasing storage allocation.  You can
 add redundancy to an existing mirror with attach, but you cannot
 increase the allocatable storage.
 

With mirrors, there is currently more flexibility than with raid-Z[2].
You can increase the allocatable storage by replacing each disk in
the mirror with a larger one (assuming you wait for a resync after
each replacement ;-P )

Thus, the _safe_ way to increase a mirrored vdev's size is:

Disk A:  100GB
Disk B:  100GB
Disk C:  250GB
Disk D:  250GB


zpool create tank mirror A B
(yank out A, put in C)
(wait for resync)
(yank out B, put in D)
(wait for resync)

and voila!  tank goes from 100GB to 250GB of space.
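
If you have spare bays for the new disks, the same growth can be done
online with zpool replace (which is functionally attach followed by
detach).  This is only a sketch, reusing the hypothetical disk names
A-D above:

zpool replace tank A C     (resilver C in place of A)
(wait for the resilver to finish - check zpool status tank)
zpool replace tank B D     (resilver D in place of B)
(wait for the resilver to finish)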

I believe this should also work if LUNs are used instead of actual disks
- but I don't believe that resizing a LUN currently in a mirror will
work (please, correct me on this), so, for a SAN-backed ZFS mirror, it
would be:

Assuming A = B < C, and after resizing A, A = C > B

zpool create tank mirror A B
zpool attach tank A C   (where C is a new LUN of the new size desired)
(wait for sync of C)
zpool detach tank A
(unmap LUN A from host, resize A to be the same as C, then map back)
zpool attach tank C A
(wait for sync of A)
zpool detach tank B

I believe that will now result in a mirror of the full size of C, not of
B.

I'd be interested to know if you could do this:

zpool create tank mirror A B
(resize LUN A and B to new size)


without requiring a system reboot after resizing A & B  (that is, the
reboot would be needed to update the new LUN size on the host).


-- 
Erik Trimble
Java System Support
Mailstop:  usca14-102
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)



Re: [zfs-discuss] Re: Drive Failure w/o Redundancy

2007-06-27 Thread Richard Elling

Jef Pearlman wrote:
Perhaps I'm not asking my question clearly. I've already experimented a fair amount 
with zfs, including creating and destroying a number of pools with and without 
redundancy, replacing vdevs, etc. Maybe asking by example will clarify what I'm 
looking for or where I've missed the boat. The key is that I want a grow-as-you-go 
heterogenous set of disks in my pool:
heterogeneous set of disks in my pool:


The short answer:
zpool add -- add a top-level vdev as a dynamic stripe column
+ available space is increased

zpool attach -- add a mirror to an existing vdev
+ only works when the new mirror is the same size or larger than
  the existing vdev
+ available space is unchanged
+ redundancy (RAS) is increased

zpool detach -- remove a mirror from an existing vdev
+ available space increases if the removed mirror is smaller than the vdev
+ redundancy (RAS) is decreased

zpool replace -- functionally equivalent to attach followed by detach
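
Roughly, in command form (a sketch only; "tank" and the device names are
made up):

zpool add tank mirror c2t0d0 c2t1d0    # new top-level mirror vdev: more space
zpool attach tank c1t0d0 c1t2d0        # extra side on an existing mirror: more RAS, same space
zpool detach tank c1t2d0               # remove that side again: less RAS
zpool replace tank c1t1d0 c1t3d0       # attach c1t3d0, resilver, then detach c1t1d0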


Let's say I start with a 40g drive and a 60g drive. I create a non-redundant pool 
(which will be 100g). At some later point, I run across an unused 30g drive, which 
I add to the pool. Now my pool is 130g. At some point after that, the 40g drive 
fails, either by producing read errors or by failing to spin up at all. What happens 
to my pool? Can I mount and access it at all (for the data not on or striped across 
the 40g drive)? Can I zfs replace the 40g drive with another drive and have it 
attempt to copy as much data over as it can? Or am I just out of luck? zfs seems like 
a great way to use old/unutilized drives to expand capacity, but sooner or later one 
of those drives will fail, and if it takes out the whole pool (which it might 
reasonably do), then it doesn't work out in the end.


For non-redundant zpools, a device failure *may* cause the zpool to be 
unavailable.
The actual availability depends on the nature of the failure.

A more common scenario might be to add a 400 GByte drive, which you can use to
replace the older drives, or keep online for redundancy.

The zfs copies feature is a little bit harder to grok.  It is difficult to
predict how the system will be affected if you have copies=2 in your above
scenario, because it depends on how the space is allocated.  For more info,
see my notes at:
http://blogs.sun.com/relling/entry/zfs_copies_and_data_protection
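
For reference, copies is set per dataset, e.g. (a sketch; the dataset
names are made up):

zfs set copies=2 tank/home             # only blocks written after this get the extra copy
zfs create -o copies=2 tank/scratch    # or set it at dataset creation time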

 -- richard


Re: [zfs-discuss] Re: Drive Failure w/o Redundancy

2007-06-27 Thread Erik Trimble
On Wed, 2007-06-27 at 12:03 -0700, Jef Pearlman wrote:
  Jef Pearlman wrote:
   Absent that, I was considering using zfs and just
   having a single pool. My main question is this: what
   is the failure mode of zfs if one of those drives
   either fails completely or has errors? Do I
   permanently lose access to the entire pool? Can I
   attempt to read other data? Can I zfs replace the
   bad drive and get some level of data recovery?
   Otherwise, by pooling drives am I simply increasing
   the probability of a catastrophic data loss? I
   apologize if this is addressed elsewhere -- I've read
   a bunch about zfs, but not come across this
   particular answer.
  
Pooling devices in a non-redundant mode (i.e. without a raidz or mirror
vdev) increases your chance of losing data, just like every other RAID
system out there.

However, since ZFS doesn't do concatenation (it stripes), losing one
drive in a non-redundant stripe effectively corrupts the entire pool,
as virtually every file is likely to have some portion of its data
on the dead drive. 


  We generally recommend a single pool, as long as the use case permits.
  But I think you are confused about what a zpool is.  I suggest you look
  at the examples or docs.  A good overview is the slide show
  http://www.opensolaris.org/os/community/zfs/docs/zfs_last.pdf
 
 Perhaps I'm not asking my question clearly. I've already experimented a fair 
 amount with zfs, including creating and destroying a number of pools with and 
 without redundancy, replacing vdevs, etc. Maybe asking by example will 
 clarify what I'm looking for or where I've missed the boat. The key is that I 
 want a grow-as-you-go heterogeneous set of disks in my pool:
 
 Let's say I start with a 40g drive and a 60g drive. I create a non-redundant 
 pool (which will be 100g). At some later point, I run across an unused 30g 
 drive, which I add to the pool. Now my pool is 130g. At some point after 
 that, the 40g drive fails, either by producing read errors or by failing to 
 spin up at all. What happens to my pool? Can I mount and access it at all 
 (for the data not on or striped across the 40g drive)? Can I zfs replace 
 the 40g drive with another drive and have it attempt to copy as much data 
 over as it can? Or am I just out of luck? zfs seems like a great way to use 
 old/unutilized drives to expand capacity, but sooner or later one of those 
 drives will fail, and if it takes out the whole pool (which it might 
 reasonably do), then it doesn't work out in the end.
  

Nope. Your zpool is a stripe. As mentioned above, losing one disk in a
stripe effectively destroys all data, just as with any other RAID
system.


   As a side-question, does anyone have a suggestion
   for an intelligent way to approach this goal? This is
   not mission-critical data, but I'd prefer not to make
   data loss _more_ probable. Perhaps some volume
   manager (like LVM on linux) has appropriate features?
  
  A ZFS mirrored pool will be the most performant and easiest to manage,
  with better RAS than a raidz pool.
 
 The problem I've come across with using mirror or raidz for this setup is 
 that (as far as I know) you can't add disks to mirror/raidz groups, and if 
 you just add the disk to the pool, you end up in the same situation as above 
 (with more space but no redundancy).
 
 Thanks for your help.
 
 -Jef
  
 

To answer the original question, you _have_ to create mirrors, which, if
you have odd-sized disks, will end up with unused space.

An example:

Disk A:   20GB
Disk B:   30GB
Disk C:   40GB
Disk D:   60GB


Start with disks A & B:

zpool create tank mirror A B

results in a 20GB pool.

Later, add disks C & D:

zpool add tank mirror C D

this results in a 2-wide stripe of 2 mirrors, which means the pool has a
total capacity of 60GB (20GB from the A & B mirror, 40GB from the C & D
mirror).  10GB of the 30GB drive and 20GB of the 60GB drive are currently
unused.  You can lose one drive from each pair (i.e. A and C, A and D,
B and C, or B and D) before any data loss.


If you had known about the drive sizes beforehand, then you could have
done something like this:

Partition the drives as follows:

A:  1 20GB partition
B:  1 20GB & 1 10GB partition
C:  1 40GB partition
D:  1 40GB partition & 2 10GB partitions

then you do:

zpool create tank mirror Ap0 Bp0 mirror Cp0 Dp0 mirror Bp1 Dp1

and you get a total of 70GB of space. However, the performance on this
is going to be bad (as you frequently need to write to both partitions
on B & D, causing head seeks), though you can still lose up to 2 drives
before experiencing data loss.
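
Spelled out with concrete (hypothetical) Solaris device/slice names,
assuming A-D are c1t0d0 through c1t3d0 and each partition above is a
slice on its disk, the create would look something like:

# Ap0 = c1t0d0s0 (20GB), Bp0 = c1t1d0s0 (20GB), Bp1 = c1t1d0s1 (10GB)
# Cp0 = c1t2d0s0 (40GB), Dp0 = c1t3d0s0 (40GB), Dp1 = c1t3d0s1 (10GB)
zpool create tank \
    mirror c1t0d0s0 c1t1d0s0 \
    mirror c1t2d0s0 c1t3d0s0 \
    mirror c1t1d0s1 c1t3d0s1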


-- 
Erik Trimble
Java System Support
Mailstop:  usca14-102
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)



Re: [zfs-discuss] Re: Drive Failure w/o Redundancy

2007-06-27 Thread Richard Elling

Erik Trimble wrote:

If you had known about the drive sizes beforehand, then you could have
done something like this:

Partition the drives as follows:

A:  1 20GB partition
B:  1 20GB & 1 10GB partition
C:  1 40GB partition
D:  1 40GB partition & 2 10GB partitions

then you do:

zpool create tank mirror Ap0 Bp0 mirror Cp0 Dp0 mirror Bp1 Dp1

and you get a total of 70GB of space. However, the performance on this
is going to be bad (as you frequently need to write to both partitions
on B & D, causing head seeks), though you can still lose up to 2 drives
before experiencing data loss.


It is not clear to me that we can say performance will be bad
for stripes on single disks.  The reason is that ZFS dynamic
striping does not use a fixed interleave.  In other words, if
I write a block of N bytes to an M-way dynamic stripe, it is
not guaranteed that each device will get an I/O of N/M size.
I've only done a few measurements of this, and I've not completed
my analysis, but my data does not show the sort of thrashing one
might expect from a fixed stripe with small interleave.
 -- richard