On Wed, 2007-06-27 at 12:03 -0700, Jef Pearlman wrote:
> > Jef Pearlman wrote:
> > > Absent that, I was considering using zfs and just
> > > having a single pool. My main question is this: what
> > > is the failure mode of zfs if one of those drives
> > > either fails completely or has errors? Do I
> > > permanently lose access to the entire pool? Can I
> > > attempt to read other data? Can I "zfs replace" the
> > > bad drive and get some level of data recovery?
> > > Otherwise, by pooling drives am I simply increasing
> > > the probability of a catastrophic data loss? I
> > > apologize if this is addressed elsewhere -- I've read
> > > a bunch about zfs, but not come across this
> > > particular answer.
> > 
Pooling devices in a non-redundant mode (i.e., without a raidz or mirror
vdev) increases your chance of losing data, just like every other RAID
system out there.

However, since ZFS doesn't do concatenation (it dynamically stripes
across top-level vdevs), losing one drive in a non-redundant pool
effectively corrupts the entire pool, as virtually every file will have
some portion of its data on the dead drive.
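For illustration, using the same sort of placeholder disk names as the
examples further down:

# each disk becomes its own top-level vdev; ZFS dynamically
# stripes writes across all of them, with no redundancy anywhere
zpool create tank A B C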


> > We generally recommend a single pool, as long as the use case
> > permits. But I think you are confused about what a zpool is.
> > I suggest you look at the examples or docs. A good overview is
> > the slide show:
> >     http://www.opensolaris.org/os/community/zfs/docs/zfs_last.pdf
> 
> Perhaps I'm not asking my question clearly. I've already experimented a fair 
> amount with zfs, including creating and destroying a number of pools with and 
> without redundancy, replacing vdevs, etc. Maybe asking by example will 
> clarify what I'm looking for or where I've missed the boat. The key is that I 
> want a grow-as-you-go heterogeneous set of disks in my pool:
> 
> Let's say I start with a 40g drive and a 60g drive. I create a non-redundant 
> pool (which will be 100g). At some later point, I run across an unused 30g 
> drive, which I add to the pool. Now my pool is 130g. At some point after 
> that, the 40g drive fails, either by producing read errors or by failing to 
> spin up at all. What happens to my pool? Can I mount and access it at all 
> (for the data not on or striped across the 40g drive)? Can I "zfs replace" 
> the 40g drive with another drive and have it attempt to copy as much data 
> over as it can? Or am I just out of luck? zfs seems like a great way to use 
> old/unutilized drives to expand capacity, but sooner or later one of those 
> drives will fail, and if it takes out the whole pool (which it might 
> reasonably do), then it doesn't work out in the end.
>  

Nope. Your zpool is a stripe. As mentioned above, losing one disk in a
stripe effectively destroys all data, just as with any other RAID
system.
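The one escape hatch, while the pool is still healthy, is that you can
retrofit redundancy onto a single-disk vdev with "zpool attach" (disk
names are placeholders, and the new disk must be at least as large as
the one it mirrors):

# attach NEW to the existing lone disk OLD; ZFS resilvers and
# the vdev becomes a 2-way mirror
zpool attach tank OLD NEW

This doesn't help once a disk is already dead, but it is the
grow-as-you-go way to get from a stripe to a stripe of mirrors.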


> > > As a side-question, does anyone have a suggestion
> > > for an intelligent way to approach this goal? This is
> > > not mission-critical data, but I'd prefer not to make
> > > data loss _more_ probable. Perhaps some volume
> > > manager (like LVM on linux) has appropriate features?
> > 
> > A mirrored ZFS pool will be the most performant and the easiest to
> > manage, with better RAS than a raidz pool.
> 
> The problem I've come across with using mirror or raidz for this setup is 
> that (as far as I know) you can't add disks to mirror/raidz groups, and if 
> you just add the disk to the pool, you end up in the same situation as above 
> (with more space but no redundancy).
> 
> Thanks for your help.
> 
> -Jef
>  
> 

To answer the original question: to grow the pool and keep redundancy,
you _have_ to add mirrors, which, if you have odd-sized disks, will end
up with unused space.

An example:

Disk A:   20GB
Disk B:   30GB
Disk C:   40GB
Disk D:   60GB


Start with disk A & B:

zpool create tank mirror A B

results in a 20GB pool.
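(If you want to double-check the usable size at each step, "zpool list
tank" will show it; the SIZE column should read roughly 20G here.)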

Later, add disks C & D:

zpool add tank mirror C D

this results in a 2-wide stripe of 2 mirrors, which means the pool has a
total capacity of 60GB (20GB from the A/B mirror, 40GB from the C/D
mirror). 10GB of the 30GB drive (B) and 20GB of the 60GB drive (D) are
currently unused. You can lose one drive from each mirror (i.e. A and C,
A and D, B and C, or B and D) before any data loss.
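Sketched out, the layout is roughly:

  tank
    mirror   A (20GB)  B (30GB)   -> 20GB usable, 10GB of B idle
    mirror   C (40GB)  D (60GB)   -> 40GB usable, 20GB of D idle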


If you had known the drive sizes beforehand, then you could have done
something like this:

Partition the drives as follows:

A:  1 20GB partition
B:  1 20GB & 1 10GB partition
C:  1 40GB partition
D:  1 40GB partition & 2 10GB partitions

then you do:

zpool create tank mirror Ap0 Bp0 mirror Cp0 Dp0 mirror Bp1 Dp1

and you get a total of 70GB of space. However, the performance of this
layout is going to be bad (writes frequently hit both partitions on B
and on D, causing extra head seeks), though you can still lose up to 2
drives before experiencing data loss, provided the two aren't one of
the mirrored pairs (A & B, C & D, or B & D).
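For reference, the resulting layout is roughly:

  tank
    mirror   Ap0 (20GB)  Bp0 (20GB)   -> 20GB usable
    mirror   Cp0 (40GB)  Dp0 (40GB)   -> 40GB usable
    mirror   Bp1 (10GB)  Dp1 (10GB)   -> 10GB usable

(D's second 10GB partition is left over as a spare.)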


-- 
Erik Trimble
Java System Support
Mailstop:  usca14-102
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

