Re: [zfs-discuss] ZFS needs a viable backup mechanism

2006-07-25 Thread Craig Morgan
Spare a thought also for the remote serviceability aspects of these
systems.  If customers raise calls/escalations against such systems,
then our remote support/solution-centre staff would find such an
output useful in identifying and verifying the config.


I don't have visibility of the Explorer development sites at the
moment, but I believe that the last publicly available Explorer I
looked at (v5.4) still didn't gather any ZFS-related info, which
would scare me mightily for a FS released in a production-grade
Solaris 10 release ... how do we expect our support personnel to
engage??


Craig

On 18 Jul 2006, at 00:53, Matthew Ahrens wrote:


On Fri, Jul 07, 2006 at 04:00:38PM -0400, Dale Ghent wrote:

Add an option to zpool(1M) to dump the pool config as well as the
configuration of the volumes within it to an XML file. This file
could then be sucked in to zpool at a later date to recreate/
replicate the pool and its volume structure in one fell swoop. After
that, Just Add Data(tm).


Yep, this has been on our to-do list for quite some time:

RFE #6276640 zpool config
RFE #6276912 zfs config
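
In the meantime, a rough way to capture an existing layout by hand for
manual recreation (a sketch only; 'tank' is a placeholder pool name):

# zpool status tank > /var/tmp/tank.vdevs        # record the vdev layout
# zfs list -r tank > /var/tmp/tank.datasets      # record the dataset hierarchy
# zfs get -r all tank > /var/tmp/tank.props      # record every property for later replay

Replaying that into 'zpool create' / 'zfs create' / 'zfs set' is still a
manual exercise, which is exactly what the RFEs above would automate.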

--matt


--
Craig Morgan
Cinnabar Solutions Ltd

t: +44 (0)791 338 3190
f: +44 (0)870 705 1726
e: [EMAIL PROTECTED]
w: www.cinnabar-solutions.com





Re: [zfs-discuss] ZFS components for a minimal Solaris 10 U2 install?

2006-07-25 Thread Jim Connors


Included below is a thread which dealt with trying to find the
packages necessary for a minimal Solaris 10 U2 install with ZFS
functionality.  In addition to SUNWzfskr, SUNWzfsr and SUNWzfsu, the
SUNWsmapi package needs to be installed.  The libdiskmgt.so.1 library is
required for the zpool(1M) command.  I found this out via trial and
error; there is no dependency on SUNWsmapi mentioned in the SUNWzfsr
depend file.
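
For anyone repeating this, a rough way to confirm the missing link on a
stock install (a sketch; paths are the usual Solaris 10 locations):

# ldd /usr/sbin/zpool | grep diskmgt                     # zpool needs libdiskmgt.so.1
# pkgchk -l -p /usr/lib/libdiskmgt.so.1                  # reports SUNWsmapi as the owning package
# grep SUNWsmapi /var/sadm/pkg/SUNWzfsr/install/depend   # comes back empty -- no declared dependency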


Apologies if this is nitpicking, but is this missing dependency worthy 
of submitting a P5 CR?


-- Jim C


Jason Schroeder wrote:

Dale Ghent wrote:


On Jun 28, 2006, at 4:27 PM, Jim Connors wrote:

For an embedded application, I'm looking at creating a minimal  
Solaris 10 U2 image which would include ZFS functionality.  In  
quickly taking a look at the opensolaris.org site under pkgdefs, I  
see three packages that appear to be related to ZFS: SUNWzfskr,  
SUNWzfsr, and SUNWzfsu.  Is it naive to think that this would be  
all that is needed for ZFS?



Those packages, as well as what's listed in the depend files for  
those packages.


Ahh, don't you love climbing the dependency tree?

/dale


Glenn Brunette wrote a nifty little tool ... you have to assume that all
of the dependencies are appropriately doc'ed, of course *cough*.


http://blogs.sun.com/roller/page/gbrunett?entry=solaris_package_companion

/jason




Re: [zfs-discuss] ZFS components for a minimal Solaris 10 U2 install?

2006-07-25 Thread Jason Schroeder
No argument from me.  For better or for worse, most of the customers I
speak with minimize their OS distributions.  The more accurately we can
describe dependencies within our current methods, the better.



/jason



Re: [zfs-discuss] ZFS components for a minimal Solaris 10 U2 install?

2006-07-25 Thread Eric Schrock
On Tue, Jul 25, 2006 at 10:25:04AM -0400, Jim Connors wrote:
 
 Included below is a thread which dealt with trying to find the
 packages necessary for a minimal Solaris 10 U2 install with ZFS
 functionality.  In addition to SUNWzfskr, SUNWzfsr and SUNWzfsu, the
 SUNWsmapi package needs to be installed.  The libdiskmgt.so.1 library is
 required for the zpool(1M) command.  I found this out via trial and
 error; there is no dependency on SUNWsmapi mentioned in the SUNWzfsr
 depend file.
 
 Apologies if this is nitpicking, but is this missing dependency worthy 
 of submitting a P5 CR?

Absolutely.

- Eric

--
Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock


Re: [zfs-discuss] ZFS needs a viable backup mechanism

2006-07-25 Thread Richard Elling

Craig Morgan wrote:
Spare a thought also for the remote serviceability aspects of these
systems.  If customers raise calls/escalations against such systems,
then our remote support/solution-centre staff would find such an output
useful in identifying and verifying the config.


I don't have visibility of the Explorer development sites at the
moment, but I believe that the last publicly available Explorer I looked
at (v5.4) still didn't gather any ZFS-related info, which would scare me
mightily for a FS released in a production-grade Solaris 10 release ...
how do we expect our support personnel to engage??


Explorer *should* collect 'zfs get all' and 'zpool status', which will
give you all(?) of the file system parameters and pool/device configuration
information for first-level troubleshooting.  You might check with the
Explorer developers and see when that is planned.
 -- richard



[zfs-discuss] ZFS state between reboots for RAM resident OS?

2006-07-25 Thread Jim Connors

Guys,

Thanks for the help so far; now come the more interesting questions ...

Piggybacking off of some work being done to minimize Solaris for
embedded use, I have a version of Solaris 10 U2 with ZFS functionality
with a disk footprint of about 60MB.  A miniroot created from this image
can be compressed to under 30MB.  Currently, I load this image onto a
USB key ring and boot from the USB device, running the Solaris miniroot
out of RAM.  Note: the USB key ring is a hideously slow device, but for
the sake of this proof of concept it works fine.  In addition, some more
packages will need to be added later on (e.g. NFS, Samba?), which will
increase the footprint.

My ultimate goal here would be to demonstrate a network storage
appliance using ZFS, where the OS is effectively stateless, or as
stateless as possible.  ZFS goes a long way in assisting here since, for
example, mount and NFS share information can be managed by ZFS.  But I
suppose it's not as stateless as I thought.  Upon booting from the USB
device into memory, I can do a `zpool create poo1 c1d0', but a
subsequent reboot does not remember this work.  Doing a `zpool list'
yields 'no pools available'.  So the question is, what sort of state is
required between reboots for ZFS?


Regards,
-- Jim C


[zfs-discuss] Re: ZFS state between reboots for RAM resident OS?

2006-07-25 Thread Jim Connors


I understand.  Thanks.

Just curious: ZFS manages NFS shares.  Have you given any thought to
what might be involved for ZFS to manage SMB shares in the same manner?
This all goes towards my stateless OS theme.


-- Jim C


Eric Schrock wrote:

You need the following file:

/etc/zfs/zpool.cache

This file 'knows' about all the pools on the system.  These pools can
typically be discovered via 'zpool import', but we can't do this at boot
because:

a. It can be really, really expensive (tasting every disk on the system)
b. Pools can be comprised of files or devices not in /dev/dsk

So, we have the cache file, which must be editable if you want to
remember newly created pools.  Note this only affects configuration
changes to pools - everything else is stored within the pool itself.

- Eric

--
Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock
  




[zfs-discuss] Re: ZFS state between reboots for RAM resident OS?

2006-07-25 Thread Eric Schrock
On Tue, Jul 25, 2006 at 01:07:59PM -0400, Jim Connors wrote:
 
 I understand.  Thanks.
 
 Just curious, ZFS manages NFS shares.  Have you given any thought to 
 what might be involved for ZFS to manage SMB shares in the same manner.  
 This all goes towards my stateless OS theme.

Yep, this is in the works.  We have folks working on an integrated CIFS
stack, as well as a rewrite of the way shares are managed.  We named the
property 'sharenfs' to allow for future, non-NFS share mechanisms.  Once
the above work is nearing completion, we'll work on integrating it closely
with the ZFS administration model.
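
For reference, the NFS side is already driven entirely by that property,
e.g.:

# zfs set sharenfs=rw,anon=0 tank/export     # options are handed through to share(1M)
# zfs get sharenfs tank/export

A future 'sharesmb'-style property could follow the same pattern (that
name is purely hypothetical at this point).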

- Eric

--
Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock


[zfs-discuss] ON build on Blade 1500 ATA disk extremely slow

2006-07-25 Thread Rainer Orth
I've recently started doing ON nightly builds on zfs filesystems on the
internal ATA disk of a Blade 1500 running snv_42.  Unfortunately, the
builds are extremely slow compared to building on an external IEEE 1394
disk attached to the same machine:

ATA disk:

 Elapsed build time (DEBUG) 

real 21:40:57.7
user  4:32:15.6
sys   8:22:24.1

IEEE 1394 disk:

 Elapsed build time (DEBUG) 

real  6:14:11.4
user  4:28:54.1
sys 36:04.1

Running kernel profile with lockstat (lockstat -kIW -D 20 sleep 300), I
find in the ATA case:

Profiling interrupt: 29117 events in 300.142 seconds (97 events/sec)

Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller
---
15082  52%  52% 0.00 1492 cpu[0] (usermode)  
 9565  33%  85% 0.00  318 cpu[0] usec_delay  

compared to IEEE 1394:

Profiling interrupt: 29195 events in 300.969 seconds (97 events/sec)

Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller
---
20042  69%  69% 0.00 2000 cpu[0] (usermode)  
 5414  19%  87% 0.00  317 cpu[0] usec_delay  

At other times, the kernel time can even be as high as 80%.  Unfortunately,
I've not been able to investigate how usec_delay is called, since there's no
fbt provider for that function (nor for the alternative entry point
drv_usecwait found in uts/sun4[uv]/cpu/common_asm.s), so I'm a bit stuck
on how to investigate this further.  I suspect that the dad(7D) driver is
the culprit, but it is only included in the closed tarball.  In the EDU S9
sources, I find that dcd_flush_cache() calls drv_usecwait(100), which
might be the cause of this.

How should I proceed to investigate this further, and can this be fixed
somehow?  As it stands, the machine is almost unusable as a build machine.
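
One avenue, absent an fbt probe for drv_usecwait, might be the profile
provider, which can aggregate full kernel stacks while a build is running
(a sketch, not yet tried here):

# dtrace -n 'profile-997 /arg0 != 0/ { @[stack()] = count(); } tick-60s { exit(0); }'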

Rainer

-
Rainer Orth, Faculty of Technology, Bielefeld University


[zfs-discuss] Re: Quotas and Snapshots

2006-07-25 Thread Brad Plecs
I've run into this myself (I am in a university setting).  After reading bug
ID 6431277 (URL below for noobs like myself who didn't know what 'see
6431277' meant):

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6431277

...it's not clear to me how this will be resolved.  What I'd really like to
see is the ability to specify where the snapshot backing store will be
(more specifically, the ability for the snapshot space to *not* impact the
filesystem space).

We have a Network Appliance box whose snapshots are very popular for their
value as online backups.  NetApp charges snapshots to the storage pool, so
they don't cost the filesystem anything.  I'm drooling over ZFS as an
alternative to the expensive NetApp hardware/software, but since we sell
RAID space yet perform backups administratively, I can't have the snapshots
consuming people's space.

I could increase the filesystem quota to accommodate the snapshots, but
since the snapshot size is dynamic, I would have to increase it well beyond
the current snapshot size.  Once I do that, users *will* fill the space
(that they have not paid for).

I could tune the size of the filesystem to match the snapshot + filesystem 
data, but since
snapshot size is dynamic, this is impractical.  

I also have some very small quotas (50 MB) for users, and would like to be
able to create snapshots of them going back 30 days or so without it costing
the user anything.  The snapshots save us tons of time and effort, but
they're not worth it to the user to pay for double or triple the space
they're currently using.  I also don't want the users going over the
original quota of 50 MB, so I can't make enough space in the filesystem to
make snapshots of their data ... it's maddening.

If we must contain snapshots inside a filesystem, perhaps it's possible to
set a distinct quota for snapshot space vs. live data space?  I could then
set snapshot quotas for my filesystems arbitrarily large for my
administrative backups, or down to the filesystem size or some other value
where authority for the filesystem has been delegated.

It would also be nice to be able to make snapshots of parent filesystems
that include their descendants.  Then, for example, I could create

zfspool/grandparent/parent/child

...and set a filesystem quota on parent, a user quota on child, and a
snapshot quota on grandparent, and this solves most of my problems.

In fact, I think a lot of ZFS's hierarchical features would be more
valuable if parent filesystems included their descendants (backups and NFS
sharing, for example), but I'm sure there are just as many arguments
against that as for it.
 
 


Re: [zfs-discuss] ON build on Blade 1500 ATA disk extremely slow

2006-07-25 Thread Bill Sommerfeld
On Tue, 2006-07-25 at 13:45, Rainer Orth wrote:
 At other times, the kernel time can be even as high as 80%.  Unfortunately,
 I've not been able to investigate how usec_delay is called since there's no
 fbt provider for that function (nor for the alternative entry point
 drv_usecwait found in uts/sun4[uv]/cpu/common_asm.s), so I'm a bit stuck
 how to further investigate this.  I suspect that the dad(7D) driver is the
 culprit, but it is only included in the closed tarball.  In the EDU S9
 sources, I find that dcd_flush_cache() calls drv_usecwait(100), which
 might be the cause of this.

In the future, you can try:

# lockstat -s 10 -I sleep 10

which aggregates on the full stack trace, not just the caller, during
profiling interrupts.  (-s 10 sets the stack depth; tweak up or down to
taste).

 How should I proceed to further investigate this, and can this be fixed
 somehow?  This way, the machine is almost unusable as a build machine.

You've rediscovered

6421427 netra x1 slagged by NFS over ZFS leading to long spins in the
ATA driver code

I've updated the bug to indicate that this was seen on the Sun Blade
1500 as well.

- Bill







[zfs-discuss] Re: ZFS state between reboots for RAM resident OS?

2006-07-25 Thread Jim Connors

Eric Schrock wrote:

You need the following file:

/etc/zfs/zpool.cache
  


So, as a workaround (or more appropriately, a kludge), would it be
possible to do the following (rough sketch after the list):


1. At boot time do a 'zpool import' of some pool guaranteed to exist.  
For the sake of this discussion call it 'system'


2. Have /etc/zfs/zpool.cache be symbolically linked to /system/ZPOOL.CACHE
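
A rough, untested sketch of that kludge as a boot-time script (the pool
name 'system' is just the placeholder from this discussion, and the
symlink is assumed to have been created when the miniroot was built):

#!/sbin/sh
# /etc/zfs/zpool.cache -> /system/ZPOOL.CACHE (symlink baked into the miniroot)
/usr/sbin/zpool import -f system    # full device scan; imports and mounts /system
/usr/sbin/zpool import -a -f        # then pick up any other pools found on the disks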

-- Jim C





Re: [zfs-discuss] ON build on Blade 1500 ATA disk extremely slow

2006-07-25 Thread Rainer Orth
Bill,

 In the future, you can try:
 
 # lockstat -s 10 -I sleep 10
 
 which aggregates on the full stack trace, not just the caller, during
 profiling interrupts.  (-s 10 sets the stack depth; tweak up or down to
 taste).

Nice.  Perhaps lockstat(1M) should be updated to include something like
this in the EXAMPLES section.

  How should I proceed to further investigate this, and can this be fixed
  somehow?  This way, the machine is almost unusable as a build machine.
 
 you've rediscovered 
 
 6421427 netra x1 slagged by NFS over ZFS leading to long spins in the
 ATA driver code
 
 I've updated the bug to indicate that this was seen on the Sun Blade
 1500 as well.

Ok, thanks.  One important difference compared to that CR is that those
were local accesses to the FS, but the stack traces from lockstat are
identical.

Any word when this might be fixed?

Thanks.
Rainer

-
Rainer Orth, Faculty of Technology, Bielefeld University


Re: [zfs-discuss] Re: Quotas and Snapshots

2006-07-25 Thread Al Hopper
On Tue, 25 Jul 2006, Brad Plecs wrote:

 I've run into this myself.  (I am in a university setting).  after reading 
 bug ID 6431277 (URL below for noobs like myself who didn't know what see 
 6431277 meant):

 http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6431277

 ...it's not clear to me how this will be resolved.  What I'd really like to 
 see is
 the ability to specify where the snapshot backing store will be.   (more 
 specifically, the
 ability for the snapshot space to *not* impact the filesystem space).

 We have a network Appliance box, whose snapshots are very popular for their 
 value as online
 backups.  Netapp charges snapshots to the storage pool, so they don't cost 
 the filesystem anything.  I'm drooling over ZFS as an alternative to the 
 expensive netapp hardware/software,
 but since we sell RAID space but perform backups administratively, I can't 
 have the snapshots
 consuming people's space.

 I could increase the filesystem quota to accommodate the snapshots, but since
 the snapshot
 size is dynamic, I would have to increase it well beyond the current snapshot 
 size.  Once I
 do that, users *will* fill the space (that they have not paid for).

 I could tune the size of the filesystem to match the snapshot + filesystem 
 data, but since
 snapshot size is dynamic, this is impractical.

 I also have some very small quotas (50 MB) for users, and would like to be 
 able to
 create snapshots of them going back 30 days or so without it costing the user 
 anything.
 The snapshots save us tons of time and effort, but they're not worth it to 
 the user to pay
 for double or triple the space they're currently using, and I don't want the 
 users going
 over the original quota of 50 MB, so I can't make enough space in the 
 filesystem
 to make snapshots of their data... it's maddening.

I'll play devil's advocate here - because I don't see this as a ZFS-related
issue, or even one that is in the ZFS domain, in terms of resolving
the issue at hand.

First, ZFS allows one to take advantage of large, inexpensive Serial ATA
disk drives.  Paraphrased: ZFS loves large, cheap SATA disk drives.  So
the first part of the solution looks (to me) as simple as adding some
cheap SATA disk drives.

Next, after extra storage space has been added to the pool, it's a simple
matter of accounting to subtract the size of the ZFS snapshots from the
users' disk space to calculate their actual live storage and bill for it!

Next, periodic snapshots can be made and older snapshots either deleted or
moved to even lower cost storage media (e.g., tape, CDROMs, DVDs etc).

Next - 50MB quotas!?  You've got to be kidding.  Let me check my
calendar; yep - it's 2006.  You are kidding ... right?  If you're not
kidding, then you've got a business/management issue and not a technical
issue to resolve.



Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
   Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
OpenSolaris Governing Board (OGB) Member - Feb 2006


[zfs-discuss] somewhat OT: inexpensive 64-bit CPUs for ZFS

2006-07-25 Thread Al Hopper

A couple of weeks ago, there was a discussion on the best system for ZFS
and I mentioned that AMD would reduce pricing and withdraw some of the
939-pin (non AM2) processors from the marketplace.

Update: I see a dual-core AMD X2 4400+ (1MB cache per core) processor on
www.monarchcomputers.com for ~$255.  And there's the X2 4600+, with 512KB
cache per core, for around the same price.

http://www.monarchcomputer.com/Merchant2/merchant.mv?Screen=PROD&Store_Code=M&Product_Code=120241&Category_Code=amddc

They won't last long.

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
   Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
OpenSolaris Governing Board (OGB) Member - Feb 2006


Re: [zfs-discuss] Re: Quotas and Snapshots

2006-07-25 Thread Darren Dunham
 First, ZFS allows one to take advantage of large, inexpensive Serial ATA
 disk drives.  Paraphrased: ZFS loves large, cheap SATA disk drives.  So
 the first part of the solution looks (to me) as simple as adding some
 cheap SATA disk drives.

I hope not.  We have quotas available for a reason.  There are
legitimate reasons for putting an administrative upper bound on storage
space.  It's not always about disk acquisition costs.

The ability to have the user own their space and the administrator own
the snapshot would look good to me.  If the user owns both, I expect
them to trade snapshots for space.  I would prefer to be able to
guarantee snapshots as part of a defined recovery system without
limiting access to their quota.

 Next, after extra storage space has been added to the pool, it's a simple
 matter of accounting to subtract the size of the ZFS snapshots from the
 users' disk space to calculate their actual live storage and bill for it!
 
 Next, periodic snapshots can be made and older snapshots either deleted or
 moved to even lower cost storage media (e.g., tape, CDROMs, DVDs etc).

I don't care how much bigger you make it.  At some point the space will
be used.  Then the user deletes a subtree and doesn't get any of the space
back because of the snapshots, and I don't want the snapshots deleted.

 Next - 50Mb quotas!??  You've got to be kidding.  Let me check my
 calendar; yep - it's 2006.  You are kidding ... right?  If you're not
 kidding, then you've got a business/management issue and not a technical
 issue to resolve.

So pretend it's 500G.  The suggestions still seem very valid to me.

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
  This line left intentionally blank to confuse you. 


Re: [zfs-discuss] Re: Quotas and Snapshots

2006-07-25 Thread Brad Plecs
 First, ZFS allows one to take advantage of large, inexpensive Serial ATA
 disk drives.  Paraphrased: ZFS loves large, cheap SATA disk drives.  So
 the first part of the solution looks (to me) as simple as adding some
 cheap SATA disk drives.
 
 Next, after extra storage space has been added to the pool, it's a simple
 matter of accounting to subtract the size of the ZFS snapshots from the
 users' disk space to calculate their actual live storage and bill for it!
 
 Next, periodic snapshots can be made and older snapshots either deleted or
 moved to even lower cost storage media (e.g., tape, CDROMs, DVDs etc).
 
 Next - 50Mb quotas!??  You've got to be kidding.  Let me check my
 calendar; yep - it's 2006.  You are kidding ... right?  If you're not
 kidding, then you've got a business/management issue and not a technical
 issue to resolve.

Our problem isn't that we don't have enough storage space.  Our problem is
that the snapshots reduce the filesystem space available to users.  Simply
creating the snapshot begins squeezing them out of their own space.

Just billing people for it doesn't solve the problem.
Simply giving people more space doesn't solve the problem, either.

I did think of another option -- allow filesystems to dynamically grow
their quotas themselves by the size of the snapshots.  This might be
easiest to implement -- just add a toggle in the zfs set/get
parameters.  If it's integrated into the filesystem itself, keeping up
with the changing size of the data should be a non-issue.

Actually, I guess that's fairly similar to the separate quota for 
snapshot data idea. 
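
Outside of ZFS proper, that toggle could even be approximated today with a
cron job (an untested sketch; it assumes 'zfs get' supports -H, -o value
and -p for raw byte counts, and it treats the sum of per-snapshot 'used'
as a good-enough estimate of total snapshot space):

#!/bin/ksh
# Grow a filesystem's quota by the space its snapshots currently consume,
# so snapshots stop eating into the space the user actually paid for.
fs=tank/home/bob                # placeholder dataset
base=$((50 * 1024 * 1024))      # the live-data quota sold to the user (50 MB)
snap=0
for s in $(zfs list -H -o name -t snapshot -r $fs); do
    b=$(zfs get -Hp -o value used $s)
    snap=$((snap + b))
done
zfs set quota=$((base + snap)) $fs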

BP 

-- 
[EMAIL PROTECTED]


Re: [zfs-discuss] Proposal: delegated administration

2006-07-25 Thread Mark Shellenbaum


I would like to make a couple of additions to the proposed model.



Permission Sets.

Allow the administrator to define a named set of permissions, and then 
use the name as a permission later on.  Permission sets would be 
evaluated dynamically, so that changing the set definition would change 
what is allowed everywhere the set is used.


Permission sets would need a special character to differentiate them from
normal permissions.  I would like to recommend we use the '@' character
for this.


The -s option will be used to manipulate a named set.

# zfs allow -s @setname perm,perm... dataset
# zfs unallow -s @setname perm,perms... dataset
# zfs unallow -s @setname dataset

Set Examples:

First we need to define the set (@myset)
# zfs allow -s @myset create,destroy,snapshot,clone datapool

Now let group staff use the named set (@myset).
# zfs allow staff @myset datapool

You could also mix a named set with a normal permission list
# zfs allow fred @myset,mount,promote datapool/fred

When a permission set is evaluated, the definition from the nearest
ancestor that defines the named set would be used.
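
For example (proposed syntax only, nothing here is implemented yet): if
@myset is defined on both datapool and datapool/home, then a grant on
datapool/home/fred resolves @myset against the datapool/home definition,
since that is the nearest ancestor defining the set.

# zfs allow -s @myset create,snapshot datapool
# zfs allow -s @myset create,snapshot,destroy datapool/home
# zfs allow staff @myset datapool/home/fred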



Permission printing.

With permission sets, displaying the various permissions becomes a little
messy.  I would like to propose the following format.  It's a bit
verbose, but it is readable.


--
Permission sets on (pool/fred)
@set1 create,destroy,snapshot,mount,clone,promote,rename
@simple create,mount
Create time permissions on (pool/fred)
@set1,mountpoint
Local permissions on (pool/fred)
user tom @set1
user joe create,destroy,mount
Local+Descendent permissions on (pool/fred)
user fred @basic,share,rename
Descendent permissions on (pool/fred)
user barney @basic
group staff @basic
--
Permission sets on (pool)
@simple create,destroy,mount
Local permissions on (pool)
group staff @simple
--




Mark Shellenbaum wrote:
The following is the delegated admin model that Matt and I have been 
working on.  At this point we are ready for your feedback on the 
proposed model.


  -Mark





PERMISSION GRANTING

zfs allow [-l] [-d] everyone|user|group ability[,ability...] \
dataset
zfs allow [-l] [-d] -u user ability[,ability...] dataset
zfs allow [-l] [-d] -g group ability[,ability...] dataset
zfs allow [-l] [-d] -e ability[,ability...] dataset
zfs allow -c ability[,ability...] dataset

If no flags are used, the ability will be allowed for the specified
dataset and all of its descendents.

-l Local means that the permission will be allowed for the
specified dataset, and not its descendents (unless -d is also
specified).

-d Descendents means that the permission will be allowed for
descendent datasets, and not for this dataset (unless -l is also
specified).  (needed for 'zfs allow -d ahrens quota tank/home/ahrens')

When using the first form (without -u, -g, or -e), the
everyone|user|group argument will be interpreted as the keyword
everyone if possible, then as a user if possible, then as a group as
possible.  The -u user, -g group, and -e (everyone) forms
allow one to specify a user named everyone, or a group whose name
conflicts with a user (or everyone).  (note: the -e form is not
necessary since zfs allow everyone will always mean the keyword
everyone not the user everyone.)

As a possible extension, multiple who's could be allowed in one
command (eg. 'zfs allow -u ahrens,marks create tank/project')

-c Create means that the permission will be granted (Locally) to the
creator on any newly-created descendant filesystems.

Abilities are mostly self-explanatory: the ability to run
'zfs [set] ability ds'.  Note, this implicitly collapses the
subcommand and property namespaces into one.  (I think that the 'set' is
superfluous anyway; it would be more convenient to say
'zfs property=value'.)

create  create descendent datasets
destroy
snapshot
rollback
clone   create clone of any of the ds's snaps
(must also have 'create' ability in clone's parent)
promote (must also have 'promote' ability in origin fs)
rename  (must also have 'create' ability in new parent)
mount   mount and unmount the ds
share   share and unshare this ds
sendsend any of the ds's snapshots
receive create a descendent with 'zfs receive'
(must also have 'create' ability)
quota
reservation
volsize 
recordsize
mountpoint
sharenfs
checksum
compression
atime
devices
  

Re: [zfs-discuss] ON build on Blade 1500 ATA disk extremely slow

2006-07-25 Thread Bill Sommerfeld
On Tue, 2006-07-25 at 14:36, Rainer Orth wrote:
  Perhaps lockstat(1M) should be updated to include something like
 this in the EXAMPLES section.

I filed 6452661 with this suggestion.

 Any word when this might be fixed?

I can't comment in terms of time, but the engineer working on it has a
partially tested fix; he needs to complete testing and integrate the
fix.  Not clear how long this will take.

- Bill




Re: [zfs-discuss] ON build on Blade 1500 ATA disk extremely slow

2006-07-25 Thread Rainer Orth
Bill,

 On Tue, 2006-07-25 at 14:36, Rainer Orth wrote:
   Perhaps lockstat(1M) should be updated to include something like
  this in the EXAMPLES section.
 
 I filed 6452661 with this suggestion.

excellent, thanks.

  Any word when this might be fixed?
 
 I can't comment in terms of time, but the engineer working on it has a
 partially tested fix; he needs to complete testing and integrate the
 fix..  not clear how long this will take. 

No problem: I can use that IEEE 1394 disk for now.  Good to know that this
is being worked on, though.

Rainer

-
Rainer Orth, Faculty of Technology, Bielefeld University


Re: [zfs-discuss] Re: Quotas and Snapshots

2006-07-25 Thread Matthew Ahrens
On Tue, Jul 25, 2006 at 11:13:16AM -0700, Brad Plecs wrote:
 http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6431277
 
 What I'd really like to see is ... the ability for the snapshot space
 to *not* impact the filesystem space).  

Yep, as Eric mentioned, that is the purpose of this RFE (want
filesystem-only quotas).

I imagine that this would be implemented as a quota against the space
referenced (as currently reported by 'zfs list', 'zfs get refer',
'df', etc; see the zfs(1m) manpage for details).
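
Purely as illustration (hypothetical property name, nothing like it exists
today), such a referenced-space quota might sit alongside the existing one
so that snapshot space no longer counts against the user:

# zfs set refquota=50m pool/home/brad    # hypothetical: limits only the space referenced
# zfs set quota=none pool/home/brad      # the existing quota need not cover snapshots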

 in fact, I think a lot of ZFS's hierarchical features would be more
 valuable if parent filesystems included their descendants (backups and
 NFS sharing, for example), but I'm sure there are just as many
 arguments against that as for it.

Yep, we're working on making more features operate on a dataset and all
its descendents.  For example, the recently implemented 'zfs snapshot -r'
can create snapshots of a filesystem and all its descendents.  This
feature will be part of Solaris 10 update 3.  We're also working on 'zfs
send -r' (RFE 6421958).
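
For example, with the new flag:

# zfs snapshot -r tank/home@20060725    # snapshots tank/home and every descendent in one shot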

--matt


[zfs-discuss] How to best layout our filesystems

2006-07-25 Thread Karen Chau
Our application, Canary, has approx. 750 clients uploading to the server
every 10 minutes; that's approx. 108,000 gzip tarballs per day written to
the /upload directory.  The parser untars each tarball, which consists of
8 ASCII files, into the /archives directory.  /app is our application and
tools (apache, tomcat, etc.) directory.  We also have batch jobs that run
throughout the day; I would say we read 2 to 3 times more than we write.

Since we have an alternate server, downtime or data loss is somewhat
acceptable.  How can we best lay out our filesystems to get the most
performance?

directory info
--
/app  - 30G
/upload   - 10G
/archives - 35G

HW info
---
System Configuration:  Sun Microsystems  sun4v Sun Fire T200
System clock frequency: 200 MHz
Memory size: 8184 Megabytes
CPU: 32 x 1000 MHz  SUNW,UltraSPARC-T1
Disks: 4x68G
  Vendor:   FUJITSU
  Product:  MAV2073RCSUN72G
  Revision: 0301


We plan on using 1 disk for the OS and the other 3 disks for the Canary
filesystems /app, /upload, and /archives.  Should I create 3 pools, i.e.
   zpool create canary_app c1t1d0
   zpool create canary_upload c1t2d0
   zpool create canary_archives c1t3d0

--OR--
create 1 pool using a dynamic stripe, i.e.
   zpool create canary c1t1d0 c1t2d0 c1t3d0

--OR--
create a single-parity raid-z pool, i.e.
   zpool create canary raidz c1t1d0 c1t2d0 c1t3d0

Which option gives us the best performance?  If there's another method
that's not mentioned, please let me know.
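
For concreteness, the dynamic-stripe option with one filesystem per
directory would look something like this (mountpoints illustrative):

   zpool create canary c1t1d0 c1t2d0 c1t3d0
   zfs create canary/app
   zfs create canary/upload
   zfs create canary/archives
   zfs set mountpoint=/app canary/app
   zfs set mountpoint=/upload canary/upload
   zfs set mountpoint=/archives canary/archives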

Also, should we enable the read/write cache on the OS disk as well as the
other disks?

Is build 9 in S10U2 RR??  If not, please point me to the OS image on
nana.eng.


Thanks,
karen






Re: [zfs-discuss] How to best layout our filesystems

2006-07-25 Thread Torrey McMahon
Given the amount of I/O, wouldn't it make sense to get more drives
involved, or something that has cache on the front end, or both?  If you're
really pushing the amount of I/O you're alluding to - hard to tell
without all the details - then you're probably going to hit a limitation
on the drive IOPS.  (Even with the cache on.)


Karen Chau wrote:

Our application Canary has approx 750 clients uploading to the server
every 10 mins, that's approx 108,000 gzip tarballs per day writing to
the /upload directory.  The parser untars the tarball which consists of
8 ascii files into the /archives directory.  /app is our application and
tools (apache, tomcat, etc) directory.  We also have batch jobs that run
throughout the day, I would say we read 2 to 3 times more than we write.

  




Re: [zfs-discuss] Re: Quotas and Snapshots

2006-07-25 Thread Matthew Ahrens
On Tue, Jul 25, 2006 at 07:24:51PM -0500, Mike Gerdts wrote:
 On 7/25/06, Brad Plecs [EMAIL PROTECTED] wrote:
 What I'd really like to see is ... the ability for the snapshot space
 to *not* impact the filesystem space).
 
 The idea is that you have two storage pools - one for live data, one
 for backup data.  Your live data is *probably* on faster disks than
 your backup data.  The live data and backup data may or may not be on
 the same server.  Whenever you need to perform backups you do
 something along the lines of:
 
 yesterday=$1
 today=$2
 for user in $allusers ; do
zfs snapshot users/[EMAIL PROTECTED]
zfs snapshot backup/$user/[EMAIL PROTECTED]
zfs clone backup/$user/[EMAIL PROTECTED] backup/$user/$today
rsync -axuv /users/$user/.zfs/snapshot/$today /backup/$user/$today
zfs destroy users/[EMAIL PROTECTED]
zfs destroy backup/$user/$lastweek
 done

You can simplify and improve the performance of this considerably by
using 'zfs send':

for user in $allusers ; do
zfs snapshot users/[EMAIL PROTECTED]
zfs send -i $yesterday users/[EMAIL PROTECTED] | \
ssh $host zfs recv -d $backpath
ssh $host zfs destroy $backpath/$user/$lastweek
done

You can send the backup to the same or different host, and the same or
different pool, as your hardware needs dictate.  'zfs send' will be much
faster than rsync because we can use ZFS metadata to determine which
blocks were changed without traversing all files & directories.
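
The loop assumes each user already has an initial full copy on the backup
side; that one-time seed would be something along the lines of (with
$firstday being whatever the first snapshot was named):

zfs send users/$user@$firstday | ssh $host zfs recv -d $backpath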

--matt


Re: [zfs-discuss] Re: Quotas and Snapshots

2006-07-25 Thread Mike Gerdts

On 7/25/06, Matthew Ahrens [EMAIL PROTECTED] wrote:

You can simplify and improve the performance of this considerably by
using 'zfs send':

for user in $allusers ; do
zfs snapshot users/[EMAIL PROTECTED]
zfs send -i $yesterday users/[EMAIL PROTECTED] | \
ssh $host zfs recv -d $backpath
ssh $host zfs destroy $backpath/$user/$lastweek
done

You can send the backup to the same or different host, and the same or
different pool, as your hardware needs dictate.  'zfs send' will be much
faster than rsync because we can use ZFS metadata to determine which
blocks were changed without traversing all files  directories.



This is what I had originally intended to say, but it seems that with this
approach the yesterday snapshot has to stick around in order to do
incrementals.  The fact that snapshots count against the current quota
was part of the problem statement.  My approach with rsync avoids this
but, as I said before, is an ugly hack because it doesn't use the
features of ZFS.

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/


Re: [zfs-discuss] How to best layout our filesystems

2006-07-25 Thread Sean Meighan




Hi Torrey; we are the cobbler's kids.  We borrowed this T2000 from
Niagara engineering after we did some performance tests for them.  I am
trying to get a Thumper to run this data set; this could take up to 3-4
months.  Today we are watching 750 Sun Ray servers and 30,000 employees.
Let's see:
1) Solaris 10
2) ZFS version 6
3) T2000 32x1000 with the poorer-performing drives that come with the
Niagara

We need a short-term solution.  Niagara engineering has given us two
more of the internal drives, so we can max out the Niagara with 4
internal drives.  This is the hardware we need to use this week.  When
we get a new box and more drives, we will reconfigure.

Our graphs have 5000 data points per month, 140 data points per day; we
can stand to lose data.

My suggestion was one drive as the system volume and the remaining
three drives as one big ZFS volume, probably raidz.

thanks
sean




-- 

  

  
Sean Meighan
Mgr ITSM Engineering
Sun Microsystems, Inc., US
Phone: x32329 / +1 408 850-9537
Mobile: 303-520-2024
Fax: 408 850-9537
Email: [EMAIL PROTECTED]




