Re: [zfs-discuss] Thumper and ZFS

2006-10-13 Thread Richard Elling - PAE

Do you want data availability, data retention, space, or performance?
 -- richard

Robert Milkowski wrote:

Hello zfs-discuss,

  While waiting for the Thumpers to arrive I'm thinking about how to
  configure them. I would like to use RAID-Z. As a Thumper has 6 SATA
  controllers with 8 ports each, maybe it would make sense to create RAID-Z
  groups of 6 disks, each disk on a separate controller, and then combine 7
  such groups into one pool. That leaves 6 disks, two of them designated for
  the system (mirror), which leaves 4 disks as hot spares.
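
  As an illustrative sketch only (the c#t#d# device names below are
  hypothetical, not necessarily the real Thumper ones, and the spare step
  assumes a ZFS build with hot-spare support), such a layout could be
  created roughly like this:

    # seven 6-disk raidz groups, one disk per controller in each group
    zpool create tank \
      $(for t in 0 1 2 3 4 5 6; do
          printf 'raidz '
          for c in 0 1 2 3 4 5; do printf 'c%dt%dd0 ' "$c" "$t"; done
        done)
    # four of the six remaining disks as hot spares (the other two stay
    # outside ZFS for the mirrored system disk)
    zpool add tank spare c0t7d0 c1t7d0 c2t7d0 c3t7d0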

  That way, if one controller fails, the entire pool will still be OK.

  What do you think?

  P.S. There will still be a SPOF for the boot disks and hot spares, but it
  looks like there's no choice anyway.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] fsflush and zfs

2006-10-13 Thread ttoulliu2002
Is there any change regarding fsflush, such as the autoup tunable, for ZFS?

Thanks
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS ACLs and Samba

2006-10-13 Thread Jiri Sasek
ZFS/NFSv4 introduced a new ACL model (see acl(2) ... Nevada (OpenSolaris),
Solaris 10 u2). There is no compatibility bridge between the
GETACL/SETACL/GETACLCNT and ACE_GETACL/ACE_SETACL/ACE_GETACLCNT functions of
the acl(2) syscall. Because this is a Solaris-specific problem (samba.org
bases its internal ACL handling on POSIX ACLs), Sun is working on support for
Samba on ZFS/NFSv4 volumes.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A versioning FS

2006-10-13 Thread Joerg Schilling
Nicolas Williams [EMAIL PROTECTED] wrote:

 On Wed, Oct 11, 2006 at 08:24:13PM +0200, Joerg Schilling wrote:
  Before we start defining the first official functionality for this Sun
  feature, we should define a mapping for Mac OS, FreeBSD and Linux. It may
  make sense to define a subdirectory of the attribute directory for keeping
  old versions of a file.

 Definitely a sub-directory would be needed, yes, and I don't agree with the
 first part.

Why not?

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [nfs-discuss] Re: [zfs-discuss] Re: NFS Performance and Tar

2006-10-13 Thread Joerg Schilling
Spencer Shepler [EMAIL PROTECTED] wrote:

 I didn't comment on the error conditions that can occur during
 the writing of data upon close().  What you describe is the preferred
 method of obtaining any errors that occur during the writing of data.
 This occurs because the NFS client is writing asynchronously and the
 only method the application has of retrieving the error information
 is from the fsync() or close() call.  At close(), it is too late
 to recover, so fsync() can be used to obtain any asynchronous error
 state.

 This doesn't change the fact that upon close() the NFS client will
 write data back to the server.  This is done to meet the
 close-to-open semantics of NFS.

Your wording did not match reality, which is why I wrote this.
You wrote that upon close() the client will first do something similar to
fsync on that file. The problem is that this is done asynchronously and the
close() return value does not contain an indication of whether the fsync
succeeded.


  It would also make it harder to implement error control as it may be that 
  a problem is detected late while another large file is being extracted.
  Star could not just quit with an error message but would need to delay the
  error-caused exit.

 Sure, I can see that it would be difficult.  My point is that tar is
 not only waiting upon the fsync()/close() but also on file and directory
 creation.  There is a longer delay not only because of the network
 latency but also the latency to writing the filesystem data to
 stable storage.  Parallel requests will tend to overcome the delay/bandwidth
 issues.  Not easy but can be an advantage with respect to performance.

I see no simple way to let tar implement concurrency with respect to these
problems. In star, it would be possible to create detached threads that
work independently on small files whose combined size is smaller than the
FIFO. This would, however, make the code much more complex.


Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re[2]: [zfs-discuss] Thumper and ZFS

2006-10-13 Thread Robert Milkowski
Hello Richard,

Friday, October 13, 2006, 8:05:18 AM, you wrote:

REP Do you want data availability, data retention, space, or performance?

data availability, space, performance

However, we're talking about quite a lot of small I/Os (reads and writes).

The real question was: what do you think about creating each RAID group
only from disks on different controllers, so that a controller failure won't
affect data availability?


-- 
Best regards,
 Robert   mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs and zones

2006-10-13 Thread Roshan Perera
Hi,

Sorry if this has been raised before.

Question: Is it possible to

1. Have the Solaris 10 OS partitions under SDS, and a single partition on that
same disk (outside SDS) as a ZFS slice?
2. Divide that ZFS slice into many partitions, each partition holding a zone?
The idea is to create many non-global zones, each zone in its own ZFS
partition.
3. Also, at a later date, increase the ZFS partitions used for zones as and
when required?

Am I dreaming? :-)

Thanks

Roshan


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs and zones

2006-10-13 Thread Robert Milkowski
Hello Roshan,

Friday, October 13, 2006, 1:12:12 PM, you wrote:

RP Hi,

RP Sorry if this has been raised before.

RP Question: Is it possible to

RP 1. Have the Solaris 10 OS partitions under SDS, and a single partition
RP on that same disk (outside SDS) as a ZFS slice?

Yes.

RP 2. Divide that ZFS slice into many partitions, each partition holding a
RP zone? The idea is to create many non-global zones, each zone in its own
RP ZFS partition.

Yes. (I guess you want to have a separate ZFS file system for each
zone.)

RP 3. Also, at a later date, increase the ZFS partitions used for zones as
RP and when required?

Yes.


-- 
Best regards,
 Robert   mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs and zones

2006-10-13 Thread Jeff Victor

Roshan Perera wrote:

Hi,

Sorry if this has been raised before.

Question: Is it possible to

1. Have the Solaris 10 OS partitions under SDS, and a single partition on that
same disk (outside SDS) as a ZFS slice?


Yes.


2. Divide that ZFS slice into many partitions, each partition holding a zone?
The idea is to create many non-global zones, each zone in its own ZFS
partition.


I am not aware of the word "partition" in ZFS parlance, but I think I know what
you mean, so I will attempt to answer with my interpretation:


You can use a disk slice as a device in a ZFS pool.  In that pool you can create 
one or more ZFS filesystems.


A zone's root directory could be installed in a ZFS filesystem, but this is not 
yet recommended, nor is it supported, because it is not yet possible to apply a 
Solaris update to a system configured like that.  This will be fixed.


If you don't care about that limitation, you can put one or more zones in a ZFS 
fs.  The best method seems to be one zone per ZFS fs.  I think that's what you 
were asking about.  That model allows you to put a disk quota on a zone.


You can accomplish that same goal with SDS (now called SVM) and soft partitions, 
but you wouldn't get all of the ZFS magic. :-)
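
Purely as a sketch of that model (the pool, dataset and zone names here are
hypothetical, and keep in mind the zone-root-on-ZFS support caveat above):

  zpool create zones c1t0d0s7       # pool built on a disk slice
  zfs create zones/zone1            # one filesystem per zone...
  zfs set quota=10g zones/zone1     # ...with a disk quota
  chmod 700 /zones/zone1
  zonecfg -z zone1 'create; set zonepath=/zones/zone1; commit'
  zoneadm -z zone1 install
  zfs set quota=20g zones/zone1     # later: give the zone more space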




3. Also, at a later date, increase the ZFS partitions used for zones as and
when required?


Yes.



--
Jeff VICTOR        Sun Microsystems        jeff.victor @ sun.com
OS Ambassador      Sr. Technical Specialist
Solaris 10 Zones FAQ: http://www.opensolaris.org/os/community/zones/faq
--
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [nfs-discuss] Re: [zfs-discuss] Re: NFS Performance and Tar

2006-10-13 Thread Spencer Shepler
On Fri, Joerg Schilling wrote:
 Spencer Shepler [EMAIL PROTECTED] wrote:
 
  I didn't comment on the error conditions that can occur during
  the writing of data upon close().  What you describe is the preferred
  method of obtaining any errors that occur during the writing of data.
  This occurs because the NFS client is writing asynchronously and the
  only method the application has of retrieving the error information
  is from the fsync() or close() call.  At close(), it is too late
  to recover, so fsync() can be used to obtain any asynchronous error
  state.
 
  This doesn't change the fact that upon close() the NFS client will
  write data back to the server.  This is done to meet the
  close-to-open semantics of NFS.
 
 Your wording did not match reality, which is why I wrote this.
 You wrote that upon close() the client will first do something similar to
 fsync on that file. The problem is that this is done asynchronously and the
 close() return value does not contain an indication of whether the fsync
 succeeded.

Sorry, the code in Solaris would behave as I described.  Upon the 
application closing the file, modified data is written to the server.
The client waits for completion of those writes.  If there is an error,
it is returned to the caller of close().

Spencer

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs and zones

2006-10-13 Thread Roshan Perera
Hi Jeff & Robert,
Thanks for the reply. Your interpretation is correct and the answer spot on.

This is going to be at a VIP client's QA/production environment and their
first introduction to Solaris 10, zones and ZFS. Anything unsupported is not
allowed, hence I may have to wait for the fix. Do you know roughly when the
fixes will be available, so that I can give the customer some time-related
info?
Thanks again.
Roshan


- Original Message -
From: Jeff Victor [EMAIL PROTECTED]
Date: Friday, October 13, 2006 2:56 pm
Subject: Re: [zfs-discuss] zfs and zones
To: Roshan Perera [EMAIL PROTECTED]
Cc: zfs-discuss@opensolaris.org

 Roshan Perera wrote:
  Hi,
  
  Sorry if this has been raised before.
  
  Question: Is it possible to
  
  1. Have the Solaris 10 OS partitions under SDS, and a single partition
  on that same disk (outside SDS) as a ZFS slice?
 
 Yes.
 
  2. Divide that ZFS slice into many partitions, each partition holding a
  zone? The idea is to create many non-global zones, each zone in its own
  ZFS partition.
 
 I am not aware of the word "partition" in ZFS parlance, but I think I know
 what you mean, so I will attempt to answer with my interpretation:
 
 You can use a disk slice as a device in a ZFS pool.  In that pool 
 you can create 
 one or more ZFS filesystems.
 
 A zone's root directory could be installed in a ZFS filesystem, 
 but this is not 
 yet recommended, nor is it supported, because it is not yet 
 possible to apply a 
 Solaris update to a system configured like that.  This will be fixed.
 
 If you don't care about that limitation, you can put one or more 
 zones in a ZFS 
 fs.  The best method seems to be one zone per ZFS fs.  I think 
 that's what you 
 were asking about.  That model allows you to put a disk quota on a 
 zone.
 You can accomplish that same goal with SDS (now called SVM) and 
 soft partitions, 
 but you wouldn't get all of the ZFS magic. :-)
 
 
  3. Also, at a later date, increase the ZFS partitions used for zones as
  and when required?
 
 Yes.
 
 
 
 ---
 ---
 Jeff VICTOR        Sun Microsystems        jeff.victor @ sun.com
 OS Ambassador      Sr. Technical Specialist
 Solaris 10 Zones FAQ: http://www.opensolaris.org/os/community/zones/faq
 
 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [nfs-discuss] Re: [zfs-discuss] Re: NFS Performance and Tar

2006-10-13 Thread Joerg Schilling
Spencer Shepler [EMAIL PROTECTED] wrote:

 Sorry, the code in Solaris would behave as I described.  Upon the 
 application closing the file, modified data is written to the server.
 The client waits for completion of those writes.  If there is an error,
 it is returned to the caller of close().

So is this Solaris-specific, or why are people warned against depending on
the close() return code alone?

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [nfs-discuss] Re: [zfs-discuss] Re: NFS Performance and Tar

2006-10-13 Thread Joerg Schilling
Jeff Victor [EMAIL PROTECTED] wrote:

 Your wording did not match reality, which is why I wrote this.
 You wrote that upon close() the client will first do something similar to
 fsync on that file. The problem is that this is done asynchronously and the
 close() return value does not contain an indication of whether the fsync
 succeeded.
  
  Sorry, the code in Solaris would behave as I described.  Upon the 
  application closing the file, modified data is written to the server.
  The client waits for completion of those writes.  If there is an error,
  it is returned to the caller of close().

 Are you talking about the client-end of NFS, as implemented in Solaris, or
 the application-clients like vi?

 It seems to me that you are talking about Solaris, and Joerg is talking
 about vi (and other applications).

I am talking about the syscall interface to applications.

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [nfs-discuss] Re: [zfs-discuss] Re: NFS Performance and Tar

2006-10-13 Thread Spencer Shepler
On Fri, Joerg Schilling wrote:
 Spencer Shepler [EMAIL PROTECTED] wrote:
 
  Sorry, the code in Solaris would behave as I described.  Upon the 
  application closing the file, modified data is written to the server.
  The client waits for completion of those writes.  If there is an error,
  it is returned to the caller of close().
 
 So is this Solaris-specific, or why are people warned against depending on
 the close() return code alone?

All unix NFS clients that I know of behave the way I described.

I believe the warning about relying on close() is that by the time
the application receives the error it is too late to recover.

If the application uses fsync() and receives an error, the application
can warn the user and they may be able to do something about it (your
example of ENOSPC is a very good one).  Space can be freed, and the
fsync() can be done again and the client will again push the writes
to the server and be successful.

If an application doesn't care about recovery but wants the error reported
back to the user, then close() is sufficient.

Spencer

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Where is the ZFS configuration data stored?

2006-10-13 Thread Keith Clay
Does it matter if the /dev names of the partitions change (e.g. from
/dev/dsk/c2t2250CC611005d3s0 to a different/shorter name on another machine
not using Sun HBA drivers)?


thanks


keith




If the file does not exist then ZFS will not attempt to open any
pools at boot.  You must issue an explicit 'zpool import' command to
probe the available devices for metadata to re-discover your pools.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Self-tuning recordsize

2006-10-13 Thread Jeremy Teo

Would it be worthwhile to implement heuristics to auto-tune
'recordsize', or would that not be worth the effort?

--
Regards,
Jeremy
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Where is the ZFS configuration data stored?

2006-10-13 Thread Darren Dunham
 Does it matter if the /dev names of the partitions change (e.g. from
 /dev/dsk/c2t2250CC611005d3s0 to a different/shorter name on another machine
 not using Sun HBA drivers)?

It should not.  As long as all the disks are visible and ZFS can read
the labels, it should be able to import the pool.
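
For example (pool name hypothetical), the usual sequence when the disks move
to a host with different device names is:

  zpool export tank   # on the old host, if it is still available
  zpool import        # on the new host: scan devices, list importable pools
  zpool import tank   # import by name (or by the numeric pool id shown above)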

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant         TAOS            http://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
  This line left intentionally blank to confuse you. 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: zfs/raid configuration question for an

2006-10-13 Thread Anton B. Rang
Most ZFS improvements should be available through patches. Some may require 
moving to a future update (for instance, ZFS booting, which may have other 
implications throughout the system).

On most systems, you won’t see a lot of difference between hardware and software
mirroring.

The benefit of software mirroring is primarily that you don’t depend on a 
controller. ZFS gives the additional benefit that not only a failed disk block, 
but one which was written incorrectly, can be detected and recovered from the 
alternate side of the mirror.

The benefit of hardware mirroring is twofold. First, the “dirty map” can be 
maintained in fast hardware (e.g. NVRAM), which can reduce the amount of time 
that it takes to rebuild the mirror at startup and may slightly increase the 
speed of random writes. (ZFS uses a different technique to maintain consistency 
and does not need to rebuild its mirror after a crash, unlike SVM.) Second, you 
only move the data once across the host bus and disk controller, instead of 
twice, which on a heavily loaded system can increase your I/O throughput.
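
A sketch of the two setups from ZFS's point of view (pool and device names
are hypothetical):

  # software mirror: data crosses the bus twice, but a bad block can be
  # repaired from the other side of the mirror
  zpool create swtank mirror c0t0d0 c1t0d0

  # hardware-RAID mirror presented to ZFS as one LUN: data crosses the bus
  # once, but ZFS has no second copy to heal from
  zpool create hwtank c2t0d0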
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Self-tuning recordsize

2006-10-13 Thread Matthew Ahrens

Jeremy Teo wrote:

Would it be worthwhile to implement heuristics to auto-tune
'recordsize', or would that not be worth the effort?


It would be really great to automatically select the proper recordsize 
for each file!  How do you suggest doing so?


--matt

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs and zones

2006-10-13 Thread Matthew Ahrens

Roshan Perera wrote:

Hi Jeff & Robert, Thanks for the reply. Your interpretation is
correct and the answer spot on.

This is going to be at a VIP client's QA/production environment and
their first introduction to Solaris 10, zones and ZFS. Anything
unsupported is not allowed, hence I may have to wait for the fix. Do
you know roughly when the fixes will be available, so that I can give
the customer some time-related info? Thanks again. Roshan


Using ZFS for a zone's root is currently planned to be supported in
Solaris 10 update 5, but we are working on moving it up to update 4.


--matt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper and ZFS

2006-10-13 Thread Matthew Ahrens

Robert Milkowski wrote:

Hello Richard,

Friday, October 13, 2006, 8:05:18 AM, you wrote:

REP Do you want data availability, data retention, space, or performance?

data availability, space, performance

However, we're talking about quite a lot of small I/Os (reads and writes).


Then you should seriously consider using mirrors.


The real question was: what do you think about creating each RAID group
only from disks on different controllers, so that a controller failure won't
affect data availability?


On thumper, where the controllers (and cables, etc) are integrated into 
the system board, controller failure is extremely unlikely.  These 
controllers are much more reliable than your traditional SCSI card in a 
PCI slot.  In fact, most controller failures are due to SCSI bus 
negotiation problems (confused devices, bad cables, etc), which simply 
don't exist in the point-to-point (i.e. SATA, SAS) world.  So I wouldn't
worry very much about spreading across controllers for the sake of 
controller failure.


--matt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Zfs Performance with millions of small files in Sendmail messaging environment]

2006-10-13 Thread Ramneek Sethi
Hello Experts

I would appreciate it if somebody could comment on a sendmail environment on
Solaris 10.
How will ZFS perform with millions of files in a sendmail message store
directory on a ZFS filesystem, compared to UFS or VxFS?


-- 
Thanks & Regards,
***
  _/_/_/  _/_/  _/ _/  Ramneek Sethi
 _/  _/_/  _/_/   _/   Systems Support Engineer
_/_/_/  _/_/  _/  _/ _/Sun Microsystems India Pvt. Ltd.
   _/  _/_/  _/   _/_/ 5th Floor,Right Wing ,
  _/_/_/   _/_/_/   _/ _/  The Capital Court,Munirka
   New Delhi - 110067,INDIA
   Phone : 91--11-42191029
   Fax : 91-11-26160928
   Support SERVICESE-mail : [EMAIL PROTECTED]
***
For any Support Queries pls dial the Toll Free Number 1600-4254-786
***

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A versioning FS

2006-10-13 Thread Nicolas Williams
On Fri, Oct 13, 2006 at 11:03:51AM +0200, Joerg Schilling wrote:
 Nicolas Williams [EMAIL PROTECTED] wrote:
 
  On Wed, Oct 11, 2006 at 08:24:13PM +0200, Joerg Schilling wrote:
   Before we start defining the first official functionality for this Sun
   feature, we should define a mapping for Mac OS, FreeBSD and Linux. It may
   make sense to define a subdirectory of the attribute directory for keeping
   old versions of a file.
 
  Definitely a sub-directory would be needed, yes, and I don't agree with the
  first part.
 
 Why not?

Because I don't see how creating a sub-directory of the EA namespace for
storing FVs will step on the toes of anyone trying to map other
platforms' notions of EA onto Solaris'.  Is this being too optimistic?

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: [nfs-discuss] Re: Re: NFS Performance and Tar

2006-10-13 Thread Anton B. Rang
For what it's worth, close-to-open consistency was added to Linux NFS in the 
2.4.20 kernel (late 2002 timeframe). This might be the source of some of the 
confusion.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Self-tuning recordsize

2006-10-13 Thread Anton B. Rang
One technique would be to keep a histogram of read & write sizes.

Presumably one would want to do this only during a “tuning phase” after the 
file was first created, or when access patterns change. (A shift to smaller 
record sizes can be detected by a large proportion of write operations which 
require block pre-reads; a shift to larger record sizes can be detected by a 
large proportion of write operations which write more than one block.)

The ability to change the block size on-the-fly seems useful here.
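
One rough way to gather such a histogram today, from user space with DTrace
(the application name here is hypothetical):

  dtrace -n '
    syscall::read:entry, syscall::write:entry
    /execname == "myapp"/
    { @[probefunc] = quantize(arg2); }'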
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] no tool to get expected disk usage reports

2006-10-13 Thread Dennis Clarke


- Original Message -
Subject: no tool to get expected disk usage reports
From:Dennis Clarke [EMAIL PROTECTED]
Date:Fri, October 13, 2006 14:29
To:  zfs-discuss@opensolaris.org


given :

bash-3.1# uname -a
SunOS mars 5.11 snv_46 sun4u sparc SUNW,Ultra-2

bash-3.1# zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
zfs0  89.4G   110G  24.5K  legacy
zfs0/backup   65.8G  6.19G  65.8G  /export/zfs/backup
zfs0/kayak23.3G  8.69G  23.3G  /export/zfs/kayak
zfs0/zoner 279M  63.7G  24.5K  legacy
zfs0/zoner/common   53K  16.0G  24.5K  legacy
zfs0/zoner/common/postgres  28.5K  4.00G  28.5K  /export/zfs/postgres
zfs0/zoner/postgres279M  7.73G   279M  /export/zfs/zone/postgres

bash-3.1#
bash-3.1# zfs get all zfs0/kayak
NAME PROPERTY   VALUE  SOURCE
zfs0/kayak   type   filesystem -
zfs0/kayak   creation   Sun Oct  1 23:42 2006  -
zfs0/kayak   used   23.3G  -
zfs0/kayak   available  8.69G  -
zfs0/kayak   referenced 23.3G  -
zfs0/kayak   compressratio  1.19x  -
zfs0/kayak   mountedyes-
zfs0/kayak   quota  32Glocal
zfs0/kayak   reservationnone   default
zfs0/kayak   recordsize 128K   default
zfs0/kayak   mountpoint /export/zfs/kayak  local
zfs0/kayak   sharenfs   offdefault
zfs0/kayak   checksum   on default
zfs0/kayak   compressionon inherited from zfs0
zfs0/kayak   atime  on default
zfs0/kayak   deviceson default
zfs0/kayak   exec   on default
zfs0/kayak   setuid on default
zfs0/kayak   readonly   offdefault
zfs0/kayak   zoned  offdefault
zfs0/kayak   snapdirhidden default
zfs0/kayak   aclmodegroupmask  default
zfs0/kayak   aclinherit secure default

bash-3.1# pwd
/export/zfs/kayak
bash-3.1# ls
c  d  e  f  g
bash-3.1# du -sk c
1246404 c

bash-3.1# find c -type f -ls | awk 'BEGIN{ ttl=0 }{ ttl+=$7 }END{ print
"Total size " ttl }'
Total size 1752184261

Due to compression there is no easy way to get the expected total size of
a tree of files and directories.

Worse, there may be various ways to get the summed size of the files in a tree,
but the results may be wildly different from what du reports, thus:

bash-3.1# find f -type f -ls | awk 'BEGIN{ ttl=0 }{ ttl+=$7 }END{ print
"Total size " ttl }'
Total size 3387278008853146
bash-3.1# du -sk f
22672288 f
bash-3.1#

Is there a way to modify du, or perhaps create a new tool?
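
For what it's worth, as a partial workaround with today's tools (the dataset
and directory names are taken from the listing above), comparing the logical
byte total from find with du and the dataset's compression ratio at least
explains the gap; sparse files can also make the find total far larger than
the allocated space:

  find c -type f -ls | awk '{ b += $7 } END { print "logical bytes:", b }'
  du -sk c                          # KB actually allocated (after compression)
  zfs get compressratio zfs0/kayak  # uncompressed-to-compressed data ratio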

Dennis

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Self-tuning recordsize

2006-10-13 Thread Nicolas Williams
On Fri, Oct 13, 2006 at 08:30:27AM -0700, Matthew Ahrens wrote:
 Jeremy Teo wrote:
 Would it be worthwhile to implement heuristics to auto-tune
 'recordsize', or would that not be worth the effort?
 
 It would be really great to automatically select the proper recordsize 
 for each file!  How do you suggest doing so?

I would suggest the following:

 - on file creation start with record size = 8KB (or some such smallish
   size), but don't record this on-disk yet

 - keep the record size at 8KB until the file exceeds some size, say,
   .5MB, at which point the most common read size, if there were enough
   reads, or the most common write size otherwise, should be used to
   derive the actual file record size (rounding up if need be)

 - if the selected record size != 8KB then re-write the file with the
   new record size

 - record the file's selected record size in an extended attribute

 - on truncation keep the existing file record size

 - on open of non-empty files without associated file record size stick
   to the original approach (growing the file block size up to the FS
   record size, defaulting to 128KB)

I think we should create a namespace for Solaris-specific extended
attributes.

The file record size attribute should be writable, but changes in record
size should only be allowed when the file is empty or when the file data
is in one block.  E.g., writing 8KB to a file's RS EA when the file's
larger than 8KB or consists of more than one block should appear to
succeed, but a subsequent read of the RS EA should show the previous
record size.

This approach might lead to the creation of new tunables for controlling
the heuristic (e.g., which heuristic, initial RS, file size at which RS
will be determined, default RS when none can be determined).
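
For comparison, what exists today is only a per-filesystem setting chosen by
the administrator (dataset name hypothetical); it affects only blocks written
after the change:

  zfs create tank/db
  zfs set recordsize=8K tank/db     # e.g. match an 8 KB database page size
  zfs get recordsize tank/db
  zfs set recordsize=128K tank/db   # back to the default for streaming use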

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Usability issue : improve means of finding ZFS-physdevice(s) mapping

2006-10-13 Thread Noel Dellofano
I don't understand why you can't use 'zpool status'.  That will show
the pools and the physical devices in each, and it is also a pretty basic
command.  Examples are given in the sysadmin docs and manpages for
ZFS on the OpenSolaris ZFS community page.
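
For reference, the commands in question (pool name hypothetical):

  zpool status -v tank   # vdev tree with the physical devices and error counts
  zpool iostat -v tank   # the same layout, with per-device I/O statistics
  zpool list             # one-line capacity summary per pool (no devices)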


I realize it's not quite the same command as in UFS, and it's easier  
when things remain the same, but it's a different filesystem so you  
need some different commands that make more sense for how it's  
structured. The idea being hopefully that  soon zpool and zfs  
commands will become just as 'intuitive' for people :)


Noel

(p.s. not to mention am I the only person that thinks that 'zpool  
status' (in human speak, not geek) makes more sense than 'df'? wtf )


On Oct 13, 2006, at 1:55 PM, Bruce Chapman wrote:


ZFS is supposed to be much easier to use than UFS.

For creating a filesystem, I agree it is, as I could do that easily  
without a man page.


However, I found it rather surprising that I could not see the
physical device(s) a zfs filesystem was attached to using either the
df command (which shows physical device mount points for all other
file systems) or even the zfs command.


Even going to zpool command it took a few minutes to finally  
stumble across the only two commands that will give you that  
information, as it is not exactly intuitive.


Ideally, I'd think df should show physical device connections of  
zfs pools, though I can imagine there may be some circumstances  
where that is not desirable so perhaps a new argument would be  
needed to specify if that detail is shown or not.


If this is not done, I think zfs list -v  (-v is not currently an  
option to the zfs list command) should show the physical devices in  
use by the pools.


In any case, I think it is clear zpool list should have a -v  
argument added that will show the device associations, so that  
people don't have to stumble blindly until they run into the zpool  
iostat -v or zpool status -v commands to finally accomplish this  
rather simple task.


Any comments on the above?  I'm using S10 06/06, so perhaps I'll  
get lucky and someone has already added one or all the above  
improvements. :)


Cheers,

   Bruce


This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re[2]: [zfs-discuss] Thumper and ZFS

2006-10-13 Thread Robert Milkowski
Hello Matthew,

Friday, October 13, 2006, 5:37:45 PM, you wrote:

MA Robert Milkowski wrote:
 Hello Richard,
 
 Friday, October 13, 2006, 8:05:18 AM, you wrote:
 
 REP Do you want data availability, data retention, space, or performance?
 
 data availability, space, performance
 
 However, we're talking about quite a lot of small I/Os (reads and writes).

MA Then you should seriously consider using mirrors.

'coz of space requirements that's not possible.
I hope RAID-Z will do.


 The real question was: what do you think about creating each RAID group
 only from disks on different controllers, so that a controller failure won't
 affect data availability?

MA On thumper, where the controllers (and cables, etc) are integrated into
MA the system board, controller failure is extremely unlikely.  These 
MA controllers are much more reliable than your traditional SCSI card in a
MA PCI slot.  In fact, most controller failures are due to SCSI bus 
MA negotiation problems (confused devices, bad cables, etc), which simply
MA don't exist in the point-to-point (i.e. SATA, SAS) world.  So I wouldn't
MA worry very much about spreading across controllers for the sake of 
MA controller failure.

That's a good point.

-- 
Best regards,
 Robert   mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zfs Performance with millions of small files in Sendmail messaging environment]

2006-10-13 Thread Robert Milkowski
Hello Ramneek,

Friday, October 13, 2006, 6:07:22 PM, you wrote:

RS Hello Experts

RS Would appreciate if somebody can comment on sendmail environment on
RS solaris 10.
RS How will Zfs perform if one has millions of files in sendmail message
RS store directory under zfs filesystem compared to UFS or VxFS..

Actually not sendmail but another MTA, and
ZFS is about 5% better in real production than UFS+SVM.
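
As a side note, a couple of dataset properties that are commonly considered
for a many-small-file message store (dataset name hypothetical):

  zfs create tank/mailstore
  zfs set atime=off tank/mailstore       # avoid an extra write per message read
  zfs set compression=on tank/mailstore  # text-heavy mail often compresses well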



-- 
Best regards,
 Robert   mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re[2]: [zfs-discuss] ZFS Usability issue : improve means of finding ZFS-physdevice(s) mapping

2006-10-13 Thread Robert Milkowski
Hello Noel,

Friday, October 13, 2006, 11:22:06 PM, you wrote:

ND I  don't understand why you can't use 'zpool status'?  That will show
ND the pools and the physical devices in each and is also a pretty basic
ND command.  Examples are given in the sysadmin docs and manpages for  
ND ZFS on the opensolaris ZFS community page.

Showing physical devs in df output with ZFS is not right, and I can't
imagine how one would show them in df output for a pool with dozens of disks.

But an option to the zpool command to display the config in such a way that
it's easy (almost copy/paste) to recreate that config would be useful.
Something like metastat -p.



-- 
Best regards,
 Robert   mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Usability issue : improve means of finding ZFS-physdevice(s) mapping

2006-10-13 Thread Matthew Ahrens

Robert Milkowski wrote:

Hello Noel,

Friday, October 13, 2006, 11:22:06 PM, you wrote:

ND I  don't understand why you can't use 'zpool status'?  That will show
ND the pools and the physical devices in each and is also a pretty basic
ND command.  Examples are given in the sysadmin docs and manpages for  
ND ZFS on the opensolaris ZFS community page.


Showing physical devs in df output with ZFS is not right, and I can't
imagine how one would show them in df output for a pool with dozens of disks.

But an option to the zpool command to display the config in such a way that
it's easy (almost copy/paste) to recreate that config would be useful.
Something like metastat -p.


Agreed, see 6276640 zpool config.

--matt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs_vfsops.c : zfs_vfsinit() : line 1179: Src inspection

2006-10-13 Thread Erblichs
Group,

If there is a bad vfs ops template, why
wouldn't you just return(error) instead of
trying to create the vnode ops template?

My suggestion is: after the cmn_err(),
then return(error);

Mitchell Erblich
---
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs and zones

2006-10-13 Thread Mike Gerdts

On 10/13/06, Matthew Ahrens [EMAIL PROTECTED] wrote:

Using ZFS for a zone's root is currently planned to be supported in
Solaris 10 update 5, but we are working on moving it up to update 4.


Are there any areas where the community can help with this?  Would
code or "me too!" support calls help the most?

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] fsflush and zfs

2006-10-13 Thread Neil Perrin

ZFS ignores the fsflush. Here's a snippet of the code in zfs_sync():

/*
 * SYNC_ATTR is used by fsflush() to force old filesystems like UFS
 * to sync metadata, which they would otherwise cache indefinitely.
 * Semantically, the only requirement is that the sync be initiated.
 * The DMU syncs out txgs frequently, so there's nothing to do.
 */
if (flag & SYNC_ATTR)
return (0);

However, for a user-initiated sync(1M) and sync(2), ZFS does force
all outstanding data/transactions synchronously to disk.
This goes beyond the requirement of sync(2), which says I/O is initiated
but not waited on (i.e. asynchronous).

Neil.

ttoulliu2002 wrote On 10/13/06 00:06,:

Is there any change regarding fsflush, such as the autoup tunable, for ZFS?

Thanks
 
 
This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Self-tuning recordsize

2006-10-13 Thread Erblichs
Group,

I am not sure I agree with the 8k size.

Since recordsize is based on the size of filesystem blocks
for large files, my first consideration is what will be
the max size of the file object.

For extremely large files (25 to 100 GB) that are accessed
sequentially for both read & write, I would expect 64k or 128k.

Putpage functions attempt to grab a number of pages off the
vnode and place their modified contents within disk blocks.
Thus if disk blocks are larger, then fewer of them are needed,
which can result in more efficient operations.

However, any small change to the filesystem block would result
in the entire filesystem block being accessed, so small accesses
to the block are very inefficient.

Lastly, access to a larger block will occupy the media for a
longer period of continuous time, possibly creating more latency
than necessary for another unrelated op.

Hope this helps...

Mitchell Erblich
---


Nicolas Williams wrote:
 
 On Fri, Oct 13, 2006 at 08:30:27AM -0700, Matthew Ahrens wrote:
  Jeremy Teo wrote:
  Would it be worthwhile to implement heuristics to auto-tune
  'recordsize', or would that not be worth the effort?
 
  It would be really great to automatically select the proper recordsize
  for each file!  How do you suggest doing so?
 
 I would suggest the following:
 
  - on file creation start with record size = 8KB (or some such smallish
size), but don't record this on-disk yet
 
  - keep the record size at 8KB until the file exceeds some size, say,
.5MB, at which point the most common read size, if there were enough
reads, or the most common write size otherwise, should be used to
derive the actual file record size (rounding up if need be)
 
 - if the selected record size != 8KB then re-write the file with the
   new record size
 
 - record the file's selected record size in an extended attribute
 
  - on truncation keep the existing file record size
 
  - on open of non-empty files without associated file record size stick
to the original approach (growing the file block size up to the FS
record size, defaulting to 128KB)
 
 I think we should create a namespace for Solaris-specific extended
 attributes.
 
 The file record size attribute should be writable, but changes in record
 size should only be allowed when the file is empty or when the file data
 is in one block.  E.g., writing 8KB to a file's RS EA when the file's
 larger than 8KB or consists of more than one block should appear to
 succeed, but a subsequent read of the RS EA should show the previous
 record size.
 
 This approach might lead to the creation of new tunables for controlling
 the heuristic (e.g., which heuristic, initial RS, file size at which RS
 will be determined, default RS when none can be determined).
 
 Nico
 --
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss