Re: [zfs-discuss] Thumper and ZFS
Do you want data availability, data retention, space, or performance?
 -- richard

Robert Milkowski wrote:
> Hello zfs-discuss,
> While waiting for the Thumpers to arrive I'm thinking about how to configure them. I would like to use raid-z. Since a Thumper has 6 SATA controllers with 8 ports each, maybe it would make sense to create raid-z groups of 6 disks, each disk from a separate controller, and then combine 7 such groups into one pool. That leaves 6 disks, two of them designated for the system (mirror), which leaves 4 disks, probably as hot spares. That way, if one controller fails, the entire pool will still be ok. What do you think?
> P.S. There will still be a SPOF for the boot disks and hot spares, but it looks like there's no choice anyway.
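Not a recommendation, just a sketch of the proposed layout in zpool syntax, with hypothetical cXtYd0 device names (real Thumper device names will differ), so the controller spread is visible:

    # Each raidz group takes one disk from each of the 6 controllers (c0-c5)
    zpool create tank \
        raidz c0t0d0 c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 \
        raidz c0t1d0 c1t1d0 c2t1d0 c3t1d0 c4t1d0 c5t1d0 \
        raidz c0t2d0 c1t2d0 c2t2d0 c3t2d0 c4t2d0 c5t2d0
    # ... four more 6-disk raidz groups, 7 in total ...
    # Remaining disks (minus the mirrored system pair) as hot spares;
    # hot-spare support may not be present in every build yet
    zpool add tank spare c2t7d0 c3t7d0 c4t7d0 c5t7d0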
[zfs-discuss] fsflush and zfs
Is there any change regarding fsflush, such as the autoup tunable, for ZFS? Thanks
[zfs-discuss] Re: ZFS ACLs and Samba
ZFS/NFSv4 introduced a new ACL model (see acl(2) in Nevada (OpenSolaris) and Solaris 10 u2). There is no compatibility bridge between the GETACL/SETACL/GETACLCNT and ACE_GETACL/ACE_SETACL/ACE_GETACLCNT functions of the acl(2) syscall. Because this is a Solaris-specific problem (samba.org bases its internal ACL handling on POSIX ACLs), Sun is working on support for Samba on ZFS/NFSv4 volumes.
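For readers new to the ACE model, a small hedged example of the shell-level interface on a ZFS file (the file path and user name are made up; see ls(1) and chmod(1) for the full syntax):

    # List the ACE-style ACL entries on a file in a ZFS filesystem
    ls -v /tank/share/report.doc
    # Add an allow entry for user 'winuser' (hypothetical name)
    chmod A+user:winuser:read_data/write_data:allow /tank/share/report.doc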
Re: [zfs-discuss] A versioning FS
Nicolas Williams wrote:
> On Wed, Oct 11, 2006 at 08:24:13PM +0200, Joerg Schilling wrote:
> > Before we start defining the first official functionality for this Sun feature, we should define a mapping for Mac OS, FreeBSD and Linux. It may make sense to define a sub-directory of the attribute directory for keeping old versions of a file.
> Definitely a sub-directory would be needed, yes, and I don't agree with the first part.

Why not?

Jörg
Re: [nfs-discuss] Re: [zfs-discuss] Re: NFS Performance and Tar
Spencer Shepler wrote:
> I didn't comment on the error conditions that can occur during the writing of data upon close(). What you describe is the preferred method of obtaining any errors that occur during the writing of data. This occurs because the NFS client is writing asynchronously and the only method the application has of retrieving the error information is from the fsync() or close() call. At close(), it is too late to recover, so fsync() can be used to obtain any asynchronous error state. This doesn't change the fact that upon close() the NFS client will write data back to the server. This is done to meet the close-to-open semantics of NFS.

Your wording did not match the reality; this is why I wrote this. You wrote that upon close() the client will first do something similar to fsync on that file. The problem is that this is done asynchronously and the close() return value does not contain an indication of whether the fsync succeeded.

> > It would also make it harder to implement error control, as it may be that a problem is detected late while another large file is being extracted. Star could not just quit with an error message but would need to delay the error-caused exit.
> Sure, I can see that it would be difficult. My point is that tar is not only waiting upon the fsync()/close() but also on file and directory creation. There is a longer delay not only because of the network latency but also the latency of writing the filesystem data to stable storage. Parallel requests will tend to overcome the delay/bandwidth issues. Not easy, but it can be an advantage with respect to performance.

I see no simple way to let tar implement concurrency with respect to these problems. In star, it would be possible to create detached threads that work independently on small files that in sum are smaller than the size of the FIFO. This would however make the code much more complex.

Jörg
Re[2]: [zfs-discuss] Thumper and ZFS
Hello Richard,

Friday, October 13, 2006, 8:05:18 AM, you wrote:
REP> Do you want data availability, data retention, space, or performance?

Data availability, space, performance. However, we're talking about quite a lot of small IOs (r+w).

The real question was: what do you think about creating each raid group only from disks on different controllers, so that a controller failure won't affect data availability?

-- Best regards, Robert
[zfs-discuss] zfs and zones
Hi,

Sorry if this has been raised before. Question: is it possible to
1. Have the Solaris 10 OS partitions under SDS and a single partition on that same disk (without SDS) as a ZFS slice?
2. Partition the ZFS slice into many partitions, each partition holding a zone? The idea is to create many non-global zones, each zone in a ZFS partition.
3. Also, at a later date, grow the ZFS partitions used for zones as and when required?

Am I dreaming? :-)

Thanks, Roshan
Re: [zfs-discuss] zfs and zones
Hello Roshan,

Friday, October 13, 2006, 1:12:12 PM, you wrote:
RP> 1. Have the Solaris 10 OS partitions under SDS and a single partition on that same disk (without SDS) as a ZFS slice?

Yes.

RP> 2. Partition the ZFS slice into many partitions, each partition holding a zone? The idea is to create many non-global zones, each zone in a ZFS partition.

Yes. (I guess you want to have a separate ZFS file system for each zone.)

RP> 3. Also, at a later date, grow the ZFS partitions used for zones as and when required?

Yes.

-- Best regards, Robert
Re: [zfs-discuss] zfs and zones
Roshan Perera wrote:
> 1. Have the Solaris 10 OS partitions under SDS and a single partition on that same disk (without SDS) as a ZFS slice?

Yes.

> 2. Partition the ZFS slice into many partitions, each partition holding a zone? The idea is to create many non-global zones, each zone in a ZFS partition.

I am not aware of the word "partition" in ZFS parlance, but I think I know what you mean, so I will attempt to answer with my interpretation: you can use a disk slice as a device in a ZFS pool. In that pool you can create one or more ZFS filesystems. A zone's root directory could be installed in a ZFS filesystem, but this is not yet recommended, nor is it supported, because it is not yet possible to apply a Solaris update to a system configured like that. This will be fixed. If you don't care about that limitation, you can put one or more zones in a ZFS fs. The best method seems to be one zone per ZFS fs; I think that's what you were asking about. That model allows you to put a disk quota on a zone. You can accomplish the same goal with SDS (now called SVM) and soft partitions, but you wouldn't get all of the ZFS magic. :-)

> 3. Also, at a later date, grow the ZFS partitions used for zones as and when required?

Yes.

-- Jeff VICTOR, Sun Microsystems, OS Ambassador / Sr. Technical Specialist
Solaris 10 Zones FAQ: http://www.opensolaris.org/os/community/zones/faq
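A hedged sketch of that "one zone per ZFS filesystem" model, with hypothetical slice, pool, and zone names; the quota can be raised later, which also covers question 3:

    # Pool on a single disk slice
    zpool create zonepool c1t0d0s6
    # One filesystem per zone, each with its own mountpoint and quota
    zfs create zonepool/zone1
    zfs set mountpoint=/zones/zone1 zonepool/zone1
    zfs set quota=10g zonepool/zone1
    # Later, grow the space available to the zone as and when required
    zfs set quota=20g zonepool/zone1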
Re: [nfs-discuss] Re: [zfs-discuss] Re: NFS Performance and Tar
On Fri, Joerg Schilling wrote:
> Spencer Shepler wrote:
> > I didn't comment on the error conditions that can occur during the writing of data upon close(). What you describe is the preferred method of obtaining any errors that occur during the writing of data. This occurs because the NFS client is writing asynchronously and the only method the application has of retrieving the error information is from the fsync() or close() call. At close(), it is too late to recover, so fsync() can be used to obtain any asynchronous error state. This doesn't change the fact that upon close() the NFS client will write data back to the server. This is done to meet the close-to-open semantics of NFS.
> Your wording did not match the reality; this is why I wrote this. You wrote that upon close() the client will first do something similar to fsync on that file. The problem is that this is done asynchronously and the close() return value does not contain an indication of whether the fsync succeeded.

Sorry, the code in Solaris would behave as I described. Upon the application closing the file, modified data is written to the server. The client waits for completion of those writes. If there is an error, it is returned to the caller of close().

Spencer
Re: [zfs-discuss] zfs and zones
Hi Jeff & Robert,

Thanks for the reply. Your interpretation is correct and the answer spot on. This is going to be at a VIP client's QA/production environment and their first introduction to Solaris 10, zones and ZFS. Anything unsupported is not allowed, hence I may have to wait for the fix. Do you know roughly when the fixes will be available, so that I can give the customer some time-related info?

Thanks again.

Roshan

- Original Message -
From: Jeff Victor [EMAIL PROTECTED]
Date: Friday, October 13, 2006 2:56 pm
Subject: Re: [zfs-discuss] zfs and zones
To: Roshan Perera [EMAIL PROTECTED]
Cc: zfs-discuss@opensolaris.org
Re: [nfs-discuss] Re: [zfs-discuss] Re: NFS Performance and Tar
Spencer Shepler wrote:
> Sorry, the code in Solaris would behave as I described. Upon the application closing the file, modified data is written to the server. The client waits for completion of those writes. If there is an error, it is returned to the caller of close().

So is this Solaris specific, or why are people warned against depending on the close() return code only?

Jörg
Re: [nfs-discuss] Re: [zfs-discuss] Re: NFS Performance and Tar
Jeff Victor wrote:
> > Your wording did not match the reality; this is why I wrote this. You wrote that upon close() the client will first do something similar to fsync on that file. The problem is that this is done asynchronously and the close() return value does not contain an indication of whether the fsync succeeded.
> > Sorry, the code in Solaris would behave as I described. Upon the application closing the file, modified data is written to the server. The client waits for completion of those writes. If there is an error, it is returned to the caller of close().
> Are you talking about the client end of NFS, as implemented in Solaris, or the application clients like vi? It seems to me that you are talking about Solaris, and Joerg is talking about vi (and other applications).

I am talking about the syscall interface to applications.

Jörg
Re: [nfs-discuss] Re: [zfs-discuss] Re: NFS Performance and Tar
On Fri, Joerg Schilling wrote:
> Spencer Shepler wrote:
> > Sorry, the code in Solaris would behave as I described. Upon the application closing the file, modified data is written to the server. The client waits for completion of those writes. If there is an error, it is returned to the caller of close().
> So is this Solaris specific, or why are people warned against depending on the close() return code only?

All unix NFS clients that I know of behave the way I described. I believe the warning about relying on close() is that by the time the application receives the error it is too late to recover. If the application uses fsync() and receives an error, the application can warn the user and they may be able to do something about it (your example of ENOSPC is a very good one). Space can be freed, the fsync() can be done again, and the client will again push the writes to the server and be successful. If an application doesn't care about recovery but wants the error reported back to the user, then close() is sufficient.

Spencer
Re: [zfs-discuss] Where is the ZFS configuration data stored?
Does it matter if the /dev names of the partitions change (i.e. from /dev/dsk/c2t2250CC611005d3s0 to another machine not using Sun HBA drivers with a different/shorter name)?

Thanks, Keith

> If the file does not exist then ZFS will not attempt to open any pools at boot. You must issue an explicit 'zpool import' command to probe the available devices for metadata to re-discover your pools.
[zfs-discuss] Self-tuning recordsize
Would it be worthwhile to implement heuristics to auto-tune 'recordsize', or would that not be worth the effort?

-- Regards, Jeremy
Re: [zfs-discuss] Where is the ZFS configuration data stored?
> Does it matter if the /dev names of the partitions change (i.e. from /dev/dsk/c2t2250CC611005d3s0 to another machine not using Sun HBA drivers with a different/shorter name)?

It should not. As long as all the disks are visible and ZFS can read the labels, it should be able to import the pool.

-- Darren Dunham, Senior Technical Consultant, TAOS  http://www.taos.com/
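A hedged outline of the move itself, with a hypothetical pool name; the bare 'zpool import' only scans and lists what it finds, and the second form actually imports it under whatever device names the new host uses:

    # On the old host, cleanly release the pool (optional but tidy)
    zpool export tank
    # On the new host, scan all visible devices for ZFS labels
    zpool import
    # Import the pool the scan reported, regardless of its new /dev names
    zpool import tank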
[zfs-discuss] Re: Re: zfs/raid configuration question for an
Most ZFS improvements should be available through patches. Some may require moving to a future update (for instance, ZFS booting, which may have other implications throughout the system).

On most systems, you won’t see a lot of difference between hardware or software mirroring. The benefit of software mirroring is primarily that you don’t depend on a controller. ZFS gives the additional benefit that not only a failed disk block, but one which was written incorrectly, can be detected and recovered from the alternate side of the mirror.

The benefit of hardware mirroring is twofold. First, the “dirty map” can be maintained in fast hardware (e.g. NVRAM), which can reduce the amount of time that it takes to rebuild the mirror at startup and may slightly increase the speed of random writes. (ZFS uses a different technique to maintain consistency and does not need to rebuild its mirror after a crash, unlike SVM.) Second, you only move the data once across the host bus and disk controller, instead of twice, which on a heavily loaded system can increase your I/O throughput.
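To make the software-mirroring side concrete, a hedged example with hypothetical device names: a ZFS mirror spread across two controllers, plus a scrub, which is the mechanism that detects and repairs the incorrectly written blocks mentioned above:

    # Two-way mirrors, each half on a different controller
    zpool create tank mirror c0t0d0 c1t0d0 mirror c0t1d0 c1t1d0
    # Walk every block, verify checksums, and rewrite bad copies from the good side
    zpool scrub tank
    zpool status -v tank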
Re: [zfs-discuss] Self-tuning recordsize
Jeremy Teo wrote:
> Would it be worthwhile to implement heuristics to auto-tune 'recordsize', or would that not be worth the effort?

It would be really great to automatically select the proper recordsize for each file! How do you suggest doing so?

--matt
Re: [zfs-discuss] zfs and zones
Roshan Perera wrote:
> Hi Jeff & Robert, Thanks for the reply. Your interpretation is correct and the answer spot on. This is going to be at a VIP client's QA/production environment and their first introduction to Solaris 10, zones and ZFS. Anything unsupported is not allowed, hence I may have to wait for the fix. Do you know roughly when the fixes will be available, so that I can give the customer some time-related info? Thanks again. Roshan

Using ZFS for a zone's root is currently planned to be supported in Solaris 10 update 5, but we are working on moving it up to update 4.

--matt
Re: [zfs-discuss] Thumper and ZFS
Robert Milkowski wrote:
> Hello Richard,
> Friday, October 13, 2006, 8:05:18 AM, you wrote:
> REP> Do you want data availability, data retention, space, or performance?
> Data availability, space, performance. However, we're talking about quite a lot of small IOs (r+w).

Then you should seriously consider using mirrors.

> The real question was: what do you think about creating each raid group only from disks on different controllers, so that a controller failure won't affect data availability?

On Thumper, where the controllers (and cables, etc.) are integrated into the system board, controller failure is extremely unlikely. These controllers are much more reliable than your traditional SCSI card in a PCI slot. In fact, most controller failures are due to SCSI bus negotiation problems (confused devices, bad cables, etc.), which simply don't exist in the point-to-point (i.e. SATA, SAS) world. So I wouldn't worry very much about spreading across controllers for the sake of controller failure.

--matt
[zfs-discuss] Zfs Performance with millions of small files in Sendmail messaging environment
Hello Experts,

Would appreciate it if somebody could comment on a sendmail environment on Solaris 10: how will ZFS perform if one has millions of files in the sendmail message store directory under a ZFS filesystem, compared to UFS or VxFS?

-- Thanks & Regards, Ramneek Sethi, Systems Support Engineer, Sun Microsystems India Pvt. Ltd.
Re: [zfs-discuss] A versioning FS
On Fri, Oct 13, 2006 at 11:03:51AM +0200, Joerg Schilling wrote:
> Nicolas Williams wrote:
> > On Wed, Oct 11, 2006 at 08:24:13PM +0200, Joerg Schilling wrote:
> > > Before we start defining the first official functionality for this Sun feature, we should define a mapping for Mac OS, FreeBSD and Linux. It may make sense to define a sub-directory of the attribute directory for keeping old versions of a file.
> > Definitely a sub-directory would be needed, yes, and I don't agree with the first part.
> Why not?

Because I don't see how creating a sub-directory of the EA namespace for storing FVs will step on the toes of anyone trying to map other platforms' notions of EAs onto Solaris'. Is this being too optimistic?

Nico
[zfs-discuss] Re: [nfs-discuss] Re: Re: NFS Performance and Tar
For what it's worth, close-to-open consistency was added to Linux NFS in the 2.4.20 kernel (late 2002 timeframe). This might be the source of some of the confusion.
[zfs-discuss] Re: Self-tuning recordsize
One technique would be to keep a histogram of read and write sizes. Presumably one would want to do this only during a “tuning phase” after the file was first created, or when access patterns change. (A shift to smaller record sizes can be detected by a large proportion of write operations which require block pre-reads; a shift to larger record sizes can be detected by a large proportion of write operations which write more than one block.) The ability to change the block size on-the-fly seems useful here.
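As a rough illustration of what such a histogram looks like, a hedged DTrace one-liner run from the shell (the probes and aggregation key are just one possible choice) that quantizes read(2)/write(2) sizes per process:

    dtrace -n 'syscall::read:entry,syscall::write:entry { @[probefunc, execname] = quantize(arg2); }'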
[zfs-discuss] no tool to get expected disk usage reports
- Original Message -
Subject: no tool to get expected disk usage reports
From: Dennis Clarke [EMAIL PROTECTED]
Date: Fri, October 13, 2006 14:29
To: zfs-discuss@opensolaris.org

given:

bash-3.1# uname -a
SunOS mars 5.11 snv_46 sun4u sparc SUNW,Ultra-2
bash-3.1# zfs list
NAME                         USED  AVAIL  REFER  MOUNTPOINT
zfs0                        89.4G   110G  24.5K  legacy
zfs0/backup                 65.8G  6.19G  65.8G  /export/zfs/backup
zfs0/kayak                  23.3G  8.69G  23.3G  /export/zfs/kayak
zfs0/zoner                   279M  63.7G  24.5K  legacy
zfs0/zoner/common             53K  16.0G  24.5K  legacy
zfs0/zoner/common/postgres  28.5K  4.00G  28.5K  /export/zfs/postgres
zfs0/zoner/postgres          279M  7.73G   279M  /export/zfs/zone/postgres
bash-3.1# zfs get all zfs0/kayak
NAME        PROPERTY       VALUE                  SOURCE
zfs0/kayak  type           filesystem             -
zfs0/kayak  creation       Sun Oct  1 23:42 2006  -
zfs0/kayak  used           23.3G                  -
zfs0/kayak  available      8.69G                  -
zfs0/kayak  referenced     23.3G                  -
zfs0/kayak  compressratio  1.19x                  -
zfs0/kayak  mounted        yes                    -
zfs0/kayak  quota          32G                    local
zfs0/kayak  reservation    none                   default
zfs0/kayak  recordsize     128K                   default
zfs0/kayak  mountpoint     /export/zfs/kayak      local
zfs0/kayak  sharenfs       off                    default
zfs0/kayak  checksum       on                     default
zfs0/kayak  compression    on                     inherited from zfs0
zfs0/kayak  atime          on                     default
zfs0/kayak  devices        on                     default
zfs0/kayak  exec           on                     default
zfs0/kayak  setuid         on                     default
zfs0/kayak  readonly       off                    default
zfs0/kayak  zoned          off                    default
zfs0/kayak  snapdir        hidden                 default
zfs0/kayak  aclmode        groupmask              default
zfs0/kayak  aclinherit     secure                 default
bash-3.1# pwd
/export/zfs/kayak
bash-3.1# ls
c  d  e  f  g
bash-3.1# du -sk c
1246404 c
bash-3.1# find c -type f -ls | awk 'BEGIN{ ttl=0 }{ ttl+=$7 }END{ print "Total size " ttl }'
Total size 1752184261

Due to compression there is no easy way to get the expected total size of a tree of files and directories. Worse, there may be various ways to get a sum total of files in a tree, but the results may be wildly different from what du reports, thus:

bash-3.1# find f -type f -ls | awk 'BEGIN{ ttl=0 }{ ttl+=$7 }END{ print "Total size " ttl }'
Total size 3387278008853146
bash-3.1# du -sk f
22672288 f
bash-3.1#

Is there a way to modify du or perhaps create a new tool?

Dennis
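Not the tool Dennis is asking for, but a hedged stopgap using properties that already exist: the dataset's average compressratio times du's on-disk figure gives a rough uncompressed estimate (per dataset, not per subtree, and only as accurate as the average ratio):

    # Average compression ratio for the dataset, e.g. 1.19x
    zfs get compressratio zfs0/kayak
    # Multiply du's on-disk kilobytes by that ratio for an approximation
    du -sk /export/zfs/kayak/c | awk '{ printf "approx. uncompressed: %.0f KB\n", $1 * 1.19 }'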
Re: [zfs-discuss] Self-tuning recordsize
On Fri, Oct 13, 2006 at 08:30:27AM -0700, Matthew Ahrens wrote:
> Jeremy Teo wrote:
> > Would it be worthwhile to implement heuristics to auto-tune 'recordsize', or would that not be worth the effort?
> It would be really great to automatically select the proper recordsize for each file! How do you suggest doing so?

I would suggest the following:

- on file creation start with record size = 8KB (or some such smallish size), but don't record this on-disk yet
- keep the record size at 8KB until the file exceeds some size, say, .5MB, at which point the most common read size, if there were enough reads, or the most common write size otherwise, should be used to derive the actual file record size (rounding up if need be)
- if the selected record size != 8KB then re-write the file with the new record size
- record the file's selected record size in an extended attribute
- on truncation keep the existing file record size
- on open of non-empty files without an associated file record size, stick to the original approach (growing the file block size up to the FS record size, defaulting to 128KB)

I think we should create a namespace for Solaris-specific extended attributes. The file record size attribute should be writable, but changes in record size should only be allowed when the file is empty or when the file data is in one block. E.g., writing 8KB to a file's RS EA when the file's larger than 8KB or consists of more than one block should appear to succeed, but a subsequent read of the RS EA should show the previous record size.

This approach might lead to the creation of new tunables for controlling the heuristic (e.g., which heuristic, initial RS, file size at which RS will be determined, default RS when none can be determined).

Nico
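For context, recordsize is currently a per-dataset property tuned by hand; a hedged example of the manual step the heuristic above would replace (pool/filesystem name hypothetical):

    # Match the record size to the application's dominant I/O size, e.g. an 8 KB database
    zfs set recordsize=8k tank/db
    zfs get recordsize tank/db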
Re: [zfs-discuss] ZFS Usability issue : improve means of finding ZFS-physdevice(s) mapping
I don't understand why you can't use 'zpool status'? That will show the pools and the physical devices in each, and is also a pretty basic command. Examples are given in the sysadmin docs and manpages for ZFS on the OpenSolaris ZFS community page. I realize it's not quite the same command as in UFS, and it's easier when things remain the same, but it's a different filesystem, so you need some different commands that make more sense for how it's structured. The idea being, hopefully, that soon the zpool and zfs commands will become just as 'intuitive' for people :)

Noel

(p.s. not to mention, am I the only person that thinks that 'zpool status' (in human speak, not geek) makes more sense than 'df'? wtf)

On Oct 13, 2006, at 1:55 PM, Bruce Chapman wrote:
> ZFS is supposed to be much easier to use than UFS. For creating a filesystem, I agree it is, as I could do that easily without a man page. However, I found it rather surprising that I could not see the physical device(s) a ZFS filesystem was attached to using either the df command (which shows physical device mount points for all other file systems) or even the zfs command. Even going to the zpool command, it took a few minutes to finally stumble across the only two commands that will give you that information, as it is not exactly intuitive.
> Ideally, I'd think df should show physical device connections of ZFS pools, though I can imagine there may be some circumstances where that is not desirable, so perhaps a new argument would be needed to specify whether that detail is shown or not. If this is not done, I think "zfs list -v" (-v is not currently an option to the zfs list command) should show the physical devices in use by the pools. In any case, I think it is clear "zpool list" should have a -v argument added that will show the device associations, so that people don't have to stumble blindly until they run into the "zpool iostat -v" or "zpool status -v" commands to finally accomplish this rather simple task.
> Any comments on the above? I'm using S10 06/06, so perhaps I'll get lucky and someone has already added one or all of the above improvements. :)
> Cheers, Bruce
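For anyone landing here with the same question, a hedged example with a hypothetical pool name of the two commands the thread converges on for the filesystem-to-physical-device mapping:

    # Pool layout: vdevs and the physical devices behind each one
    zpool status tank
    # Same mapping plus per-device I/O statistics
    zpool iostat -v tank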
Re[2]: [zfs-discuss] Thumper and ZFS
Hello Matthew,

Friday, October 13, 2006, 5:37:45 PM, you wrote:

MA> Robert Milkowski wrote:
MA> > Hello Richard,
MA> > Friday, October 13, 2006, 8:05:18 AM, you wrote:
MA> > REP> Do you want data availability, data retention, space, or performance?
MA> > Data availability, space, performance. However, we're talking about quite a lot of small IOs (r+w).
MA> Then you should seriously consider using mirrors.

Because of the space requirements that's not possible. I hope RAID-Z will do.

MA> > The real question was: what do you think about creating each raid group only from disks on different controllers, so that a controller failure won't affect data availability?
MA> On Thumper, where the controllers (and cables, etc.) are integrated into the system board, controller failure is extremely unlikely. These controllers are much more reliable than your traditional SCSI card in a PCI slot. In fact, most controller failures are due to SCSI bus negotiation problems (confused devices, bad cables, etc.), which simply don't exist in the point-to-point (i.e. SATA, SAS) world. So I wouldn't worry very much about spreading across controllers for the sake of controller failure.

That's a good point.

-- Best regards, Robert
Re: [zfs-discuss] Zfs Performance with millions of small files in Sendmail messaging environment
Hello Ramneek,

Friday, October 13, 2006, 6:07:22 PM, you wrote:
RS> Hello Experts
RS> Would appreciate it if somebody could comment on a sendmail environment on Solaris 10: how will ZFS perform if one has millions of files in the sendmail message store directory under a ZFS filesystem, compared to UFS or VxFS?

Actually not sendmail but also an MTA, and ZFS is about 5% better in real production than UFS+SVM.

-- Best regards, Robert
Re[2]: [zfs-discuss] ZFS Usability issue : improve means of finding ZFS-physdevice(s) mapping
Hello Noel,

Friday, October 13, 2006, 11:22:06 PM, you wrote:
ND> I don't understand why you can't use 'zpool status'? That will show the pools and the physical devices in each, and is also a pretty basic command. Examples are given in the sysadmin docs and manpages for ZFS on the OpenSolaris ZFS community page.

Showing physical devices in df output with ZFS is not right, and I can't imagine how one would show df output for a pool with dozens of disks. But an option to the zpool command to display the config in such a way that it's easy (almost copy&paste) to recreate such a config would be useful. Something like metastat -p.

-- Best regards, Robert
Re: [zfs-discuss] ZFS Usability issue : improve means of finding ZFS-physdevice(s) mapping
Robert Milkowski wrote:
> Hello Noel,
> Friday, October 13, 2006, 11:22:06 PM, you wrote:
> ND> I don't understand why you can't use 'zpool status'? That will show the pools and the physical devices in each, and is also a pretty basic command. Examples are given in the sysadmin docs and manpages for ZFS on the OpenSolaris ZFS community page.
> Showing physical devices in df output with ZFS is not right, and I can't imagine how one would show df output for a pool with dozens of disks. But an option to the zpool command to display the config in such a way that it's easy (almost copy&paste) to recreate such a config would be useful. Something like metastat -p.

Agreed, see 6276640 "zpool config".

--matt
[zfs-discuss] zfs_vfsops.c : zfs_vfsinit() : line 1179: Src inspection
Group,

If there is a bad vfs ops template, why wouldn't you just return(error) instead of trying to create the vnode ops template? My suggestion: after the cmn_err(), return(error);

Mitchell Erblich
Re: [zfs-discuss] zfs and zones
On 10/13/06, Matthew Ahrens wrote:
> Using ZFS for a zone's root is currently planned to be supported in Solaris 10 update 5, but we are working on moving it up to update 4.

Are there any areas where the community can help with this? Would code or "me too!" support calls help the most?

Mike
-- Mike Gerdts  http://mgerdts.blogspot.com/
Re: [zfs-discuss] fsflush and zfs
ZFS ignores fsflush. Here's a snippet of the code in zfs_sync():

	/*
	 * SYNC_ATTR is used by fsflush() to force old filesystems like UFS
	 * to sync metadata, which they would otherwise cache indefinitely.
	 * Semantically, the only requirement is that the sync be initiated.
	 * The DMU syncs out txgs frequently, so there's nothing to do.
	 */
	if (flag & SYNC_ATTR)
		return (0);

However, for a user-initiated sync(1M) or sync(2), ZFS does force all outstanding data/transactions synchronously to disk. This goes beyond the requirement of sync(2), which says I/O is initiated but not waited on (i.e. asynchronous).

Neil.

ttoulliu2002 wrote On 10/13/06 00:06:
> Is there any change regarding fsflush, such as the autoup tunable, for ZFS? Thanks
Re: [zfs-discuss] Self-tuning recordsize
Group,

I am not sure I agree with the 8k size. Since recordsize is based on the size of filesystem blocks for large files, my first consideration is what the max size of the file object will be. For extremely large files (25 to 100 GB) that are accessed sequentially for both read and write, I would expect 64k or 128k.

Putpage functions attempt to grab a number of pages off the vnode and place their modified contents within disk blocks. Thus, if disk blocks are larger, fewer of them are needed, which can result in more efficient operations. However, any small change to a filesystem block results in the entire filesystem block being accessed, so small accesses to the block are very inefficient. Lastly, access to a larger block will occupy the media for a longer continuous period, possibly creating a larger latency than necessary for another unrelated op.

Hope this helps...

Mitchell Erblich