Re: [zfs-discuss] Dynamics of ZFS
Hello Roch,

Wednesday, June 21, 2006, 2:31:25 PM, you wrote:

R> This just published:
R>
R> http://blogs.sun.com/roller/trackback/roch/Weblog/the_dynamics_of_zfs

Proper link is:

http://blogs.sun.com/roller/page/roch?entry=the_dynamics_of_zfs

--
Best regards,
Robert
mailto:[EMAIL PROTECTED]
http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS and Virtualization
Hi experts,

I have a few questions about ZFS and virtualization:

Virtualization and performance

When filesystem traffic occurs on a zpool containing only spindles dedicated to this zpool, I/O can be distributed evenly. When the zpool is located on a LUN sliced from a RAID group shared by multiple systems, the I/O capability of this zpool will be limited. Avoiding or limiting I/O to this LUN until the load from the other systems decreases would help overall performance for the local zpool.

I heard some rumors recently about using SMI-S to de-virtualize the traffic and allow Solaris to peek through the virtualization layers, thus optimizing I/O target selection. Maybe someone has some rumors to add ;-)

Virtualization with the 6920 has been briefly discussed at http://www.opensolaris.org/jive/thread.jspa?messageID=14984#14984 but without conclusion or recommendations.

Volume mobility

One of the major advantages of ZFS is the sharing of zpool capacity between filesystems. I often run applications in small application containers located on separate LUNs which are zoned to several hosts so they can be run on different hosts. The idea behind this is failover, testing and load adjustment. Because only complete zpools can be migrated, capacity sharing between movable containers is currently impossible. Are there any plans to allow zpools to be concurrently shareable between hosts?

Best regards
-- Dagobert

This message posted from opensolaris.org
[zfs-discuss] Re: RE: [Security-discuss] Proposal for new basic privileges related with
> I am also interested in writing some test cases that will check the correct semantics of access checks on files with different permissions and with different privileges set/unset by the process. Are there already file access test cases at Sun I may expand? Should test suites for OpenSolaris be written in a special kind of programming language?

We do extensive file access testing as part of the zfs test suite. The test suite is mostly written as ksh scripts with some C code. We should have the test suite available externally via OpenSolaris.org sometime in July or August. In the meantime I would code up your unit tests in ksh so they can be more easily integrated. We'll keep you posted as progress in releasing the test suite is made.

Cheers,
Jim
[zfs-discuss] Re: ZFS questions (hybrid HDs)
Actually, while Seagate's little white paper doesn't explicitly say so, the FLASH is used for a write cache and that provides one of the major benefits: writes to the disk rarely need to spin up the motor. Probably 90+% of all writes to disk will fit into the cache in a typical laptop environment (no, compiling OpenSolaris isn't typical usage…).

My guess from reading between the lines of the Samsung/Microsoft press release is that there is a mechanism for the operating system to pin particular blocks into the cache (e.g. to speed boot) and the rest of the cache is used for write buffering. (Using it as a read cache doesn't buy much compared to using the normal drive cache RAM for that, and might also contribute to wear, which is why read caching appears to be under OS control rather than automatic.)

Incidentally, there's a nice overview of some algorithms (including file systems) optimized for the characteristics of FLASH memory that was published by ACM last year, for the curious (who happen to have access to either the online version or their local library).

http://doi.acm.org/10.1145/1089733.1089735

Anton
[zfs-discuss] Re: Properties of ZFS snapshots I'd like to see...
Hi Constantin,

> The basic problem with regular snapshotting is that you end up managing so many of them. Wouldn't it be nice if you could assign an expiration date to a snapshot?

The only reason you want the snapshot removed is that you don't want your pool to become full. IIRC VxFS has a feature to automatically delete a snapshot if a write would return ENOSPC: the snapshot is deleted and the write is retried. This might be considered as an additional feature to your automatic expiry.

Best regards
-- Dago
Re: [zfs-discuss] 15 minute fdsync problem and ZFS: Solved
The vi we were doing was a 2 line file. If you just vi a new file, add one line and exit, it would take 15 minutes in fdsync. On recommendation of a workaround we set

    set zfs:zil_disable=1

After the reboot the fdsync is now 0.1 seconds. Now I have no idea if it was this setting or the fact that we went through a reboot. Whatever the root cause, we are now back to a well behaved file system.

thanks
sean

Roch wrote:
> 15 minutes to do a fdsync is way outside the slowdown usually seen. The footprint for 6413510 is that when a huge amount of data is being written non-synchronously and a fsync comes in for the same filesystem, then all the non-synchronous data is also forced out synchronously. So is there a lot of data being written during the vi?
>
> vi will write the whole file (in 4K chunks) and fsync it (based on a single experiment). So for a large-file vi, on quit, we have lots of data to sync in and of itself. But because of 6413510 we potentially have to sync lots of other data written by other applications.
>
> Now take a Niagara with lots of available CPUs and lots of free memory (32GB maybe?) running some 'tar x' in parallel. A huge chunk of the 32GB can end up as dirty. Too much so, I say, because of the lack of throttling:
>
>   http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6429205
>   6429205 each zpool needs to monitor its throughput and throttle heavy writers
>
> Then vi :q fsyncs, and all of the pending data must sync. So we have extra data to sync because of:
>
>   http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6413510
>   zfs: writing to ZFS filesystem slows down fsync() on other files in the same FS
>
> Furthermore, we can be slowed by this:
>
>   http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6440499
>   zil should avoid txg_wait_synced() and use dmu_sync() to issue parallel IOs...
>
> Note: 6440499 is now fixed in the gate.
>
> And finally all this data goes to a single disk. Worse, a slice of a disk. Since it's just a slice, ZFS can't enable the write cache. Then if there is no tag queue (is there?) we will handle everything one I/O at a time. If it's a SATA drive we have other issues...
>
> I think we've hit it all here. So can this lead to a 15 min fsync? I can't swear. Actually I won't be convinced myself before I convince you, but we do have things to chew on already.
>
> Do I recall that this is about a 1GB file in vi? :wq-uitting out of a 1 GB vi session on a 50MB/sec disk will take 20 sec when everything hums and there is no other traffic involved. With no write cache / no tag queue, maybe 10X more.
>
> -r

--
Sean Meighan
Mgr ITSM Engineering
Sun Microsystems, Inc. US
Phone x32329 / +1 408 850-9537
Mobile 303-520-2024
Fax 408 850-9537
Email [EMAIL PROTECTED]
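Roch's closing estimate can be checked with a few lines of arithmetic. The following is only a sketch: it assumes 1 GB means 1000 MB and takes the "maybe 10X more" penalty for a disk without write cache or tag queuing at face value; the function name is made up for illustration.

```python
def sync_seconds(size_mb, disk_mb_per_sec, slowdown=1.0):
    """Seconds needed to push size_mb of dirty data to disk.

    slowdown models the rough "maybe 10X more" penalty quoted above for
    a disk with no write cache and no tag queue (an assumed factor, not
    a measurement).
    """
    return size_mb / disk_mb_per_sec * slowdown


best_case = sync_seconds(1000, 50)      # 1 GB file on a 50 MB/sec disk
degraded = sync_seconds(1000, 50, 10)   # same data with the 10x penalty

print(f"best case: {best_case:.0f} s")
print(f"degraded:  {degraded:.0f} s ({degraded / 60:.1f} min)")
```

Under these assumptions a 1 GB file alone does not reach 15 minutes even in the degraded case, which fits Roch's point that other applications' dirty data being dragged into the same sync (6413510) would have to account for the rest.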
Re: [zfs-discuss] 15 minute fdsync problem and ZFS: Solved
Well this does look more and more like a duplicate of:

6413510 zfs: writing to ZFS filesystem slows down fsync() on other files in the same FS

Neil
Re: [zfs-discuss] 15 minute fdsync problem and ZFS: Solved
Sean Meighan writes:
> The vi we were doing was a 2 line file. If you just vi a new file, add one line and exit, it would take 15 minutes in fdsync. On recommendation of a workaround we set
>
>     set zfs:zil_disable=1
>
> After the reboot the fdsync is now 0.1 seconds. Now I have no idea if it was this setting or the fact that we went through a reboot. Whatever the root cause, we are now back to a well behaved file system.

Well behaved... in appearance only!

Maybe it's nice to validate a hypothesis, but you should not run with this option set, ever. It disables O_DSYNC and fsync() and I don't know what else.

Bad idea, bad.

-r
Re: [zfs-discuss] Proposal for new basic privileges related with file system access checks
Nicolai Johannes wrote:
> For my Google Summer of Code project for OpenSolaris, my job is to think about new basic privileges. I would like to propose five new basic privileges that relate to file system access checks and may be used for daemons like ssh or ssh-agent that (after starting up) never read or write user specific files:
>
> PRIV_FILE_IDENTITY_READ: Allow the process to benefit from its supplemental rights associated with its identity (euid, egid and associated groups) during file or directory operations that require read permissions. Additional rights gained through PRIV_FILE_DAC_READ will not be affected.
>
> PRIV_FILE_IDENTITY_WRITE: Allow the process to benefit from its supplemental rights associated with its identity (euid, egid and associated groups) during file or directory operations that require write permissions or ownership of the file. Additional rights gained through PRIV_FILE_DAC_WRITE, PRIV_FILE_OWNER and PRIV_FILE_SETID will not be affected.
>
> PRIV_FILE_IDENTITY_OWNER: Allow the process to benefit from its supplemental rights associated with its identity (euid, egid and associated groups) during file or directory operations that require ownership of the file if PRIV_FILE_IDENTITY_READ and PRIV_FILE_IDENTITY_WRITE are set as well. If PRIV_FILE_IDENTITY_READ and PRIV_FILE_IDENTITY_WRITE are not both present, PRIV_FILE_IDENTITY_OWNER will not grant any supplemental rights. Additional rights gained through PRIV_FILE_OWNER and PRIV_FILE_SETID will not be affected.
>
> PRIV_FILE_IDENTITY_SEARCH: Allow the process to benefit from its supplemental rights associated with its identity (euid, egid and associated groups) during directory operations that require search permissions. Additional rights gained through PRIV_FILE_DAC_SEARCH will not be affected.
>
> PRIV_FILE_IDENTITY_EXECUTE: Allow the process to benefit from its supplemental rights associated with its identity (euid, egid and associated groups) during directory operations that require execute permissions. Additional rights gained through PRIV_FILE_DAC_EXECUTE will not be affected.
>
> If this seems to be too many new privileges, one could merge PRIV_FILE_IDENTITY_EXECUTE, PRIV_FILE_IDENTITY_SEARCH and PRIV_FILE_IDENTITY_READ as well as PRIV_FILE_IDENTITY_OWNER and PRIV_FILE_IDENTITY_WRITE together.
>
> So dropping the privileges is almost the same as if the process were running under a uid/egid that does not own any file or have permissions for any file. Only almost the same, because dropping the privileges won't allow access to files that have permissions associated with them that deny access for the egid/euid.
>
> The new privileges will at least affect open, creat, link, unlink, stat, exec, chmod, chgrp, chown, acl_set, acl_get and opendir.
>
> With dropping PRIV_FILE_ALLOW_IDENTITY_OWNER (and having no privileges like PRIV_FILE_OWNER/SETID), the process won't be able to create files/directories (every file/directory needs to have an owner and its initial permissions may be manipulated), change permissions or owner/group. But it will still be possible to delete files/directories if everyone may delete this file (and the process would be able to if the new privileges were set).
>
> My mentor, Darren Moffat, suggested that I should start integrating the new privilege checks in ZFS, UFS and TMPFS (in this order) and then proceed with the various other file systems. I am relatively certain that I will find the appropriate places in the ZFS code due to the fact that the checks will correlate with the already present checks of the established PRIV_FILE_* privileges.
>
> I have more worries about the checking order. I suggest three possibilities here:

You might want to consider a different implementation order. I would recommend TMPFS, UFS and then ZFS. The reason for this is that TMPFS only has permission bits and UFS has a simpler ACL model than ZFS.

Standard file access for ufs/zfs/tmpfs does basically the following:

1. Check the requested access mode against the file's permission bits/ACL.

2. If access is NOT granted from step 1, then secpolicy_vnode_access() is called, which will then determine if the process has the necessary privileges to grant the request.

3. Solaris file systems never check explicitly for PRIV_XXX_ privileges. Instead they rely on secpolicy_XXX() functions to handle those decisions.

If you are able to implement the new priv checking in the already used secpolicy functions, then hopefully you shouldn't need to modify any file system code.

Access checking for chown/chgrp/utime/... aka VOP_SETATTR() goes through a different flavor of secpolicy functions.

-Mark
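The access-check flow Mark describes can be sketched as a toy model. This is illustrative Python, not kernel code: `perm_bits_allow` and `secpolicy_access` are made-up stand-ins for the real permission/ACL check and for secpolicy_vnode_access(), only owner and "other" permission bits are modelled, and the mapping from access bit to privilege name is an assumption for the example.

```python
from dataclasses import dataclass

READ, WRITE, EXEC = 4, 2, 1

# Assumed mapping from an access bit to the privilege that overrides it.
DAC_PRIV = {
    READ: "PRIV_FILE_DAC_READ",
    WRITE: "PRIV_FILE_DAC_WRITE",
    EXEC: "PRIV_FILE_DAC_EXECUTE",
}


@dataclass
class File:
    owner: int
    mode: int  # e.g. 0o600; group bits are ignored in this toy model


@dataclass
class Process:
    euid: int
    privileges: frozenset = frozenset()


def perm_bits_allow(proc, f, want):
    """Step 1: compare the request against the file's permission bits."""
    shift = 6 if proc.euid == f.owner else 0
    return ((f.mode >> shift) & want) == want


def secpolicy_access(proc, want):
    """Step 2: only reached when the permission bits said no."""
    needed = [DAC_PRIV[bit] for bit in (READ, WRITE, EXEC) if want & bit]
    return all(p in proc.privileges for p in needed)


def access(proc, f, want):
    if perm_bits_allow(proc, f, want):
        return True          # the policy function is never consulted
    return secpolicy_access(proc, want)


f = File(owner=100, mode=0o600)
print(access(Process(100), f, READ))
print(access(Process(200), f, READ))
print(access(Process(200, frozenset({"PRIV_FILE_DAC_READ"})), f, READ))
```

Note that in this model an owner whose permission bits grant access never reaches the policy function at all, so a check that only lives there would not see that request.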
Re: [zfs-discuss] 15 minute fdsync problem and ZFS: Solved
Roch wrote:
> Sean Meighan writes:
>> The vi we were doing was a 2 line file. If you just vi a new file, add one line and exit, it would take 15 minutes in fdsync. On recommendation of a workaround we set
>>
>>     set zfs:zil_disable=1
>>
>> After the reboot the fdsync is now 0.1 seconds. Now I have no idea if it was this setting or the fact that we went through a reboot. Whatever the root cause, we are now back to a well behaved file system.
>
> Well behaved... in appearance only!
>
> Maybe it's nice to validate a hypothesis, but you should not run with this option set, ever. It disables O_DSYNC and fsync() and I don't know what else.
>
> Bad idea, bad.

Why is this option available then? (Yes, that's a loaded question.)
Re: [zfs-discuss] 15 minute fdsync problem and ZFS: Solved
Torrey McMahon wrote On 06/21/06 10:29,:
> Roch wrote:
>> Sean Meighan writes:
>>> The vi we were doing was a 2 line file. If you just vi a new file, add one line and exit, it would take 15 minutes in fdsync. On recommendation of a workaround we set
>>>
>>>     set zfs:zil_disable=1
>>>
>>> After the reboot the fdsync is now 0.1 seconds. Now I have no idea if it was this setting or the fact that we went through a reboot. Whatever the root cause, we are now back to a well behaved file system.
>>
>> Well behaved... in appearance only!
>>
>> Maybe it's nice to validate a hypothesis, but you should not run with this option set, ever. It disables O_DSYNC and fsync() and I don't know what else.
>>
>> Bad idea, bad.
>
> Why is this option available then? (Yes, that's a loaded question.)

I wouldn't call it an option, but an internal debugging switch that I originally added to allow progress when initially integrating the ZIL. As Roch says, it really shouldn't ever be set (as it does negate POSIX synchronous semantics). Nor should it be mentioned to a customer. In fact I'm inclined to now remove it. However, it does still have a use, as it helped root cause this problem.

Neil
Re[2]: [zfs-discuss] 15 minute fdsync problem and ZFS: Solved
Hello Neil,

Wednesday, June 21, 2006, 6:41:50 PM, you wrote:

NP> Torrey McMahon wrote On 06/21/06 10:29,:
NP> > Roch wrote:
NP> >> Sean Meighan writes:
NP> >>> The vi we were doing was a 2 line file. If you just vi a new file, add one line and exit, it would take 15 minutes in fdsync. On recommendation of a workaround we set
NP> >>>
NP> >>>     set zfs:zil_disable=1
NP> >>>
NP> >>> After the reboot the fdsync is now 0.1 seconds. Now I have no idea if it was this setting or the fact that we went through a reboot. Whatever the root cause, we are now back to a well behaved file system.
NP> >>
NP> >> Well behaved... in appearance only!
NP> >>
NP> >> Maybe it's nice to validate a hypothesis, but you should not run with this option set, ever. It disables O_DSYNC and fsync() and I don't know what else.
NP> >>
NP> >> Bad idea, bad.
NP> >
NP> > Why is this option available then? (Yes, that's a loaded question.)

NP> I wouldn't call it an option, but an internal debugging switch that I
NP> originally added to allow progress when initially integrating the ZIL.
NP> As Roch says, it really shouldn't ever be set (as it does negate POSIX
NP> synchronous semantics). Nor should it be mentioned to a customer.
NP> In fact I'm inclined to now remove it. However, it does still have a use,
NP> as it helped root cause this problem.

Isn't it similar to the unsupported fastfs for ufs?

I think it could be useful in some cases after all.

--
Best regards,
Robert
mailto:[EMAIL PROTECTED]
http://milek.blogspot.com
Re: [zfs-discuss] 15 minute fdsync problem and ZFS: Solved
On Wed, Jun 21, 2006 at 10:41:50AM -0600, Neil Perrin wrote:
> > Why is this option available then? (Yes, that's a loaded question.)
>
> I wouldn't call it an option, but an internal debugging switch that I originally added to allow progress when initially integrating the ZIL. As Roch says, it really shouldn't ever be set (as it does negate POSIX synchronous semantics). Nor should it be mentioned to a customer. In fact I'm inclined to now remove it. However, it does still have a use, as it helped root cause this problem.

Rename it to zil_disable_danger_will_robinson :)
Re: [zfs-discuss] 15 minute fdsync problem and ZFS: Solved
Nicolas Williams wrote:
> On Wed, Jun 21, 2006 at 10:41:50AM -0600, Neil Perrin wrote:
>>> Why is this option available then? (Yes, that's a loaded question.)
>>
>> I wouldn't call it an option, but an internal debugging switch that I originally added to allow progress when initially integrating the ZIL. As Roch says, it really shouldn't ever be set (as it does negate POSIX synchronous semantics). Nor should it be mentioned to a customer. In fact I'm inclined to now remove it. However, it does still have a use, as it helped root cause this problem.
>
> Rename it to zil_disable_danger_will_robinson

The sad truth is that debugging bits tend to survive into production and then we get escalations that go something like, "I set this variable in /etc/system and now I'm {getting data corruption, weird behavior, an odd rash, ...}"

The fewer the better, imho. If it can be removed, great. If not, then maybe something for the tunables guide.
Re: [zfs-discuss] 15 minute fdsync problem and ZFS: Solved
Robert Milkowski wrote On 06/21/06 11:09,:
> Hello Neil,
>
>>> Why is this option available then? (Yes, that's a loaded question.)
>
> NP> I wouldn't call it an option, but an internal debugging switch that I
> NP> originally added to allow progress when initially integrating the ZIL.
> NP> As Roch says, it really shouldn't ever be set (as it does negate POSIX
> NP> synchronous semantics). Nor should it be mentioned to a customer.
> NP> In fact I'm inclined to now remove it. However, it does still have a use,
> NP> as it helped root cause this problem.
>
> Isn't it similar to the unsupported fastfs for ufs?

It is similar in the sense that it speeds up the file system. Using fastfs can be much more dangerous though, as it can lead to a badly corrupted file system because metadata writes are delayed and written out of order. Disabling the ZIL, by contrast, does not affect the integrity of the fs. The transaction group model of ZFS gives consistency in the event of a crash/power fail. However, any data that was promised to be on stable storage may not be unless the transaction group committed (an operation that is started every 5s).

We once had plans to add a mount option to allow the admin to control the ZIL. Here's a brief section of the RFE (6280630):

    sync={deferred,standard,forced}

    Controls synchronous semantics for the dataset.

    When set to 'standard' (the default), synchronous operations such as fsync(3C) behave precisely as defined in fcntl.h(3HEAD).

    When set to 'deferred', requests for synchronous semantics are ignored. However, ZFS still guarantees that ordering is preserved -- that is, consecutive operations reach stable storage in order. (If a thread performs operation A followed by operation B, then the moment that B reaches stable storage, A is guaranteed to be on stable storage as well.) ZFS also guarantees that all operations will be scheduled for write to stable storage within a few seconds, so that an unexpected power loss only takes the last few seconds of change with it.

    When set to 'forced', all operations become synchronous. No operation will return until all previous operations have been committed to stable storage. This option can be useful if an application is found to depend on synchronous semantics without actually requesting them; otherwise, it will just make everything slow, and is not recommended.

Of course we would need to stress the dangers of setting 'deferred'. What do you guys think?

Neil.
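The three proposed settings can be compared with a small toy model. This is a sketch of the semantics as worded in the RFE excerpt above, not of any ZFS code: the `Dataset` class, its method names, and the idea of modelling stable storage as a Python list are all assumptions made for illustration.

```python
class Dataset:
    """Toy model of the proposed sync={deferred,standard,forced} modes."""

    def __init__(self, sync="standard"):
        if sync not in ("deferred", "standard", "forced"):
            raise ValueError(sync)
        self.sync = sync
        self.pending = []  # operations held in memory, in issue order
        self.stable = []   # operations that reached stable storage

    def _commit(self):
        # Everything pending goes out in issue order, so `stable` is
        # always a prefix of the operations issued (ordering preserved).
        self.stable.extend(self.pending)
        self.pending.clear()

    def write(self, op):
        self.pending.append(op)
        if self.sync == "forced":
            self._commit()   # every operation behaves synchronously

    def fsync(self):
        if self.sync != "deferred":
            self._commit()   # 'deferred' ignores the request

    def txg_commit(self):
        self._commit()       # periodic transaction group commit

    def crash(self):
        """Return the operations lost by a power failure right now."""
        lost, self.pending = self.pending, []
        return lost


for mode in ("deferred", "standard", "forced"):
    ds = Dataset(mode)
    ds.write("A")
    ds.fsync()
    ds.write("B")
    print(mode, "stable:", ds.stable, "lost on crash:", ds.crash())
```

In the model, 'deferred' loses both A and B if the crash comes before the next transaction group commit even though the application called fsync after A, which is the danger Neil says would need stressing; in every mode what survives is an in-order prefix of what was issued.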
AW: [zfs-discuss] Proposal for new basic privileges related with filesystem access checks
Thank you for your hints.

I already investigated the zfs/ufs/tmpfs code when I wrote my proposal. When I wrote "check if set", I meant doing this with new secpolicy_vnode_* functions. The check for the already existing privileges would of course stay in secpolicy_vnode_owner and secpolicy_vnode_access.

The proposed checking order is of course only relevant for permission checks. For changing permissions, I think the new checking order is rather clear and was not explicitly mentioned:

1. Check whether the process is the owner of the file. If not, go to step 3; otherwise proceed with step 2.

2. Check whether PRIV_FILE_IDENTITY_OWNER as well as all PRIV_FILE_IDENTITY_* privileges that correspond to the new permissions are set. If so, change permissions and return success; otherwise go to step 3.

3. Check whether PRIV_FILE_OWNER is set. If so, change permissions and return success; otherwise determine whether we came from step 1 or step 2 and report missing ownership or missing privileges to the user.

The other owner related checks should be similar.

Imagine the situation where a process is able to open a file because it is the owner of the file and the permission bits for the owner grant access. The unmodified code would allow the access, because it won't even call the secpolicy functions. So unfortunately, to my mind, I have to change the file system code and cannot incorporate my privilege checking into the existing secpolicy functions. 8-(

Please tell me if I am wrong.

Thank you
Johannes

-----Original Message-----
From: Mark Shellenbaum [mailto:[EMAIL PROTECTED]
Sent: Wed 21.06.2006 18:11
To: Nicolai Johannes
Cc: zfs-discuss@opensolaris.org; [EMAIL PROTECTED]
Subject: Re: [zfs-discuss] Proposal for new basic privileges related with filesystem access checks

Nicolai Johannes wrote:
> For my Google Summer of Code project for OpenSolaris, my job is to think about new basic privileges.
> [...]
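The three-step order for changing permissions that Johannes lists in his reply above can be sketched as follows. This is a hypothetical model, not proposed kernel code: the function names are invented, and the rule for which PRIV_FILE_IDENTITY_* privileges "correspond to the new permissions" (any read, write or execute bit in the new mode requires the matching identity privilege) is an assumption about what the proposal intends.

```python
READ, WRITE, EXEC = 4, 2, 1

# Assumed mapping from permission bit to the proposed identity privilege.
IDENTITY_PRIV = {
    READ: "PRIV_FILE_IDENTITY_READ",
    WRITE: "PRIV_FILE_IDENTITY_WRITE",
    EXEC: "PRIV_FILE_IDENTITY_EXECUTE",
}


def needed_identity_privs(new_mode):
    """Privileges step 2 would require for the requested mode."""
    bits = (new_mode >> 6 | new_mode >> 3 | new_mode) & 7
    needed = {"PRIV_FILE_IDENTITY_OWNER"}
    needed |= {IDENTITY_PRIV[b] for b in (READ, WRITE, EXEC) if bits & b}
    return needed


def check_chmod(euid, privs, file_owner, new_mode):
    is_owner = euid == file_owner                    # step 1
    if is_owner and needed_identity_privs(new_mode) <= set(privs):
        return "ok"                                  # step 2
    if "PRIV_FILE_OWNER" in privs:                   # step 3
        return "ok"
    return "missing privileges" if is_owner else "missing ownership"


BASIC = {"PRIV_FILE_IDENTITY_OWNER", *IDENTITY_PRIV.values()}

print(check_chmod(100, BASIC, 100, 0o644))
print(check_chmod(100, BASIC - {"PRIV_FILE_IDENTITY_OWNER"}, 100, 0o644))
print(check_chmod(200, BASIC, 100, 0o644))
print(check_chmod(200, {"PRIV_FILE_OWNER"}, 100, 0o644))
```

With all identity privileges present the owner's chmod succeeds as it does today; dropping PRIV_FILE_IDENTITY_OWNER turns the same request into a "missing privileges" failure unless PRIV_FILE_OWNER is held.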
Re: [zfs-discuss] ZFS and Flash archives
I checked into this and got some information from the install group. What I learned is this: the process of creating a flash archive is just a matter of using cpio/pax to make a copy of the contents of an installed system. A flash archive doesn't contain any information about the configuration (i.e. storage partitioning) of a system. It's more like a 'super-package' which contains all of the system software plus some customizations.

When you install a flash archive, you need to have already created the storage to hold the archive contents. That's done with the standard Solaris install software (same as a regular initial install, upgrade, jumpstart, or liveupgrade). But the distributed Solaris install software is not yet zfs-aware. So the answer to your question is that you can create a flash archive from this system with zfs filesystems, but until the install software is zfs-aware, you can't use the archive to create a system with zfs pools and datasets.

Full support for zfs in flash archives will come with the rest of the zfs installation/boot support.

Lori

Constantin Gonzalez wrote:
> Hi,
>
> I'm currently setting up a demo machine. It would be nice to set up everything the way I like it, including a number of ZFS filesystems, then create a flash archive, then install from that archive.
>
> Will there be any issues with webstart flash and ZFS? Does flar create need to be ZFS aware and if so, is it ZFS aware in S10u2b09a?
>
> Best regards,
> Constantin
Re: [zfs-discuss] 15 minute fdsync problem and ZFS: Solved
On Wed, 2006-06-21 at 14:15, Neil Perrin wrote:
> Of course we would need to stress the dangers of setting 'deferred'. What do you guys think?

I can think of a use case for "deferred": improving the efficiency of a large mega-transaction/batch job such as a nightly build.

You create an initially empty or cloned dedicated filesystem for the build, start it off, and won't look inside until it completes. If the build machine crashes in the middle of the build you're going to nuke it all and start over, because that's lower risk than assuming you can pick up where it left off.

Now, it happens that a bunch of tools used during a build invoke fsync. But in the context of a full nightly build that effort is wasted. All you need is one big "sync everything" at the very end, either by using a command like sync or lockfs -f, or as a side effect of reverting from sync=deferred to sync=standard.

- Bill
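The pattern Bill describes, skipping per-file fsync during a throwaway batch job and issuing one sync at the end, can be imitated at application level. The sketch below is only an analogy for what a dataset-wide sync=deferred setting would do for unmodified tools; `run_build` and its parameters are hypothetical, and `os.sync()` is the Python standard-library call available on Unix platforms.

```python
import os
import tempfile


def run_build(outdir, n_files, fsync_each):
    """Write n_files small output files, syncing per file or once at the end."""
    for i in range(n_files):
        path = os.path.join(outdir, f"obj{i}.o")
        with open(path, "wb") as f:
            f.write(b"\0" * 4096)
            if fsync_each:
                # What many build tools do today: one synchronous
                # flush per output file.
                f.flush()
                os.fsync(f.fileno())
    if not fsync_each:
        # The "one big sync everything at the very end" variant.
        os.sync()
    return n_files


with tempfile.TemporaryDirectory() as d:
    print(run_build(d, 5, fsync_each=False), "files written, one final sync")
```

If the machine crashes midway, the second variant gives no guarantee about any individual output file, which is acceptable only because the whole build directory would be discarded and rebuilt anyway.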
RE: [zfs-discuss] 15 minute fdsync problem and ZFS: Solved
Did I miss something on this thread? Was the root cause of the 15-minute fsync ever actually determined?

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of eric kustarz
Sent: Wednesday, June 21, 2006 2:12 PM
To: [EMAIL PROTECTED]
Cc: zfs-discuss@opensolaris.org; Torrey McMahon
Subject: Re: [zfs-discuss] 15 minute fdsync problem and ZFS: Solved

Neil Perrin wrote:
> Robert Milkowski wrote On 06/21/06 11:09,:
>> Hello Neil,
>>
>>>> Why is this option available then? (Yes, that's a loaded question.)
>>
>> NP> I wouldn't call it an option, but an internal debugging switch that I
>> NP> originally added to allow progress when initially integrating the ZIL.
>> NP> As Roch says, it really shouldn't ever be set (as it does negate POSIX
>> NP> synchronous semantics). Nor should it be mentioned to a customer.
>> NP> In fact I'm inclined to now remove it. However, it does still have a use,
>> NP> as it helped root cause this problem.
>>
>> Isn't it similar to the unsupported fastfs for ufs?
>
> It is similar in the sense that it speeds up the file system. Using fastfs can be much more dangerous though, as it can lead to a badly corrupted file system because metadata writes are delayed and written out of order. Disabling the ZIL, by contrast, does not affect the integrity of the fs. The transaction group model of ZFS gives consistency in the event of a crash/power fail. However, any data that was promised to be on stable storage may not be unless the transaction group committed (an operation that is started every 5s).
>
> We once had plans to add a mount option to allow the admin to control the ZIL. Here's a brief section of the RFE (6280630):
>
>     sync={deferred,standard,forced}
>
>     Controls synchronous semantics for the dataset.
>
>     When set to 'standard' (the default), synchronous operations such as fsync(3C) behave precisely as defined in fcntl.h(3HEAD).
>
>     When set to 'deferred', requests for synchronous semantics are ignored. However, ZFS still guarantees that ordering is preserved -- that is, consecutive operations reach stable storage in order. (If a thread performs operation A followed by operation B, then the moment that B reaches stable storage, A is guaranteed to be on stable storage as well.) ZFS also guarantees that all operations will be scheduled for write to stable storage within a few seconds, so that an unexpected power loss only takes the last few seconds of change with it.
>
>     When set to 'forced', all operations become synchronous. No operation will return until all previous operations have been committed to stable storage. This option can be useful if an application is found to depend on synchronous semantics without actually requesting them; otherwise, it will just make everything slow, and is not recommended.
>
> Of course we would need to stress the dangers of setting 'deferred'. What do you guys think?
>
> Neil.

Scares me, and it seems we should wait until people are demanding it and we *have* to do it (if that time ever comes); that is, when we can't squeeze any more performance gain out of the 'standard' method.

If problems do occur because of 'deferred' mode, once I wrap up zpool history, we'll have the fact that they set this logged to disk.

eric
[zfs-discuss] Multipathing and ZFS
I have had a brief introduction to ZFS, and while discussing it with some other folks the question came up about its use with multipathed storage. What, if any, configuration or interaction does ZFS have with a multipathed storage setup, however it may be managed?

Thanks!

Craig Cory
Senior Instructor :: ExitCertified
: Sun Certified System Administrator
: Sun Certified Network Administrator
: Sun Certified Security Administrator
: Veritas Certified Instructor
8950 Cal Center Drive Bldg 1, Suite 110
Sacramento, California 95826
[e] [EMAIL PROTECTED] [p] 916.669.3970 [f] 916.669.3977 [w] WWW.EXITCERTIFIED.COM
OTTAWA | SACRAMENTO | MONTREAL | LAS VEGAS | QUEBEC CITY | CALGARY SAN FRANCISCO | VANCOUVER | REGINA | WINNIPEG | TORONTO
Re: [zfs-discuss] ZFS and Flash archives
Lori Alt wrote:

zfs-aware. So the answer to your question is that you can create a flash archive from this system with zfs filesystems, but until the install software is zfs-aware, you can't use the archive to create a system with zfs pools and datasets.

Yeah, that sort of stuff is usually specified in a jumpstart profile (if I remembered the name right). That's where disk partitioning/fs definitions normally go, so I would presume zpool stuff should also go there.
AW: AW: [zfs-discuss] Proposal for new basic privileges related with filesystem access checks
After reading the mails concerning my proposal on the list, I realized which points were not clear enough in my proposal.

First of all, I would totally agree with all your statements if the new privileges were not basic privileges. All new privileges are basic privileges, so they will be present in almost all cases. They do not grant any additional rights to a process. They are like the exec or fork basic privileges: they only become interesting if the process does not have them.

By dropping the newly introduced privileges, the process gives up its identity. That means that all permissions granted on the basis of the euid or the groups associated with the process no longer matter.

Processes like ssh-agent that do not need their identity may drop them. An exploit against these processes can then no longer take advantage of the fact that the euid/groups of the process allow some file operations that are denied to everyone else. Only files that are globally readable/writable/executable may still be accessed (if there is no rule that denies the access for the identity of the process). Many daemons never need to read/write files that are only accessible because they run under a special euid/egid or belong to special groups, or they only need these rights in the starting phase and can drop them later.

Dropping the privileges may also be used to enforce some kind of mandatory access control: an administrator who does not want some users to change the permissions of files may withdraw PRIV_FILE_IDENTITY_OWNER in their login shell.

To enforce these privileges, one has to modify the permission checking order. In the normal case, when all basic privileges are present, nothing will change.

I hope that I have made my intentions clearer. Perhaps you can give me hints about which clauses in my proposal caused all the confusion so that I can correct them.
Johannes

-----Original Message-----
From: Mark Shellenbaum [mailto:[EMAIL PROTECTED]]
Sent: Wed 21.06.2006 21:21
To: Nicolai Johannes
Cc: [EMAIL PROTECTED]; zfs-discuss@opensolaris.org
Subject: Re: AW: [zfs-discuss] Proposal for new basic privileges related with filesystem access checks

Nicolai Johannes wrote:

Thank you for your hints. I already investigated the zfs/ufs/tmpfs code when I wrote my proposal. When I wrote "check if set", I meant doing this with new secpolicy_vnode_* functions. The check for the already existing privileges would of course stay in secpolicy_vnode_owner and secpolicy_vnode_access. The proposed checking order is of course only relevant for permission checks. For changing permissions, I think the new checking order is rather clear and was not explicitly mentioned:

Just so I am understanding this correctly. All of the PRIV_IDENTITY_* privs look only at euid or egid? Are the below steps for doing a chmod(2)?

1. Check whether the process is the owner of the file; if not, go to step 3, else proceed with step 2.
2. Check whether PRIV_FILE_IDENTITY_OWNER as well as all PRIV_FILE_IDENTITY_* privileges that correspond with the new permissions are set. If so, change permissions and return success, else go to step 3.
3. Check whether PRIV_FILE_OWNER is set; if so, change permissions and return success, else determine whether we came from step 1 or step 2 and report missing ownership or missing privileges to the user.

You can't break the access control rules for ZFS that are enforced by the NFSv4 spec. Typically an ACL for ZFS will be laid out so that the owner will be checked first, but a user could reconstruct an ACL so that isn't true. For UFS the owner will always be checked first.

The rules pretty much have to be:

1. The file system checks to see if access should be granted, based on permission bits or the file ACL. When a file has an ACL it could be either an additional access control method or an alternate one, in POSIX terminology. Which it is depends on the file system.
2. If access can't be granted, then the file system asks the priv code if it wishes to override denying access.

The other owner-related checks should be similar. Imagine the situation where a process would be able to open a file because it is the owner of the file and the permission bits for the owner grant access. The unmodified code would allow the access, because it won't even call the secpolicy functions. So unfortunately, to my mind, I have to change the file system code and cannot incorporate my privilege checking into the existing sec_policy functions, 8-(

If the permission bits or ACL of a file specify that a process should be allowed to open the file, then the process should be allowed to open the file. I thought that privileges only granted additional access that would otherwise be denied by a file's permission bits/ACL. This sounds like you want the presence of certain privileges to override permission bits?

-Mark
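The three chmod(2) steps quoted in the message above can be written out as a small decision function. This is a sketch of the proposal as described on the list, not kernel code: `Proc`, `chmod_allowed` and the `needed_identity_privs` argument are hypothetical names, and privileges are modeled as plain strings in a set.

```python
from dataclasses import dataclass, field


@dataclass
class Proc:
    """Hypothetical stand-in for a process credential."""
    uid: int
    privs: set = field(default_factory=set)


def chmod_allowed(proc, file_owner_uid, needed_identity_privs=()):
    """Return (allowed, reason) following the three proposed steps."""
    # Step 1: is the process the owner of the file?
    is_owner = proc.uid == file_owner_uid
    if is_owner:
        # Step 2: owner needs PRIV_FILE_IDENTITY_OWNER plus whichever
        # PRIV_FILE_IDENTITY_* privileges match the new permissions.
        required = {"PRIV_FILE_IDENTITY_OWNER"} | set(needed_identity_privs)
        if required <= proc.privs:
            return True, "owner with identity privileges"
    # Step 3: the existing override privilege still applies.
    if "PRIV_FILE_OWNER" in proc.privs:
        return True, "PRIV_FILE_OWNER override"
    return False, ("missing privileges" if is_owner else "not owner")


if __name__ == "__main__":
    shell = Proc(uid=100, privs={"PRIV_FILE_IDENTITY_OWNER"})
    restricted = Proc(uid=100)          # identity privilege withdrawn
    print(chmod_allowed(shell, 100))
    print(chmod_allowed(restricted, 100))
```

Note that this models only the proposed ordering; the reply in the same message argues that the file system's own permission/ACL check has to come first, with the privilege code consulted only to override a denial.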
Re: [zfs-discuss] ZFS on 32bit x86
Yup, you're probably running up against the limitations of 32-bit kernel addressability. We are currently very conservative in this environment, and so tend to end up with a small cache as a result. It may be possible to tweak things to get larger cache sizes, but you run the risk of starving out other processes trying to get memory.

-Mark

Robert Milkowski wrote:

Hello zfs-discuss,

Simple test 'ptime find /zfs/filesystem >/dev/null' with 2GB RAM. After the second, third, etc. run it still reads a lot from disks while find is running (atime is off). On x64 (Opteron) it doesn't. I guess it's due to the 512MB heap limit in the kernel for its cache. ::memstat shows 469MB for kernel and 1524MB on freelist. Is there anything that could be done? I guess not, but perhaps

PS: of course there are a lot of files, around 150K.
Re: AW: AW: [zfs-discuss] Proposal for new basic privileges related with filesystem access checks
Nicolai Johannes wrote:

After reading the mails concerning my proposal on the list, I realized which points were not clear enough in my proposal.

First of all, I would totally agree with all your statements if the new privileges were not basic privileges. All new privileges are basic privileges, so they will be present in almost all cases. They do not grant any additional rights to a process. They are like the exec or fork basic privileges: they only become interesting if the process does not have them.

By dropping the newly introduced privileges, the process gives up its identity. That means that all permissions granted on the basis of the euid or the groups associated with the process no longer matter.

Processes like ssh-agent that do not need their identity may drop them. An exploit against these processes can then no longer take advantage of the fact that the euid/groups of the process allow some file operations that are denied to everyone else. Only files that are globally readable/writable/executable may still be accessed (if there is no rule that denies the access for the identity of the process). Many daemons never need to read/write files that are only accessible because they run under a special euid/egid or belong to special groups, or they only need these rights in the starting phase and can drop them later.

Dropping the privileges may also be used to enforce some kind of mandatory access control: an administrator who does not want some users to change the permissions of files may withdraw PRIV_FILE_IDENTITY_OWNER in their login shell.

To enforce these privileges, one has to modify the permission checking order. In the normal case, when all basic privileges are present, nothing will change.

I hope that I have made my intentions clearer. Perhaps you can give me hints about which clauses in my proposal caused all the confusion so that I can correct them.
Can you give us an example of a 'file' the ssh-agent wishes to open, what the permissions are on the file, what privileges the ssh-agent has, and what the expected results are?

You need to be very careful about changing the rules for access control, since you may end up breaking POSIX compliance.

-Mark

Johannes

-----Original Message-----
From: Mark Shellenbaum [mailto:[EMAIL PROTECTED]]
Sent: Wed 21.06.2006 21:21
To: Nicolai Johannes
Cc: [EMAIL PROTECTED]; zfs-discuss@opensolaris.org
Subject: Re: AW: [zfs-discuss] Proposal for new basic privileges related with filesystem access checks

Nicolai Johannes wrote:

Thank you for your hints. I already investigated the zfs/ufs/tmpfs code when I wrote my proposal. When I wrote "check if set", I meant doing this with new secpolicy_vnode_* functions. The check for the already existing privileges would of course stay in secpolicy_vnode_owner and secpolicy_vnode_access. The proposed checking order is of course only relevant for permission checks. For changing permissions, I think the new checking order is rather clear and was not explicitly mentioned:

Just so I am understanding this correctly. All of the PRIV_IDENTITY_* privs look only at euid or egid? Are the below steps for doing a chmod(2)?

1. Check whether the process is the owner of the file; if not, go to step 3, else proceed with step 2.
2. Check whether PRIV_FILE_IDENTITY_OWNER as well as all PRIV_FILE_IDENTITY_* privileges that correspond with the new permissions are set. If so, change permissions and return success, else go to step 3.
3. Check whether PRIV_FILE_OWNER is set; if so, change permissions and return success, else determine whether we came from step 1 or step 2 and report missing ownership or missing privileges to the user.

You can't break the access control rules for ZFS that are enforced by the NFSv4 spec. Typically an ACL for ZFS will be laid out so that the owner will be checked first, but a user could reconstruct an ACL so that isn't true. For UFS the owner will always be checked first.

The rules pretty much have to be:

1. The file system checks to see if access should be granted, based on permission bits or the file ACL. When a file has an ACL it could be either an additional access control method or an alternate one, in POSIX terminology. Which it is depends on the file system.
2. If access can't be granted, then the file system asks the priv code if it wishes to override denying access.

The other owner-related checks should be similar. Imagine the situation where a process would be able to open a file because it is the owner of the file and the permission bits for the owner grant access. The unmodified code would allow the access, because it won't even call the secpolicy functions. So unfortunately, to my mind, I have to change the file system code and cannot incorporate my privilege checking into the existing sec_policy functions, 8-(

If the permission bits or ACL of a file specify that a process should be allowed to open the file, then the process should be allowed to open
[zfs-discuss] Let's get cooking...
http://www.tech-recipes.com/solaris_system_administration_tips1446.html
Re: AW: AW: [zfs-discuss] Proposal for new basic privileges related with filesystem access checks
Processes like ssh-agent that do not need their identity may drop them. An exploit against these processes can then no longer take advantage of the fact that the euid/groups of the process allow some file operations that are denied to everyone else. Only files that are globally readable/writable/executable may still be accessed (if there is no rule that denies the access for the identity of the process). Many daemons never need to read/write files that are only accessible because they run under a special euid/egid or belong to special groups, or they only need these rights in the starting phase and can drop them later.

Dropping the privileges may also be used to enforce some kind of mandatory access control: an administrator who does not want some users to change the permissions of files may withdraw PRIV_FILE_IDENTITY_OWNER in their login shell.

I'm not sure if I like the name, then; nor the emphasis on the euid/egid (as those terms are not commonly used in the kernel; there's a reason why the effective uid was cr->cr_uid and not cr_euid).

In other words, what you are doing is creating a nobody user with an ordinary user id.

In that case, having five different privileges to shadow the five FILE privileges is perhaps going overboard. It's also perhaps more easily understood when referred to in the frame of reference of an anonymous user.

There are also some other strange corner cases; e.g., opening files in /tmp with a umask other than 0.

Casper
Re: AW: AW: [zfs-discuss] Proposal for new basic privileges related with filesystem access checks
On Thu, Jun 22, 2006 at 01:01:38AM +0200, [EMAIL PROTECTED] wrote:

I'm not sure if I like the name, then; nor the emphasis on the euid/egid (as those terms are not commonly used in the kernel; there's a reason why the effective uid was cr->cr_uid and not cr_euid).

In other words, what you are doing is creating a nobody user with an ordinary user id.

Yes. It's kind of enticing.

In that case, having five different privileges to shadow the five FILE privileges is perhaps going overboard. It's also perhaps more easily understood when referred to in the frame of reference of an anonymous user.

There are also some other strange corner cases; e.g., opening files in /tmp with a umask other than 0.

As I interpret the proposal, file creation in /tmp would succeed, but existing files owned by the process' actual euid could not be opened if these basic privs are dropped.

How would dropping this basic priv work with NFS though?

Nico
--
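One way to read the proposal as discussed in this thread is sketched below: with the identity privileges dropped, access is granted only if the "other" bits allow it and the ordinary check for the process identity would not have denied it. The function names and the `identity_privs` flag are hypothetical, ACLs and file creation are ignored, and this is an interpretation for illustration rather than anything from the actual proposal text.

```python
R, W, X = 4, 2, 1   # permission bits within one rwx triplet


def classic_allows(mode, file_uid, file_gid, uid, gids, want):
    """Traditional UNIX check: exactly one of owner/group/other applies."""
    if uid == file_uid:
        bits = (mode >> 6) & 7
    elif file_gid in gids:
        bits = (mode >> 3) & 7
    else:
        bits = mode & 7
    return (bits & want) == want


def allows(mode, file_uid, file_gid, uid, gids, want, identity_privs=True):
    """Access check with the (hypothetical) identity privileges modeled."""
    normal = classic_allows(mode, file_uid, file_gid, uid, gids, want)
    if identity_privs:
        return normal                       # nothing changes in the normal case
    other_ok = ((mode & 7) & want) == want  # only "world" permissions count
    return normal and other_ok              # a deny for the identity still denies


if __name__ == "__main__":
    # A private file owned by uid 100, opened by uid 100.
    print(allows(0o600, 100, 10, 100, {10}, R))                        # privileges kept
    print(allows(0o600, 100, 10, 100, {10}, R, identity_privs=False))  # privileges dropped
```

Under this reading, a process that has dropped the privileges behaves much like the "nobody user with an ordinary user id" described above.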
Re: [zfs-discuss] ZFS and Flash archives
On 6/21/06, Lori Alt [EMAIL PROTECTED] wrote:

I checked into this and got some information from the install group. What I learned is this: the process of creating a flash archive is just a matter of using cpio/pax to make a copy of the contents of an installed system. A flash archive doesn't contain any information about the configuration (i.e. storage partitioning) of a system. It's more like a 'super-package' which contains all of the system software plus some customizations. When you install a flash archive, you need to have already created the storage to hold the archive contents. That's done with the standard Solaris install software (same as a regular initial install, upgrade, jumpstart, or liveupgrade). But the distributed Solaris install software is not yet zfs-aware. So the answer to your question is that you can create a flash archive from this system with zfs filesystems, but until the install software is zfs-aware, you can't use the archive to create a system with zfs pools and datasets. Full support for zfs in flash archives will come with the rest of the zfs installation/boot support.

But flash archives come with multiple sections. From flash_archive(4) on Solaris 9:

The flash archive is laid out in the following sections:

o archive cookie
o archive identification
o manifest (for differential archives only)
o predeployment
o postdeployment
o reboot
o summary
o user-defined (optional)
o archive files

It seems as though, if suitably motivated, additional information about the desired configuration could be stored in one of the above sections, either directly or as a result of scripts (e.g. derived profiles in jumpstart).

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/
Re: AW: AW: [zfs-discuss] Proposal for new basic privileges related with filesystem access checks
On Thu, Jun 22, 2006 at 02:45:50AM +0200, Nicolai Johannes wrote:

So as I have understood you, explaining the new privileges with the term anonymous user would be better? I actually thought about that idea, but there is a subtle difference:

Hmmm, no, I have no good name for it.

Concerning the discussion whether five privileges are too many for the purpose: My proposal also asks the question whether one should merge the five privileges into two (PRIV_FILE_IDENTITY_{READ|WRITE}) with the semantics of state-preserving/non-state-preserving operations. To my mind, this is too coarse-grained to programmatically restrict the power of a privileged/unprivileged process. Furthermore, shadowing the existing privileges would be semantically more consistent. The administrative effort to create nobody users, set s-bits for special programs and track the usage of this user would also vanish.

Yeah, I think 5 is probably too many. Wouldn't apps that drop them drop all of them?

To the NFS/POSIX issue: I am not an expert in this field, but I believe that the following two assumptions are right (correct me if I am wrong):

1. Because the presence of all new basic privileges would not change anything in the established behaviour (check all three checking possibilities if in doubt), programs with basic privileges (almost all) will not notice the new privs at all.

Right.

2. I do not know exactly whether permissions for an NFS file are checked on the server or the client or both (I assume at least at the client). The new privileges are only checked at the client, so the server is not affected at all. In any case, whether it has set or dropped the new privileges, the process will never be able to access files that it would normally (i.e. without the new privileges having been introduced) not be able to access.

The server checks permissions. IIRC Least Privilege already has some problems with NFS, namely that asserting PRIV_FILE_DAC_* over NFS does not work (unless the euid is 0).
There are things that can be done about that problem strictly within the NFSv4 protocol, but for not asserting basic file privileges? I don't yet know what can be done within the protocol...

...though in the past I've proposed a stackable GSS-API mechanism or Kerberos V authorization-data element to convey client-side privilege information to the server for evaluation there. But this would be well beyond the scope of your project -- I mention it only for completeness.

Nico
--
Re: [zfs-discuss] 15 minute fdsync problem and ZFS: Solved
Bill Sommerfeld wrote:

On Wed, 2006-06-21 at 14:15, Neil Perrin wrote:

Of course we would need to stress the dangers of setting 'deferred'. What do you guys think?

I can think of a use case for deferred: improving the efficiency of a large mega-transaction/batch job such as a nightly build. You create an initially empty or cloned dedicated filesystem for the build, and start it off, and won't look inside until it completes. If the build machine crashes in the middle of the build you're going to nuke it all and start over, because that's lower risk than assuming you can pick up where it left off.

Now, it happens that a bunch of tools used during a build invoke fsync. But in the context of a full nightly build that effort is wasted. All you need is one big "sync everything" at the very end, either by using a command like sync or lockfs -f, or as a side effect of reverting from sync=deferred to sync=standard.

Can I give support for this use case? Or does it take someone like Casper Dik with 'fastfs' to come along later and provide a utility that lets people make the filesystem do what they want it to? [Still annoyed that it took me so long to find out about fastfs - hell, the Solaris 8 or 9 OS installation process was using the same ioctl as fastfs uses, but for some reason end users still have to find fastfs out on the Net somewhere instead of getting it with the OS.]

If the ZFS docs state why it's not for general use, then what's to separate this from the zillion other ways that a cavalier sysadmin can bork their data (or indeed their whole machine)? Otherwise, why even let people create a striped zpool vdev without redundancy - it's just an accident waiting to happen, right? We must save people from themselves! Think of the children! ;-)

-Jason =:^/

--
[EMAIL PROTECTED]
ANU Supercomputer Facility, APAC Grid Program
Leonard Huxley Bldg 56, Mills Road
Australian National University
Canberra, ACT, 0200, Australia
Ph: +61 2 6125 5449
Fax: +61 2 6125 8199
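The batch-job pattern described in the quoted use case (no per-file fsync, one big sync at the end) can be sketched at the application level. This is only an analogy under stated assumptions: `run_batch` and its arguments are hypothetical, and the sketch skips fsync in the program itself, whereas the proposed sync=deferred setting would make the file system ignore fsync calls issued by existing build tools. `os.sync()` is the standard Python wrapper for sync(2) on Unix.

```python
import os
import tempfile


def run_batch(build_dir, outputs):
    """Write every output without a per-file fsync, then sync once at the end."""
    for name, data in outputs.items():
        path = os.path.join(build_dir, name)
        with open(path, "wb") as f:
            f.write(data)   # deliberately no os.fsync(f.fileno()) per file
    os.sync()               # one "sync everything" for the whole batch (Unix only)
    return sorted(os.listdir(build_dir))


if __name__ == "__main__":
    # If the machine crashed mid-build, the whole directory would be discarded
    # and the build restarted, so intermediate durability is not needed.
    with tempfile.TemporaryDirectory() as scratch:
        print(run_batch(scratch, {"a.o": b"\x00" * 16, "b.o": b"\x01" * 16}))
```

The trade-off is the one stated in the message: nothing written during the batch is guaranteed durable until the final sync, which is acceptable only because a crashed build is thrown away anyway.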
Re: [zfs-discuss] Re: ZFS questions (hybrid HDs)
Anton B. Rang wrote:

Actually, while Seagate's little white paper doesn't explicitly say so, the FLASH is used for a write cache and that provides one of the major benefits: Writes to the disk rarely need to spin up the motor. Probably 90+% of all writes to disk will fit into the cache in a typical laptop environment (no, compiling OpenSolaris isn't typical usage…).

On OpenSolaris laptops with enough RAM, we need to think about fitting mappings of libc, cron and all of its work into the buffer cache, and then maybe the flash cache on the drive. Each time you execute a program, that's an atime update of its file...

I've known people to wear out laptop hard drives in a frighteningly short period of time because of the drive being spun up and down to service cron, sendmail queue runs, syslog messages...

Darren