Re: [Patch 09/18] fs/logfs/gc.c
On Sun, Jun 03, 2007 at 08:46:04PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote:
> --- /dev/null 2007-03-13 19:15:28.862769062 +0100
> +++ linux-2.6.21logfs/fs/logfs/gc.c 2007-06-03 19:18:57.0 +0200

Number of bugs in case of error looks quite sad...

--
Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Patch 07/18] fs/logfs/dir.c
On Sun, Jun 03, 2007 at 08:44:29PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote:
> --- /dev/null 2007-03-13 19:15:28.862769062 +0100
> +++ linux-2.6.21logfs/fs/logfs/dir.c 2007-06-03 19:54:55.0 +0200

...

> +static int __logfs_dir_walk(struct inode *dir, struct dentry *dentry,
> +                dir_callback handler, struct logfs_disk_dentry *dd, loff_t *pos)
> +{
> +        struct qstr *name = dentry ? &dentry->d_name : NULL;
> +        int ret;
> +
> +        for (; ; (*pos)++) {
> +                ret = read_dir(dir, dd, *pos);
> +                if (ret == -EOF)
> +                        return 0;
> +                if (ret == -ENODATA) {
> +                        /* deleted dentry */
> +                        *pos = dir_seek_data(dir, *pos);
> +                        continue;
> +                }
> +                if (ret)
> +                        return ret;
> +                BUG_ON(dd->namelen == 0);

This can be moved out of the loop or even to the higher layer where this
one is called. There is a number of such debug checks in the tree.

...

> +static int logfs_lookup_handler(struct inode *dir, struct dentry *dentry,
> +                struct logfs_disk_dentry *dd, loff_t pos)
> +{
> +        struct inode *inode;
> +
> +        inode = iget(dir->i_sb, be64_to_cpu(dd->ino));
> +        if (!inode)
> +                return -EIO;
> +        return PTR_ERR(d_splice_alias(inode, dentry));
> +}

From a perfectionist point of view it should return long, not int, but
frankly it is so minor that it is not even worth the time I spent writing
this sentence. ^W^W^W

> +static int __logfs_readdir(struct file *file, void *buf, filldir_t filldir)
> +{
> +        struct logfs_disk_dentry dd;
> +        struct inode *dir = file->f_dentry->d_inode;
> +        loff_t pos = file->f_pos - IMPLICIT_NODES;
> +        int err;
> +
> +        BUG_ON(pos<0);

Spaces run away.

> +static void logfs_set_name(struct logfs_disk_dentry *dd, struct qstr *name)
> +{
> +        BUG_ON(name->len > LOGFS_MAX_NAMELEN);

Hmmm, I would write here that the user is damn wrong and his DNA is of no
interest to humanity's gene pool, instead of crashing the machine.

> +        dd->namelen = cpu_to_be16(name->len);
> +        memcpy(dd->name, name->name, name->len);
> +}
> +}

> +static int logfs_symlink(struct inode *dir, struct dentry *dentry,
> +                const char *target)
> +{
> +        struct inode *inode;
> +        size_t destlen = strlen(target) + 1;
> +
> +        if (destlen > dir->i_sb->s_blocksize)
> +                return -ENAMETOOLONG;

Should this also account for name-related overhead, or is the name just
placed into the data block as is?

> +        inode = logfs_new_inode(dir, S_IFLNK | S_IRWXUGO);
> +        if (IS_ERR(inode))
> +                return PTR_ERR(inode);
> +
> +        inode->i_op = &logfs_symlink_iops;
> +        inode->i_mapping->a_ops = &logfs_reg_aops;
> +
> +        return __logfs_create(dir, dentry, inode, target, destlen);
> +}

> +static int logfs_delete_dd(struct inode *dir, struct logfs_disk_dentry *dd,
> +                loff_t pos)
> +{
> +        int err;
> +
> +        err = read_dir(dir, dd, pos);
> +
> +        /*
> +         * Getting called with pos somewhere beyond eof is either a goofup
> +         * within this file or means someone maliciously edited the
> +         * (crc-protected) journal.
> +         */
> +        LOGFS_BUG_ON(err == -EOF, dir->i_sb);

Maybe just return a permanent error, remount itself read-only and say
something insulting instead of killing itself in pain?

> +        if (err)
> +                return err;
> +
> +        dir->i_ctime = dir->i_mtime = CURRENT_TIME;
> +        if (dd->type == DT_DIR)
> +                dir->i_nlink--;
> +        return logfs_delete(dir, pos);
> +}

> +static int logfs_rename_target(struct inode *old_dir, struct dentry *old_dentry,
> +                struct inode *new_dir, struct dentry *new_dentry)
> +{
> +        struct logfs_super *super = logfs_super(old_dir->i_sb);
> +        struct inode *old_inode = old_dentry->d_inode;
> +        struct inode *new_inode = new_dentry->d_inode;
> +        int isdir = S_ISDIR(old_inode->i_mode);
> +        struct logfs_disk_dentry dd;
> +        loff_t pos;
> +        int err;
> +
> +        BUG_ON(isdir != S_ISDIR(new_inode->i_mode));

Spaces run away.

> +        if (isdir) {
> +                if (!logfs_empty_dir(new_inode))
> +                        return -ENOTEMPTY;
> +        }

One can save two lines of code by putting both logical checks in one if ().

> +int logfs_replay_journal(struct super_block *sb)
> +{
> +        struct logfs_super *super = logfs_super(sb);
> +        struct logfs_disk_dentry dd;
> +        struct inode *inode;
> +        u64 ino, pos;
> +        int err;
> +
> +        if (super->s_victim_ino) {
> +                /* delete victim inode */
> +                ino = super->s_victim_ino;
> +                inode = iget(sb, ino);
> +                if (!inode)
> +                        goto fail;
> +
> +                super->s_victim_ino = 0;
> +                err = logfs_remove_inode(inode);
> +                iput(inode);
> +                if (err) {
> +                        super->s_victim_ino = ino;
> +                        goto fail;
> +                }
> +        }
> +        if (super->s_rename_dir) {
> +                /* delete old dd from rename */
> +                ino =
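
[Editorial sketch, not part of the original mail.] Two of the review
suggestions in this message, returning an error instead of hitting BUG_ON()
on an over-long name, and folding the nested directory check into a single
if (), might look roughly like this in plain user-space C. Every name here
(sketch_dentry, sketch_set_name, sketch_check_rename_target,
SKETCH_MAX_NAMELEN and its value of 255) is hypothetical and only stands in
for the LogFS code quoted in the review; it is not the LogFS implementation.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Assumed limit for this sketch; stands in for LOGFS_MAX_NAMELEN. */
#define SKETCH_MAX_NAMELEN 255

/* Simplified stand-in for struct logfs_disk_dentry. */
struct sketch_dentry {
    unsigned short namelen;
    char name[SKETCH_MAX_NAMELEN];
};

/*
 * Copy a name into the dentry. An over-long name is reported to the
 * caller as -ENAMETOOLONG instead of aborting, as the review suggests.
 */
static int sketch_set_name(struct sketch_dentry *dd, const char *name,
                           size_t len)
{
    if (len > SKETCH_MAX_NAMELEN)
        return -ENAMETOOLONG;
    dd->namelen = (unsigned short)len;
    memcpy(dd->name, name, len);
    return 0;
}

/*
 * The nested "if (isdir) { if (!empty) return -ENOTEMPTY; }" from the
 * quoted patch, collapsed into one condition.
 */
static int sketch_check_rename_target(int isdir, int target_is_empty)
{
    if (isdir && !target_is_empty)
        return -ENOTEMPTY;
    return 0;
}
```

Whether the real function can change its return type from void depends on
its callers, which are not shown in this thread.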
Re: LogFS take four
On Sun, Jun 03, 2007 at 08:38:46PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote:
> This round the patch is split into file-sized hunks. There actually seem
> to be kernel developers not manly enough to digest 6000+ lines of code
> at once. And I thought I was the only wimp around.
>
> Again, anyone giving comments in the last round is on Cc:.
>
> I'll try to respond to comments but the next round of patches may take a
> while longer, due to other responsibilities.

Hi Jorn.

Sorry for the late reply (and the wrong non-UTF letter in the name :).
I have a couple of minor nits that I will send in separate mails, but in
general I think it should be included in -mm so that people can start
using it and report real bugs, rather than handwaving about possible
problems.

--
Evgeniy Polyakov
Re: [Patch 09/18] fs/logfs/gc.c
On Fri, 15 June 2007 13:03:57 +0400, Evgeniy Polyakov wrote:
> On Sun, Jun 03, 2007 at 08:46:04PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote:
> > --- /dev/null 2007-03-13 19:15:28.862769062 +0100
> > +++ linux-2.6.21logfs/fs/logfs/gc.c 2007-06-03 19:18:57.0 +0200
>
> Number of bugs in case of error looks quite sad...

Agreed. I've started working on error handling. Most erase errors are
dealt with. Write errors still need some infrastructure.

If you like I can send another round of patches for review.

Jörn

--
Joern's library part 12:
http://physics.nist.gov/cuu/Units/binary.html
Re: LogFS take four
On Fri, 15 June 2007 12:37:32 +0400, Evgeniy Polyakov wrote:
> On Sun, Jun 03, 2007 at 08:38:46PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote:
> > This round the patch is split into file-sized hunks. There actually seem
> > to be kernel developers not manly enough to digest 6000+ lines of code
> > at once. And I thought I was the only wimp around.
> >
> > Again, anyone giving comments in the last round is on Cc:.
> >
> > I'll try to respond to comments but the next round of patches may take a
> > while longer, due to other responsibilities.
>
> Hi Jorn.
>
> Sorry for the late reply (and the wrong non-UTF letter in the name :).

I have been called worse. :)

> I have a couple of minor nits that I will send in separate mails, but in
> general I think it should be included in -mm so that people can start
> using it and report real bugs, rather than handwaving about possible
> problems.

Thank you for the confidence.

Jörn

--
Mundie uses a textbook tactic of manipulation: start with some
reasonable talk, and lead the audience to an unreasonable conclusion.
-- Bruce Perens
Re: [Patch 09/18] fs/logfs/gc.c
On Fri, Jun 15, 2007 at 01:14:04PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote:
> On Fri, 15 June 2007 13:03:57 +0400, Evgeniy Polyakov wrote:
> > On Sun, Jun 03, 2007 at 08:46:04PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote:
> > > --- /dev/null 2007-03-13 19:15:28.862769062 +0100
> > > +++ linux-2.6.21logfs/fs/logfs/gc.c 2007-06-03 19:18:57.0 +0200
> >
> > Number of bugs in case of error looks quite sad...
>
> Agreed. I've started working on error handling. Most erase errors are
> dealt with. Write errors still need some infrastructure.
>
> If you like I can send another round of patches for review.

Yep, send them when you think they are ready.

> Jörn
>
> --
> Joern's library part 12:
> http://physics.nist.gov/cuu/Units/binary.html

--
Evgeniy Polyakov
Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS
Chris Mason wrote:
> On Thu, Jun 14, 2007 at 02:20:26PM -0400, Chuck Lever wrote:
> > NetApp happens to use the standard NDMP protocol for sending the
> > flattened file system. NetApp uses it for synchronous replication,
> > volume migration, and back up to nearline storage and tape. AFS used
> > "vol dump" and "vol restore" for migration, replication, and back-up.
> > ZFS has the "zfs send" and "zfs receive" commands that do basically the
> > same (Eric Kustarz recently published a blog entry that described how
> > these work). And of course, all file system objects are able to be sent
> > this way: streams, xattrs, ACLs, and so on are all supported.
> >
> > Note also that NFSv4 supports the idea of migrated or replicated file
> > objects. All that is needed to support it is a mechanism on the servers
> > to actually move the data.
>
> Stringing the replication together with the underlying FS would be neat.
> Is there a way to deal with a master/slave setup, where the slave may be
> out of date?

Among the implementations I'm aware of, there is a varying degree of
integration into the physical file system.

In general, it depends on how far out of date the slave is, and how
closely the slave is supposed to be synchronized to the master. A hot
backup file system, for example, should be data-consistent within a few
seconds of the master. A snapshot is used to initialize a slave, followed
by a live stream of updates to the master being sent to slaves.

Such a mechanism already exists on NetApp filers because they gather
changes in NVRAM before committing them to the local file system. Simply
put, these changes can also be bundled and sent to a local hot backup
filer that is attached via Infiniband, or over the network to a remote hot
backup filer.

For AFS, replication is done by maintaining a rw and ro copy of a volume
on the designated master server. Changes are made to the rw copy over
time. When admins want to push out a new version to replicas on another
server, the ro copy on the master is replaced with a new snapshot, then
this is pushed to the slaves. The replicas are always ro and are used
mostly for load balancing; clients contact the closest or fastest server
containing a replica of the volume they want to access. They always have a
complete copy of the volume (i.e. no COW on the slaves).

I think you have designed into btrfs a lot of opportunity to implement
this kind of data virtualization and management... I'm excited to see what
can be done.

begin:vcard
fn:Chuck Lever
n:Lever;Chuck
org:Oracle Corporation;Corporate Architecture: Linux Projects Group
adr:;;1015 Granger Avenue;Ann Arbor;MI;48104;USA
title:Principal Member of Staff
tel;work:+1 248 614 5091
x-mozilla-html:FALSE
url:http://oss.oracle.com/~cel/
version:2.1
end:vcard
Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
On Sun, Jun 10, 2007 at 10:09:18AM -0700, Crispin Cowan wrote:
> Andreas Gruenbacher wrote:
> > On Saturday 09 June 2007 02:17, Greg KH wrote:
> > > On Sat, Jun 09, 2007 at 12:03:57AM +0200, Andreas Gruenbacher wrote:
> > > > AppArmor is meant to be relatively easy to understand, manage, and
> > > > customize, and introducing a labels layer wouldn't help these goals.
> > >
> > > Woah, that describes the userspace side of AA just fine, it means
> > > nothing when it comes to the in-kernel implementation. There is no
> > > reason that you can't implement the same functionality using some
> > > totally different in-kernel solution if possible.
> >
> > I agree that the in-kernel implementation could use different
> > abstractions than user-space, provided that the underlying
> > implementation details can be hidden well enough.
>
> The key phrase here is "if possible", and in fact "if possible" is much
> too strong: very many things in software are possible, including
> user-space drivers and a stable kernel module ABI. Some things make
> sense; others are genuinely bad ideas while still possible.
>
> In particular, to layer AppArmor on top of SELinux, the following
> problems must be addressed:
>
> * New files: when a file is created, it is labeled according to the
>   type of the creating process and the type of the parent directory.
>   Applications can also use libselinux to use application logic to
>   relabel the file, but that is not 'mandatory' policy, and fails in
>   cases like cp and mv. AppArmor lets you create a policy that e.g.
>   says "/home/*/.plan r" to permit fingerd to read everyone's .plan
>   file, should it ever exist, and you cannot emulate that with SELinux.

A daemon using inotify can instantly[1] detect this and label the file
properly if it shows up.

> * Renamed Files: Renaming a file changes the policy with respect to
>   that file in AA. To emulate this in SELinux, you would have to have a
>   way to instantly re-label the file upon rename.

Same daemon can do the re-label.

> * Renamed Directory trees: The above problem is compounded with
>   directory trees. Changing the name at the top of a large, bushy tree
>   can require instant relabeling of millions of files.

Same daemon can do this. And yes, it might take some amount of time, but
the number of times that this happens in real life on a production server
is quite small, if it happens at all.

> * New Policies: The SEEdit approach of compiling AA profiles into
>   SELinux labels involves computing the partition set of files, so that
>   each element of the partition set is unique, and corresponds to all
>   the policies that treat every file in the element identically. If you
>   create a new profile that touches *some* of the files in such an
>   element, then you have to split that synthetic label, re-compute the
>   partition set, and re-label the file system.

Again, same daemon can handle this logic.

> * File Systems That Do Not Support Labels: The most important being
>   NFS3 and FAT. Because they do not support labels at all, SELinux has
>   to give you an all-or-nothing access control on the entire remote
>   volume. AA can give you nuanced access control in these file systems.

SELinux already provides support for the whole mounted filesystem, which,
in real-life testing, seems to be quite sufficient. Also, the SELinux
developers are working on some changes to make this a bit more
fine-grained.

See also Stephan's previous comments about NFSv3 client directories and
multiple views having the potential to cause a lot of havoc.

> You could support all of these features in SELinux, but only by adding
> an in-kernel file matching mechanism similar to AppArmor.

I don't think that is necessary at all, see above for why.

> It would basically load an AppArmor policy into the kernel, label files
> as they are brought from disk into the cache, and then use SELinux to do
> the access controls.

No, do the labeling in userspace with a daemon using inotify to handle the
changing of the files around.

Or has this whole idea of a daemon been disproved already with a prototype
somewhere that failed? If not, a simple test app would not be that hard to
hack up. Maybe I'll see if I can do it during the week of June 24 :)

thanks,

greg k-h
Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
On Fri, 2007-06-15 at 11:01 -0700, Casey Schaufler wrote:
> --- Greg KH [EMAIL PROTECTED] wrote:
> > A daemon using inotify can instantly[1] detect this and label the file
> > properly if it shows up.
>
> In our 1995 B1 evaluation of Trusted Irix we were told in no uncertain
> terms that such a solution was not acceptable under the TCSEC
> requirements. Detection and relabel on an unlocked object creates an
> obvious window for exploitation. We were told that such a scheme would
> be considered a design flaw.
>
> I understand that some of the Common Criteria labs are less aggressive
> regarding chasing down these issues than the NCSC teams were. It might
> not prevent an evaluation from completing today. It is still hard to
> explain why it's ok to have a file that's labeled incorrectly _even
> briefly_. It is the system's job to ensure that that does not happen.

Um, Casey, he is talking about how to emulate AppArmor behavior on a
label-based system like SELinux, not meeting B1 or LSPP or anything like
that (which AppArmor can't do regardless).

As far as the general issue goes, if your policy is configured such that
the new file gets the most restrictive label possible at creation time and
then the daemon relabels it to a less restrictive label if appropriate,
then there is no actual window of exposure.

Also, there is such a daemon, restorecond, in SELinux (policycoreutils),
although we naturally avoid relying on it for anything security-critical.

And one could introduce the named type transition concept that has been
discussed in this thread to SELinux without much difficulty.

--
Stephen Smalley
National Security Agency
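
[Editorial sketch, not part of the original mail.] The ordering argument
above (create with the most restrictive label, relax it later only if the
path qualifies) can be modelled in a few lines of user-space C. Everything
here is a toy: the label names, sketch_create(), sketch_may_read() and
sketch_relabel() are hypothetical, no SELinux call is made, and POSIX
fnmatch() with FNM_PATHNAME is used as a stand-in for AppArmor's own
globbing rules, which differ in detail.

```c
#include <fnmatch.h>
#include <stdbool.h>

/* Hypothetical labels for the sketch. */
enum sketch_label {
    LABEL_RESTRICTED_NEW,   /* assigned at creation: nobody confined may read */
    LABEL_USER_PLAN         /* relaxed label for files matching the rule */
};

struct sketch_file {
    const char *path;
    enum sketch_label label;
};

/* Creation always assigns the most restrictive label (fail closed). */
static struct sketch_file sketch_create(const char *path)
{
    struct sketch_file f = { path, LABEL_RESTRICTED_NEW };
    return f;
}

/* Access check for the confined reader: only relaxed files are readable. */
static bool sketch_may_read(const struct sketch_file *f)
{
    return f->label == LABEL_USER_PLAN;
}

/* Daemon step: relax the label only if the path matches the rule. */
static void sketch_relabel(struct sketch_file *f)
{
    if (fnmatch("/home/*/.plan", f->path, FNM_PATHNAME) == 0)
        f->label = LABEL_USER_PLAN;
}
```

Between sketch_create() and sketch_relabel() the file is merely
inaccessible to the confined reader, which is the "no actual window of
exposure" property described in the mail; the cost is a temporary denial
rather than a temporary grant.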
Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
--- Greg KH [EMAIL PROTECTED] wrote:
> A daemon using inotify can instantly[1] detect this and label the file
> properly if it shows up.

In our 1995 B1 evaluation of Trusted Irix we were told in no uncertain
terms that such a solution was not acceptable under the TCSEC
requirements. Detection and relabel on an unlocked object creates an
obvious window for exploitation. We were told that such a scheme would be
considered a design flaw.

I understand that some of the Common Criteria labs are less aggressive
regarding chasing down these issues than the NCSC teams were. It might not
prevent an evaluation from completing today. It is still hard to explain
why it's ok to have a file that's labeled incorrectly _even briefly_. It
is the system's job to ensure that that does not happen.

Casey Schaufler
[EMAIL PROTECTED]
Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS
Chris Mason wrote:
> > is it possible to test it on top of LVM2 on RAID at this stage?
>
> Yes, I haven't done much multi-spindle testing yet, so I'm definitely
> interested in these numbers.
>
> -chris

I did not get very far:

# insmod btrfs.ko
# mkfs.btrfs /dev/brain_volume_group/btrfstest
on close 0 blocks are allocated
fs created on /dev/brain_volume_group/btrfstest blocksize 4096 blocks 4980736

(/dev/brain_volume_group/btrfstest is a 20GB logical volume on top of RAID6)

# mount /dev/brain_volume_group/btrfstest /mnt/temp/
(this gives these kernel-msgs:
[ 385.980358] btrfs: dm-6 checksum verify failed on 4
[ 385.980462] btrfs: dm-6 checksum verify failed on 12
[ 385.980559] btrfs: dm-6 checksum verify failed on 11
)
# touch /mnt/temp/default/testfile.txt
[ 445.445638] btrfs: dm-6 checksum verify failed on 10
# umount /mnt/temp/
[ 457.980372] [ cut here ]
[ 457.980377] kernel BUG at fs/buffer.c:2644!
[ 457.980379] invalid opcode: [1] PREEMPT
[ 457.980382] CPU 0
[ 457.980384] Modules linked in: btrfs snd_seq_midi cx88_dvb cx88_vp3054_i2c video_buf_dvb snd_ice1712 snd_ice17xx_ak4xxx snd_ak4xxx_adda snd_cs8427 snd_ac97_codec ac97_bus snd_i2c snd_mpu401_uart snd_rawmidi cx8800 cx8802 cx88xx ir_common tveeprom btcx_risc video_buf uhci_hcd
[ 457.980397] Pid: 6040, comm: btrfs/0 Not tainted 2.6.21.5 #50
[ 457.980400] RIP: 0010:[8021996c] [8021996c] submit_bh+0xf/0x102
[ 457.980408] RSP: 0018:81000bab7d30 EFLAGS: 00010246
[ 457.980411] RAX: a829 RBX: 81000ac207b0 RCX: 81005f0458c8
[ 457.980414] RDX: 0033 RSI: 81000ac207b0 RDI: 0001
[ 457.980418] RBP: 0001 R08: 81000ccdd3f8 R09: 81005fe78d50
[ 457.980422] R10: 025fffe0 R11: 802407c7 R12:
[ 457.980426] R13: 81001c16f480 R14: 81000ccdd3f8 R15: 81000bab7d88
[ 457.980430] FS: 2b7554d54050() GS:80728000() knlGS:f7e3f6b0
[ 457.980434] CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
[ 457.980437] CR2: 2abaf000 CR3: 03b06000 CR4: 06e0
[ 457.980441] Process btrfs/0 (pid: 6040, threadinfo 81000bab6000, task 81005dba8480)
[ 457.980443] Stack: 81000ac207b0 81001c16f480 880988bb
[ 457.980450] 81001c16f480 81000bab7d80 81000fee76e0 88099eb1
[ 457.980455] 0001 81001b318c10 81001c16f180 0050
[ 457.980459] Call Trace:
[ 457.980471] [880988bb] :btrfs:write_ctree_super+0xd3/0x11f
[ 457.980480] [88099eb1] :btrfs:btrfs_commit_transaction+0x43e/0x5c0
[ 457.980486] [80257e4b] cache_alloc_refill+0x2a3/0x4f7
[ 457.980491] [802873fb] autoremove_wake_function+0x0/0x2e
[ 457.980501] [8809a033] :btrfs:btrfs_transaction_cleaner+0x0/0x141
[ 457.980510] [8809a0e0] :btrfs:btrfs_transaction_cleaner+0xad/0x141
[ 457.980515] [8024869c] run_workqueue+0xb5/0x18e
[ 457.980519] [80245499] worker_thread+0x0/0x145
[ 457.980523] [80287256] keventd_create_kthread+0x0/0x89
[ 457.980526] [802455a8] worker_thread+0x10f/0x145
[ 457.980531] [80277d4f] default_wake_function+0x0/0xe
[ 457.980535] [80287256] keventd_create_kthread+0x0/0x89
[ 457.980540] [802302cb] kthread+0xca/0xfb
[ 457.980545] [80259318] child_rip+0xa/0x12
[ 457.980549] [80287256] keventd_create_kthread+0x0/0x89
[ 457.980555] [80230201] kthread+0x0/0xfb
[ 457.980558] [8025930e] child_rip+0x0/0x12
[ 457.980561]
[ 457.980562]
[ 457.980563] Code: 0f 0b eb fe 8b 06 a8 20 75 04 0f 0b eb fe 48 83 7e 38 00 75
[ 457.980571] RIP [8021996c] submit_bh+0xf/0x102
[ 457.980576] RSP 81000bab7d30

Linux localhost 2.6.21.5 #51 Fri Jun 15 20:53:36 CEST 2007 x86_64 AMD Athlon(tm) 64 Processor 3200+ AuthenticAMD GNU/Linux
Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS
On Fri, Jun 15, 2007 at 09:08:38PM +0200, Florian D. wrote:
> Chris Mason wrote:
> > > is it possible to test it on top of LVM2 on RAID at this stage?
> >
> > Yes, I haven't done much multi-spindle testing yet, so I'm definitely
> > interested in these numbers.
> >
> > -chris
>
> I did not get very far:
>
> # insmod btrfs.ko
> # mkfs.btrfs /dev/brain_volume_group/btrfstest
> on close 0 blocks are allocated
> fs created on /dev/brain_volume_group/btrfstest blocksize 4096 blocks 4980736
>
> (/dev/brain_volume_group/btrfstest is a 20GB logical volume on top of RAID6)
>
> # mount /dev/brain_volume_group/btrfstest /mnt/temp/
> (this gives these kernel-msgs:
> [ 385.980358] btrfs: dm-6 checksum verify failed on 4
> [ 385.980462] btrfs: dm-6 checksum verify failed on 12
> [ 385.980559] btrfs: dm-6 checksum verify failed on 11

These are normal on the first mount, the mkfs doesn't set the csums on the
blocks it creates (will fix ;) )

> # touch /mnt/temp/default/testfile.txt
> [ 445.445638] btrfs: dm-6 checksum verify failed on 10
> # umount /mnt/temp/
> [ 457.980372] [ cut here ]
> [ 457.980377] kernel BUG at fs/buffer.c:2644!

Whoops. Please try this:

diff -r 38b36731 disk-io.c
--- a/disk-io.c Fri Jun 15 13:50:20 2007 -0400
+++ b/disk-io.c Fri Jun 15 15:12:26 2007 -0400
@@ -541,6 +541,7 @@ int write_ctree_super(struct btrfs_trans
         else
                 ret = submit_bh(WRITE, bh);
         if (ret == -EOPNOTSUPP) {
+                lock_buffer(bh);
                 set_buffer_uptodate(bh);
                 root->fs_info->do_barriers = 0;
                 ret = submit_bh(WRITE, bh);
Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
Hi!

And before you scream "races", take a look. It does not actually add them:

> > > I agree that the in-kernel implementation could use different
> > > abstractions than user-space, provided that the underlying
> > > implementation details can be hidden well enough.
> >
> > The key phrase here is "if possible", and in fact "if possible" is much
> > too strong: very many things in software are possible, including
> > user-space drivers and a stable kernel module ABI. Some things make
> > sense; others are genuinely bad ideas while still possible.
> >
> > In particular, to layer AppArmor on top of SELinux, the following
> > problems must be addressed:
> >
> > * New files: when a file is created, it is labeled according to the
> >   type of the creating process and the type of the parent directory.
> >   Applications can also use libselinux to use application logic to
> >   relabel the file, but that is not 'mandatory' policy, and fails in
> >   cases like cp and mv. AppArmor lets you create a policy that e.g.
> >   says "/home/*/.plan r" to permit fingerd to read everyone's .plan
> >   file, should it ever exist, and you cannot emulate that with SELinux.
>
> A daemon using inotify can instantly[1] detect this and label the file
> properly if it shows up.

Or just create the files with restrictive labels by default. That way you
"fail closed".

> > * Renamed Files: Renaming a file changes the policy with respect to
> >   that file in AA. To emulate this in SELinux, you would have to have a
> >   way to instantly re-label the file upon rename.
>
> Same daemon can do the re-label.

...and no, the race there is not important. An attacker may have opened the
file under the old name and be keeping the file descriptor open. So this
does not add a new race relative to AA.

> > * Renamed Directory trees: The above problem is compounded with
> >   directory trees. Changing the name at the top of a large, bushy tree
> >   can require instant relabeling of millions of files.
>
> Same daemon can do this. And yes, it might take some amount of time, but
> the number of times that this happens in real life on a production
> server is quite small, if it happens at all.

And now, if you move a tree, there will be old labels for a while. But
this does not matter, because an attacker could be keeping file
descriptors.

The only case where an attacker _can't_ be keeping file descriptors is
newly created files in a recently moved tree. But as you already create
files with restrictive permissions, that's okay.

Yes, you may get some -EPERM during the tree move, but AA has that problem
already; note that when madly moving trees we sometimes construct a path
the file never ever had.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS
Chris Mason wrote:
> > # umount /mnt/temp/
> > [ 457.980372] [ cut here ]
> > [ 457.980377] kernel BUG at fs/buffer.c:2644!
>
> Whoops. Please try this:
>
> diff -r 38b36731 disk-io.c
> --- a/disk-io.c Fri Jun 15 13:50:20 2007 -0400
> +++ b/disk-io.c Fri Jun 15 15:12:26 2007 -0400
> @@ -541,6 +541,7 @@ int write_ctree_super(struct btrfs_trans
>          else
>                  ret = submit_bh(WRITE, bh);
>          if (ret == -EOPNOTSUPP) {
> +                lock_buffer(bh);
>                  set_buffer_uptodate(bh);
>                  root->fs_info->do_barriers = 0;
>                  ret = submit_bh(WRITE, bh);

sorry, with the patch applied:

[ 147.475077] BUG: at /home/florian/system/btrfs_test/btrfs-0.2/disk-io.c:534 write_ctree_super()
[ 147.475082]
[ 147.475083] Call Trace:
[ 147.475096] [880957f7] :btrfs:write_ctree_super+0x70/0x140
[ 147.475106] [88096ec5] :btrfs:btrfs_commit_transaction+0x43e/0x5c0
[ 147.475112] [8022a2a6] __writeback_single_inode+0x34f/0x361
[ 147.475121] [88096fec] :btrfs:btrfs_commit_transaction+0x565/0x5c0
[ 147.475126] [8027b4eb] autoremove_wake_function+0x0/0x2e
[ 147.475136] [88095915] :btrfs:close_ctree+0x4e/0x191
[ 147.475141] [8022e22e] dispose_list+0xad/0xc9
[ 147.475146] [8029fd1a] invalidate_inodes+0xc3/0xd5
[ 147.475155] [8808d170] :btrfs:btrfs_put_super+0x10/0x31
[ 147.475159] [80299849] generic_shutdown_super+0x5b/0xd2
[ 147.475163] [802998e6] kill_block_super+0x26/0x3b
[ 147.475167] [80299971] deactivate_super+0x3d/0x55
[ 147.475172] [802a0e4b] sys_umount+0x1ca/0x1f1
[ 147.475177] [8021fd18] sys_newstat+0x19/0x31
[ 147.475184] [80250d5e] system_call+0x7e/0x83
[ 147.475188]
[ 147.476020] BUG: at /home/florian/system/btrfs_test/btrfs-0.2/disk-io.c:534 write_ctree_super()
[ 147.476023]
[ 147.476024] Call Trace:
[ 147.476033] [880957f7] :btrfs:write_ctree_super+0x70/0x140
[ 147.476042] [88096ec5] :btrfs:btrfs_commit_transaction+0x43e/0x5c0
[ 147.476048] [8022a2a6] __writeback_single_inode+0x34f/0x361
[ 147.476057] [88096fec] :btrfs:btrfs_commit_transaction+0x565/0x5c0
[ 147.476061] [8027b4eb] autoremove_wake_function+0x0/0x2e
[ 147.476066] [802554d9] mutex_lock+0xd/0x1d
[ 147.476075] [8809592d] :btrfs:close_ctree+0x66/0x191
[ 147.476080] [8022e22e] dispose_list+0xad/0xc9
[ 147.476085] [8029fd1a] invalidate_inodes+0xc3/0xd5
[ 147.476096] [8808d170] :btrfs:btrfs_put_super+0x10/0x31
[ 147.476100] [80299849] generic_shutdown_super+0x5b/0xd2
[ 147.476104] [802998e6] kill_block_super+0x26/0x3b
[ 147.476108] [80299971] deactivate_super+0x3d/0x55
[ 147.476112] [802a0e4b] sys_umount+0x1ca/0x1f1
[ 147.476118] [8021fd18] sys_newstat+0x19/0x31
[ 147.476124] [80250d5e] system_call+0x7e/0x83
[ 147.476128]
[ 147.482579] BUG: at /home/florian/system/btrfs_test/btrfs-0.2/disk-io.c:534 write_ctree_super()
[ 147.482582]
[ 147.482583] Call Trace:
[ 147.482592] [880957f7] :btrfs:write_ctree_super+0x70/0x140
[ 147.482601] [88095949] :btrfs:close_ctree+0x82/0x191
[ 147.482605] [8022e22e] dispose_list+0xad/0xc9
[ 147.482611] [8029fd1a] invalidate_inodes+0xc3/0xd5
[ 147.482619] [8808d170] :btrfs:btrfs_put_super+0x10/0x31
[ 147.482623] [80299849] generic_shutdown_super+0x5b/0xd2
[ 147.482627] [802998e6] kill_block_super+0x26/0x3b
[ 147.482631] [80299971] deactivate_super+0x3d/0x55
[ 147.482636] [802a0e4b] sys_umount+0x1ca/0x1f1
[ 147.482641] [8021fd18] sys_newstat+0x19/0x31
[ 147.482648] [80250d5e] system_call+0x7e/0x83
[ 147.482652]
[ 147.483066] VFS: brelse: Trying to free free buffer
[ 147.483069] BUG: at fs/buffer.c:1164 __brelse()
[ 147.483071]
[ 147.483072] Call Trace:
[ 147.483081] [88095982] :btrfs:close_ctree+0xbb/0x191
[ 147.483086] [8022e22e] dispose_list+0xad/0xc9
[ 147.483091] [8029fd1a] invalidate_inodes+0xc3/0xd5
[ 147.483099] [8808d170] :btrfs:btrfs_put_super+0x10/0x31
[ 147.483103] [80299849] generic_shutdown_super+0x5b/0xd2
[ 147.483107] [802998e6] kill_block_super+0x26/0x3b
[ 147.483111] [80299971] deactivate_super+0x3d/0x55
[ 147.483116] [802a0e4b] sys_umount+0x1ca/0x1f1
[ 147.483121] [8021fd18] sys_newstat+0x19/0x31
[ 147.483127] [80250d5e] system_call+0x7e/0x83
Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS
On Fri, Jun 15, 2007 at 10:46:04PM +0200, Florian D. wrote:
> Chris Mason wrote:
> > > # umount /mnt/temp/
> > > [ 457.980372] [ cut here ]
> > > [ 457.980377] kernel BUG at fs/buffer.c:2644!
> >
> > Whoops. Please try this:
> >
> > [ bad patch ]
>
> sorry, with the patch applied:
>
> [ 147.475077] BUG: at /home/florian/system/btrfs_test/btrfs-0.2/disk-io.c:534

Well, apparently I can get the silly stuff wrong an infinite number of
times. Sorry, let's try again:

diff -r 38b36731 disk-io.c
--- a/disk-io.c Fri Jun 15 13:50:20 2007 -0400
+++ b/disk-io.c Fri Jun 15 16:52:38 2007 -0400
@@ -541,6 +541,8 @@ int write_ctree_super(struct btrfs_trans
         else
                 ret = submit_bh(WRITE, bh);
         if (ret == -EOPNOTSUPP) {
+                get_bh(bh);
+                lock_buffer(bh);
                 set_buffer_uptodate(bh);
                 root->fs_info->do_barriers = 0;
                 ret = submit_bh(WRITE, bh);
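
[Editorial sketch, not part of the original mail.] Why the retry needs both
get_bh() and lock_buffer() can be illustrated with a toy user-space model.
It assumes, as the two patches in this thread imply, that submission
requires a locked buffer and that I/O completion unlocks the buffer and
drops one reference even when the barrier request is rejected. The toy_*
names are hypothetical and this is not kernel code; the real submit_bh()
hits a BUG instead of returning an error on an unlocked buffer.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for a buffer head: only the state this sketch needs. */
struct toy_bh {
    int  refcount;
    bool locked;
};

static void toy_get_bh(struct toy_bh *bh)      { bh->refcount++; }
static void toy_lock_buffer(struct toy_bh *bh) { bh->locked = true; }

/* Completion side (assumed): unlock and drop one reference. */
static void toy_end_io(struct toy_bh *bh)
{
    bh->locked = false;
    bh->refcount--;
}

/*
 * Submission requires a locked buffer; the toy returns -EINVAL where the
 * kernel would BUG. Completion runs whether or not the device accepted
 * the request, so the lock and one reference are always consumed.
 */
static int toy_submit(struct toy_bh *bh, bool accepted)
{
    if (!bh->locked)
        return -EINVAL;
    toy_end_io(bh);
    return accepted ? 0 : -EOPNOTSUPP;
}

/* Write with a barrier first; on -EOPNOTSUPP retry without one. */
static int toy_write_super(struct toy_bh *bh, bool barriers_supported)
{
    int ret;

    toy_get_bh(bh);
    toy_lock_buffer(bh);
    ret = toy_submit(bh, barriers_supported);
    if (ret == -EOPNOTSUPP) {
        /* First attempt consumed the lock and the reference: retake both. */
        toy_get_bh(bh);
        toy_lock_buffer(bh);
        ret = toy_submit(bh, true);
    }
    return ret;
}
```

In this model, dropping the second toy_lock_buffer() makes the retry fail
immediately, and dropping the second toy_get_bh() leaves the reference
count one too low afterwards, which mirrors the two failures reported
earlier in the thread.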
Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
On Fri, Jun 15, 2007 at 10:06:23PM +0200, Pavel Machek wrote: Hi! And before you scream races, take a look. It does not actually add them: Hey, I never screamed that at all, in fact, I completly agree with you :) I agree that the in-kernel implementation could use different abstractions than user-space, provided that the underlying implementation details can be hidden well enough. The key phrase here is if possible, and in fact if possible is much too strong: very many things in software are possible, including user-space drives and a stable kernel module ABI. Some things make sense; others are genuinely bad ideas while still possible. In particular, to layer AppArmor on top of SELinux, the following problems must be addressed: * New files: when a file is created, it is labeled according to the type of the creating process and the type of the parent directory. Applications can also use libselinux to use application logic to relabel the file, but that is not 'mandatory' policy, and fails in cases like cp and mv. AppArmor lets you create a policy that e..g says /home/*/.plan r to permit fingerd to read everyone's .plan file, should it ever exist, and you cannot emulate that with SELinux. A daemon using inotify can instantly[1] detect this and label the file properly if it shows up. Or just create the files with restrictive labels by default. That way you fail closed. From my limited knowledge of SELinux, this is the default today so this would happen by default. Anyone with more SELinux experience want to verify or fix my understanding of this? * Renamed Files: Renaming a file changes the policy with respect to that file in AA. To emulate this in SELinux, you would have to have a way to instantly re-label the file upon rename. Same daemon can do the re-label. ...and no, race there is not important. Attacker may have opened the file under old name and is keeping open file descriptor. So this does not add a new race relative to AA. Agreed. 
* Renamed Directory trees: The above problem is compounded with directory trees. Changing the name at the top of a large, bushy tree can require instant relabeling of millions of files. Same daemon can do this. And yes, it might take a ammount of time, but the times that this happens in real-life on a production server is quite small, if at all. And now, if you move a tree, there will be old labels for a while. But this does not matter, because attacker could be keeping file descriptors. Agreed. Only case where attacker _can't_ be keeping file descriptors is newly created files in recently moved tree. But as you already create files with restrictive permissions, that's okay. Yes, you may get some -EPERM during the tree move, but AA has that problem already, see that when madly moving trees we sometimes construct path file never ever had. Exactly. I can't think of a real world use of moving directory trees around that this would come up in as a problem. Maybe a source code control system might have this issue for the server, but in a second or two everything would be working again as the new files would be relabled correctly. Can anyone else see a problem with this that I'm just being foolish and missing? thanks, greg k-h - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
On Fri, 2007-06-15 at 14:14 -0700, Greg KH wrote: On Fri, Jun 15, 2007 at 01:43:31PM -0700, Casey Schaufler wrote: Yup, I see that once you accept the notion that it is OK for a file to be misslabeled for a bit and that having a fixxerupperd is sufficient it all falls out. My point is that there is a segment of the security community that had not found this acceptable, even under the conditions outlined. If it meets your needs, I say run with it. If that segment feels that way, then I imagine AA would not meet their requirements today due to file handles and other ways of passing around open files, right? So, would SELinux today (without this AA-like daemon) fit the requirements of this segment? Yes - RHEL 5 is going through CC evaluations for LSPP, CAPP, and RBAC using the features of SELinux where appropriate. Karl - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
On Fri, 15 Jun 2007, Greg KH wrote:
Or just create the files with restrictive labels by default. That way you fail closed. From my limited knowledge of SELinux, this is the default today so this would happen by default. Anyone with more SELinux experience want to verify or fix my understanding of this?

This is entirely controllable via policy. That is, you specify that newly created files are labeled to something safe (enforced atomically at the kernel level), and then your userland relabeler would be invoked via inotify to relabel based on your userland pathname specification.

This labeling policy can be as granular as you wish, from the entire filesystem to a single file. It can also be applied depending on the process which created the file and the directory it's created in, ranging from all processes and all directories, to say, just those running as user_t in directories labeled as public_html_t (or whatever).

- James
--
James Morris [EMAIL PROTECTED]
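The scheme described above (a safe default label at creation time, then a userland relabeler driven by pathname rules) can be sketched roughly as follows. This is only an illustration of the lookup step, not SELinux or AppArmor code: the label names, the rule list, and the helper names `matches` and `label_for` are all hypothetical, and the inotify plumbing that would call the lookup is left out. The sketch assumes that `*` must not cross a `/`, so patterns are matched one path component at a time.

```python
from fnmatch import fnmatchcase

# Hypothetical label names; a real policy would define its own types.
DEFAULT_LABEL = "unlabeled_restricted_t"

# Hypothetical pathname rules, checked in order; first match wins.
RULES = [
    ("/home/*/.plan", "fingerd_readable_t"),
    ("/home/*/public_html/*", "public_html_t"),
]


def matches(pattern, path):
    """Match component by component so that '*' never crosses a '/'."""
    pattern_parts = pattern.split("/")
    path_parts = path.split("/")
    if len(pattern_parts) != len(path_parts):
        return False
    return all(
        fnmatchcase(part, pat) for pat, part in zip(pattern_parts, path_parts)
    )


def label_for(path, rules=RULES, default=DEFAULT_LABEL):
    """Return the label a relabeler would apply to `path`.

    Paths that match no rule keep the restrictive default, so the
    scheme fails closed.
    """
    for pattern, label in rules:
        if matches(pattern, path):
            return label
    return default


if __name__ == "__main__":
    for p in ("/home/alice/.plan", "/home/alice/sub/.plan", "/etc/passwd"):
        print(p, "->", label_for(p))
```

A relabeling daemon would call something like `label_for()` for each create or rename event it receives; until it does, the file carries the restrictive default.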
Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
On Fri, Jun 15, 2007 at 05:28:35PM -0400, Karl MacMillan wrote: On Fri, 2007-06-15 at 14:14 -0700, Greg KH wrote: On Fri, Jun 15, 2007 at 01:43:31PM -0700, Casey Schaufler wrote: Yup, I see that once you accept the notion that it is OK for a file to be misslabeled for a bit and that having a fixxerupperd is sufficient it all falls out. My point is that there is a segment of the security community that had not found this acceptable, even under the conditions outlined. If it meets your needs, I say run with it. If that segment feels that way, then I imagine AA would not meet their requirements today due to file handles and other ways of passing around open files, right? So, would SELinux today (without this AA-like daemon) fit the requirements of this segment? Yes - RHEL 5 is going through CC evaluations for LSPP, CAPP, and RBAC using the features of SELinux where appropriate. Great, but is there the requirement in the CC stuff such that this type of delayed re-label that an AA-like daemon would need to do cause that model to not be able to be certified like your SELinux implementation is? As I'm guessing the default label for things like this already work properly for SELinux, I figure we should be safe, but I don't know those requirements at all. thanks, greg k-h - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS
Chris Mason wrote:
Well, apparently I get the silly stuff wrong an infinite number of times. Sorry, let's try again:

diff -r 38b36731 disk-io.c
--- a/disk-io.c	Fri Jun 15 13:50:20 2007 -0400
+++ b/disk-io.c	Fri Jun 15 16:52:38 2007 -0400
@@ -541,6 +541,8 @@ int write_ctree_super(struct btrfs_trans
 	else
 		ret = submit_bh(WRITE, bh);
 	if (ret == -EOPNOTSUPP) {
+		get_bh(bh);
+		lock_buffer(bh);
 		set_buffer_uptodate(bh);
 		root->fs_info->do_barriers = 0;
 		ret = submit_bh(WRITE, bh);

ha! it is working now. some numbers from here (with the fio-tool):

1. sequential read
2. random writes
3. sequential read again

filesize: 300MB, bs: 4K

         btrfs                   reiserfs                ext3
   usr%  sys%    bw   sec.   usr%  sys%    bw   sec.   usr%  sys%    bw   sec.
1     5    51  68.3    4.6      1    17  67.4    4.6      5    24  68.0    4.6
2     0     1   0.7    431      2    21  29.8   10.5      3    18  29.0   10.8
3     0     1   2.3    133      1    19  70.5    4.4      5    24  68.6    4.5

bw: MB/sec.
ext3: -o data=writeback,barrier=1
20GB LVM2 partition on a RAID6 (4 SATA-disks)

cheers, florian
Versioning file system
I hope I got the CC list right. Apologies to anyone I didn't include and anyone I shouldn't have included.

The basic idea is to include an idea from VMS that seems to be quite useful: version numbers for files. The idea is that whenever you modify a file the system saves it to a new copy, leaving the old file intact. This could be a great advantage from many viewpoints:

1) it would be much easier to do package management, as the old version would be automatically saved for a package management system to deal with.
2) backups would also be easier, as all versions of a file are automatically saved, so it could be potentially very useful for a company or the like.

There are probably many others but these were the two that I liked best.

Revision numbers could be specified as follows: /path/to/file:revision_number

I think that this can be done without breaking userspace if the default was to open the highest revision file if no revision number is specified. The userspace tools would need to be updated to take full advantage of the new system, but if the delimiter between the path and revision number were chosen sensibly then the changes to most of userspace would be minimal to non-existent.

Personally, I think that the bulk of the implementation could be in the core fs code and the modifications to individual filesystems would be minimal. The main implementation idea I have (however, I am no kernel expert =) is adding an extra field to struct file and struct inode called int revision (as version is already taken) that would hold the number of the file revision being accessed.

Another problem could be the increased usage of disk space. However, if only deltas from the first version were stored then this could cut down on space, or if this were too slow to open a file then the deltas could be off every tenth revision (i.e. 0,10,20,30... where 0,10,20... are full copies of the file).

There would need to be a tool of some description to remove old revisions, but this should not be a major undertaking as it may be something as simple as a new system call. This would have to be careful to update any deltas that were affected by the removal of previous revisions, but that could be taken care of in kernel space.

Thanks to anyone who stuck with me this far =). I don't know how widely useful this may be, but that's the reason I posted before trying to code anything. I would very much value any contributions, even a reasoned NAK, as I'm still learning how kernel development works (and I would love any implementation directions).

Jack
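The naming and storage scheme proposed in this message can be sketched in a few lines. This is only an illustration under stated assumptions, not kernel code: `split_revision`, `base_revision` and `dependents` are hypothetical helper names, a plain path is taken to mean "latest revision", and each delta is assumed to be stored against the nearest full copy at or below it (revisions 0, 10, 20, ...).

```python
def split_revision(name):
    """Split 'path:rev' into (path, rev).

    rev is None when no numeric suffix is present, meaning "open the
    highest revision".
    """
    path, sep, rev = name.rpartition(":")
    if sep and path and rev.isascii() and rev.isdigit():
        return path, int(rev)
    return name, None


def base_revision(revision, interval=10):
    """Full copy that a revision's delta is stored against."""
    return revision - revision % interval


def dependents(base, latest, interval=10):
    """Revisions stored as deltas against `base`, given the newest revision.

    These are the deltas that would have to be rewritten if the full
    copy at `base` were removed.
    """
    return list(range(base + 1, min(base + interval, latest + 1)))


if __name__ == "__main__":
    print(split_revision("/path/to/file:3"))
    print(split_revision("/path/to/file"))
    print(base_revision(23))
    print(dependents(10, 13))
```

One consequence is visible directly in `split_revision`: since ':' is already a legal filename character, an existing file literally named `log:2` cannot be told apart from revision 2 of `log` without some extra rule.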
Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
On Fri, 2007-06-15 at 14:44 -0700, Greg KH wrote: On Fri, Jun 15, 2007 at 05:28:35PM -0400, Karl MacMillan wrote: On Fri, 2007-06-15 at 14:14 -0700, Greg KH wrote: On Fri, Jun 15, 2007 at 01:43:31PM -0700, Casey Schaufler wrote: Yup, I see that once you accept the notion that it is OK for a file to be misslabeled for a bit and that having a fixxerupperd is sufficient it all falls out. My point is that there is a segment of the security community that had not found this acceptable, even under the conditions outlined. If it meets your needs, I say run with it. If that segment feels that way, then I imagine AA would not meet their requirements today due to file handles and other ways of passing around open files, right? So, would SELinux today (without this AA-like daemon) fit the requirements of this segment? Yes - RHEL 5 is going through CC evaluations for LSPP, CAPP, and RBAC using the features of SELinux where appropriate. Great, but is there the requirement in the CC stuff such that this type of delayed re-label that an AA-like daemon would need to do cause that model to not be able to be certified like your SELinux implementation is? There are two things: 1) relabeling (non-tranquility) is very problematic in general because revocation is hard (and non-solved in Linux). So you would have to address concerns about that. 2) Whether this would pass certification depends on a lot of factors (like the specific requirements - CC is just a process not a single set of requirements). I don't know enough to really guess. More to the point, though, the requirements in those documents are outdated at best. I don't think it is worth worrying over. As I'm guessing the default label for things like this already work properly for SELinux, I figure we should be safe, but I don't know those requirements at all. Probably not - you would likely want it to be a label that can't be read or written by anything, only relabeled by the daemon. 
Karl - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Versioning file system
Jack Stone wrote:
I hope I got the CC list right. Apologies to anyone I didn't include and anyone I shouldn't have included.

The basic idea is to include an idea from VMS that seems to be quite useful: version numbers for files. The idea is that whenever you modify a file the system saves it to a new copy, leaving the old file intact. This could be a great advantage from many viewpoints:

1) it would be much easier to do package management, as the old version would be automatically saved for a package management system to deal with.
2) backups would also be easier, as all versions of a file are automatically saved, so it could be potentially very useful for a company or the like.

This is one of those things that seems like a good idea, but frequently ends up short. Part of the problem is that "whenever you modify a file" is ill-defined, or rather, if you were to take the literal meaning of it you'd end up with an unmanageable number of revisions.

Furthermore, it turns out that often relationships between files are more important. Thus, in the end it turns out that this stuff is better handled by explicit version-control systems (which require explicit operations to manage revisions) and atomic snapshots (for backup).

-hpa
Re: Versioning file system
Jack Stone wrote:
I hope I got the CC list right. Apologies to anyone I didn't include and anyone I shouldn't have included.

The basic idea is to include an idea from VMS that seems to be quite useful: version numbers for files. The idea is that whenever you modify a file the system saves it to a new copy, leaving the old file intact. This could be a great advantage from many viewpoints:

1) it would be much easier to do package management, as the old version would be automatically saved for a package management system to deal with.
2) backups would also be easier, as all versions of a file are automatically saved, so it could be potentially very useful for a company or the like.

There are probably many others but these were the two that I liked best.

Revision numbers could be specified as follows: /path/to/file:revision_number

I think that this can be done without breaking userspace if the default was to open the highest revision file if no revision number is specified. The userspace tools would need to be updated to take full advantage of the new system, but if the delimiter between the path and revision number were chosen sensibly then the changes to most of userspace would be minimal to non-existent.

Personally, I think that the bulk of the implementation could be in the core fs code and the modifications to individual filesystems would be minimal. The main implementation idea I have (however, I am no kernel expert =) is adding an extra field to struct file and struct inode called int revision (as version is already taken) that would hold the number of the file revision being accessed.

Another problem could be the increased usage of disk space. However, if only deltas from the first version were stored then this could cut down on space, or if this were too slow to open a file then the deltas could be off every tenth revision (i.e. 0,10,20,30... where 0,10,20... are full copies of the file).

There would need to be a tool of some description to remove old revisions, but this should not be a major undertaking as it may be something as simple as a new system call. This would have to be careful to update any deltas that were affected by the removal of previous revisions, but that could be taken care of in kernel space.

Thanks to anyone who stuck with me this far =). I don't know how widely useful this may be, but that's the reason I posted before trying to code anything. I would very much value any contributions, even a reasoned NAK, as I'm still learning how kernel development works (and I would love any implementation directions).

Jack

The underlying internal implementation of something like this wouldn't be all that hard on many filesystems, but it's the interface that's the problem. The ':' character is a perfectly legal filename character, so doing it that way would break things.

I think NetApp more or less got the interface right by putting a .snapshot directory in each directory, with time-versioned subdirectories each containing snapshots of that directory's contents at those points in time. It keeps the backups under the same hierarchy as the original files, to avoid permissions headaches, it's accessible over NFS without modifying the client at all, and it's hidden just enough to make it hard for users to do something stupid.

If you want to do something like this (and it's generally not a bad idea), make sure you do it in a way that's not going to change the behavior seen by existing applications, and that is accessible to unmodified remote clients. Hidden .snapshot directories are one way, a parallel /backup filesystem could be another, whatever. If you break existing apps, I won't touch it with a ten foot pole.

-- Chris
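The .snapshot layout described above keeps old versions reachable through ordinary paths, which is why unmodified applications and NFS clients can use it. A minimal sketch of the path mapping, assuming one `.snapshot` directory per directory with one subdirectory per snapshot; the helper name `snapshot_path` and the snapshot name `hourly.0` are placeholders for illustration only:

```python
import posixpath


def snapshot_path(path, snapshot):
    """Map a live path to its copy inside a per-directory .snapshot tree.

    `snapshot` names one time-versioned subdirectory of .snapshot.
    """
    directory, name = posixpath.split(path)
    return posixpath.join(directory, ".snapshot", snapshot, name)


if __name__ == "__main__":
    print(snapshot_path("/home/jack/notes.txt", "hourly.0"))
```

Because the result is just another pathname, no new delimiter is needed and existing filenames keep their meaning.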
Re: Versioning file system
Jack Stone wrote:
I hope I got the CC list right. Apologies to anyone I didn't include and anyone I shouldn't have included.

The basic idea is to include an idea from VMS that seems to be quite useful: version numbers for files.

[snip]

Have you looked into ext3cow? It allows you to take snapshots of the entire ext3 fs at a single point, and roll back / extract snapshots at any time later. This may be sufficient for you and the implementation seems to be rather stable already.

Cheers,
Auke

http://www.ext3cow.com/
Re: Versioning file system
On Fri, 15 Jun 2007, H. Peter Anvin wrote: alan wrote: ZFS is the cool new thing in that space. Too bad the license makes it hard to incorporate it into the kernel. (I am one of those people that believe that Linux should support EVERY file system, no matter how old or obscure.) I have details on the Luxor UFD-DOS filesystem, if you'd care to implement it. Do you have example discs that can be mounted to test it? If you do, I will consider doing it. I have a couple of older DOS filesystems that got dropped out years ago that I actually need to mount disks that i may rewrite for 2.6.x. Now all i need is the time. And speaking of obscure information... I have a bunch of PCMCIA spec documents from the PCMCIA standards association from the late 90s. Would anyone involved in maintaining the PCMCIA code be interested in it? (Especially if they are in Portland.) It has been a while since I have even needed to look at it and I hate for it to go to waste if it can be of any use. (Bit late now, I know...) -- ANSI C says access to the padding fields of a struct is undefined. ANSI C also says that struct assignment is a memcpy. Therefore struct assignment in ANSI C is a violation of ANSI C... - Alan Cox - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
Greg KH wrote: On Fri, Jun 15, 2007 at 10:06:23PM +0200, Pavel Machek wrote: * Renamed Directory trees: The above problem is compounded with directory trees. Changing the name at the top of a large, bushy tree can require instant relabeling of millions of files. Same daemon can do this. And yes, it might take a ammount of time, but the times that this happens in real-life on a production server is quite small, if at all. And now, if you move a tree, there will be old labels for a while. But this does not matter, because attacker could be keeping file descriptors. Agreed. We have built a label-based AA prototype. It fails because there is no reasonable way to address the tree renaming problem. Only case where attacker _can't_ be keeping file descriptors is newly created files in recently moved tree. But as you already create files with restrictive permissions, that's okay. Yes, you may get some -EPERM during the tree move, but AA has that problem already, see that when madly moving trees we sometimes construct path file never ever had. Exactly. You are remembering old behavior. The current AppArmor generates only correct and consistent paths. If a process has an open file descriptor to such a file, they will retain access to it, as we described here: http://forgeftp.novell.com//apparmor/LKML_Submission-May_07/techdoc.pdf Under the restorecon-alike proposal, you have a HUGE open race. This post http://bugs.centos.org/view.php?id=1981 describes restorecon running for 30 minutes relabeling a file system. That is so far from acceptable that it is silly. Of course, this depends on the system in question, but restorecon will necessarily need to traverse whatever portions of the filesystem that have changed, which can be quite a long time indeed. Any race condition measured in minutes is a very serious issue. I can't think of a real world use of moving directory trees around that this would come up in as a problem. 
Consider this case: We've been developing a new web site for a month, and testing it on the server by putting it in a different virtual domain. We want to go live at some particular instant by doing an mv of the content into our public HTML directory. We simultaneously want to take the old web site down and archive it by moving it somewhere else. Under the restorecon proposal, the web site would be horribly broken until restorecon finishes, as various random pages are or are not accessible to Apache. In a smaller scale example, I want to share some files with a friend. I can't be bothered to set up a proper access control system, so I just mv the files to ~crispin/public_html/lookitme and in IRC say get it now, going away in 10 minutes and then move it out again. Yes, you can manually address this by running restorecon ~crispin/public_html. But AA does this automatically without having to run any commands. You could get restorecon to do this automatically by using inotify. But to make it as general and transparent as AA is now, you would have to run inotify on every directory in the system, with consequences for kernel memory and performance. This problem does not exist for SELinux, because SELinux does not expect access to change based on file names. This problem does not exist in the proposed AA implementation, because the patch makes the access decision based on the current name of the file, so it doesn't have a consistency problem between the file and its label; there is no label. The problem is induced by trying to emulate AA on top of SELinux. They don't fit well together. AA fits much better with LSM, which is the reason LSM exists. Maybe a source code control system might have this issue for the server, but in a second or two everything would be working again as the new files would be relabled correctly. Try an hour or two for a large source code repository. Its linear in the number of files, and several hundred thousand files would take a while to relabel. 
A large GIT tree would be particularly painful because of the very large number of files. Can anyone else see a problem with this that I'm just being foolish and missing? It is not foolish. The label idea is so attractive that last September from discussions with Arjan we actually thought it was the preferred implementation. However, what we've been saying over and over again is that we *tried* this, and it *doesn't* work at the implementation level. There is no good answer, restorecon is an ugly kludge, and so this seductive approach turns out to be a dead end. Caveat: I am *not* saying that labels in general are bad, just that they are a bad way to emulate the AppArmor model. And yes, I am working on a model paper that is more abstract than Andreas' paper http://forgeftp.novell.com//apparmor/LKML_Submission-May_07/techdoc.pdf, but that takes time. Then there's all the other problems, such as file systems that don't support extended
Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
On Fri, Jun 15, 2007 at 10:06:23PM +0200, Pavel Machek wrote:
Yes, you may get some -EPERM during the tree move, but AA has that problem already, see that when madly moving trees we sometimes construct path file never ever had.

Pavel, please focus on the current AppArmor implementation. You're remembering a flaw with a previous version of AppArmor. The pathnames constructed with the current version of AppArmor are consistent and correct.

Thanks.
Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
Hi! Yes, you may get some -EPERM during the tree move, but AA has that problem already, see that when madly moving trees we sometimes construct path file never ever had. Pavel, please focus on the current AppArmor implementation. You're remembering a flaw with a previous version of AppArmor. The pathnames constructed with the current version of AppArmor are consistent and correct. Ok, I did not know that this got fixed. How do you do that? Hold a lock preventing renames for a whole time you walk from file to the root of filesystem? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
On Fri, Jun 15, 2007 at 05:42:08PM -0400, James Morris wrote: On Fri, 15 Jun 2007, Greg KH wrote: Or just create the files with restrictive labels by default. That way you fail closed. From my limited knowledge of SELinux, this is the default today so this would happen by default. Anyone with more SELinux experience want to verify or fix my understanding of this? This is entirely controllable via policy. That is, you specify that newly create files are labeled to something safe (enforced atomically at the kernel level), and then your userland relabeler would be invoked via inotify to relabel based on your userland pathname specification. This labeling policy can be as granular as you wish, from the entire filesystem to a single file. It can also be applied depending on the process which created the file and the directory its created in, ranging from all processes and all directories, to say, just those running as user_t in directories labeled as public_html_t (or whatever). Oh great, then things like source code control systems would have no problems with new files being created under them, or renaming whole trees. So, so much for the it's going to be too slow re-labeling everything issue, as it's not even required for almost all situations :) thanks for letting us know. greg k-h - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
On Fri, Jun 15, 2007 at 04:30:44PM -0700, Crispin Cowan wrote: Greg KH wrote: On Fri, Jun 15, 2007 at 10:06:23PM +0200, Pavel Machek wrote: * Renamed Directory trees: The above problem is compounded with directory trees. Changing the name at the top of a large, bushy tree can require instant relabeling of millions of files. Same daemon can do this. And yes, it might take a ammount of time, but the times that this happens in real-life on a production server is quite small, if at all. And now, if you move a tree, there will be old labels for a while. But this does not matter, because attacker could be keeping file descriptors. Agreed. We have built a label-based AA prototype. It fails because there is no reasonable way to address the tree renaming problem. How does inotify not work here? You are notified that the tree is moved, your daemon goes through and relabels things as needed. In the meantime, before the re-label happens, you might have the wrong label on things, but somehow SELinux already handles this, so I think you should be fine. Only case where attacker _can't_ be keeping file descriptors is newly created files in recently moved tree. But as you already create files with restrictive permissions, that's okay. Yes, you may get some -EPERM during the tree move, but AA has that problem already, see that when madly moving trees we sometimes construct path file never ever had. Exactly. You are remembering old behavior. The current AppArmor generates only correct and consistent paths. If a process has an open file descriptor to such a file, they will retain access to it, as we described here: http://forgeftp.novell.com//apparmor/LKML_Submission-May_07/techdoc.pdf Under the restorecon-alike proposal, you have a HUGE open race. This post http://bugs.centos.org/view.php?id=1981 describes restorecon running for 30 minutes relabeling a file system. That is so far from acceptable that it is silly. Ok, so we fix it. Seriously, it shouldn't be that hard. 
If that's the only problem we have here, it isn't an issue. Of course, this depends on the system in question, but restorecon will necessarily need to traverse whatever portions of the filesystem that have changed, which can be quite a long time indeed. Any race condition measured in minutes is a very serious issue. Agreed, so we fix that. There are ways to speed those kinds of things up quite a bit, and I imagine since the default SELinux behavior doesn't use restorecon in this kind of use-case, no one has spent the time to do the work. I can't think of a real world use of moving directory trees around that this would come up in as a problem. Consider this case: We've been developing a new web site for a month, and testing it on the server by putting it in a different virtual domain. We want to go live at some particular instant by doing an mv of the content into our public HTML directory. We simultaneously want to take the old web site down and archive it by moving it somewhere else. Under the restorecon proposal, the web site would be horribly broken until restorecon finishes, as various random pages are or are not accessible to Apache. Usually you don't do that by doing a 'mv' otherwise you are almost guaranteed stale and mixed up content for some period of time, not to mention the issues surrounding paths that might be messed up. In a smaller scale example, I want to share some files with a friend. I can't be bothered to set up a proper access control system, so I just mv the files to ~crispin/public_html/lookitme and in IRC say get it now, going away in 10 minutes and then move it out again. Yes, you can manually address this by running restorecon ~crispin/public_html. But AA does this automatically without having to run any commands. I'm saying that the daemon will automatically do it for you, you don't have to do anything on your own. You could get restorecon to do this automatically by using inotify. Yes. 
But to make it as general and transparent as AA is now, you would have to run inotify on every directory in the system, with consequences for kernel memory and performance. What kernel memory and performance issues are there? Your SLED machine already has inotify running on every directory in the system today, and you don't seem to have noticed that :) This problem does not exist for SELinux, because SELinux does not expect access to change based on file names. Agreed. This problem does not exist in the proposed AA implementation, because the patch makes the access decision based on the current name of the file, so it doesn't have a consistency problem between the file and its label; there is no label. No, that's not the issue here. The issue is if we can use the model that AA is exporting to users and apply it to the model that the kernel uses internally to access internal data
Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
On Fri, 15 Jun 2007, Greg KH wrote: On Fri, Jun 15, 2007 at 04:30:44PM -0700, Crispin Cowan wrote: Greg KH wrote: On Fri, Jun 15, 2007 at 10:06:23PM +0200, Pavel Machek wrote: Only case where attacker _can't_ be keeping file descriptors is newly created files in recently moved tree. But as you already create files with restrictive permissions, that's okay. Yes, you may get some -EPERM during the tree move, but AA has that problem already, see that when madly moving trees we sometimes construct path file never ever had. Exactly. You are remembering old behavior. The current AppArmor generates only correct and consistent paths. If a process has an open file descriptor to such a file, they will retain access to it, as we described here: http://forgeftp.novell.com//apparmor/LKML_Submission-May_07/techdoc.pdf Under the restorecon-alike proposal, you have a HUGE open race. This post http://bugs.centos.org/view.php?id=1981 describes restorecon running for 30 minutes relabeling a file system. That is so far from acceptable that it is silly. Ok, so we fix it. Seriously, it shouldn't be that hard. If that's the only problem we have here, it isn't an issue. how do you 'fix' the laws of physics? the problem is that with a directory that contains lots of files below it you have to access all the files metadata to change the labels on it. it can take significant amounts of time to walk the entire three and change every file. I can't think of a real world use of moving directory trees around that this would come up in as a problem. Consider this case: We've been developing a new web site for a month, and testing it on the server by putting it in a different virtual domain. We want to go live at some particular instant by doing an mv of the content into our public HTML directory. We simultaneously want to take the old web site down and archive it by moving it somewhere else. 
Under the restorecon proposal, the web site would be horribly broken until restorecon finishes, as various random pages are or are not accessible to Apache.

Usually you don't do that by doing a 'mv' otherwise you are almost guaranteed stale and mixed up content for some period of time, not to mention the issues surrounding paths that might be messed up.

on the contrary, using 'mv' is by far the cleanest way to do this.

mv htdocs htdocs.old; mv htdocs.new htdocs

this makes two atomic changes to the filesystem, but can generate thousands to millions of permission changes as a result.

David Lang
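A minimal sketch of the two-rename swap described above, in Python rather than shell so the effect can be counted. It assumes both trees live on the same filesystem, so each os.rename() maps to a single rename(2) call. The helper names (make_tree, swap_site, demo) and the file names are made up for illustration. It shows only that two directory renames change the pathname of every file beneath them, which is the set a path-derived label would have to be recomputed for.

```python
import os
import tempfile


def make_tree(root, n_files, marker):
    """Create a directory holding n_files small pages (illustrative data)."""
    os.makedirs(root)
    for i in range(n_files):
        with open(os.path.join(root, f"page{i}.html"), "w") as f:
            f.write(marker)


def count_files(root):
    """Count regular files below root."""
    return sum(len(files) for _, _, files in os.walk(root))


def swap_site(base):
    """Mirror `mv htdocs htdocs.old; mv htdocs.new htdocs` with two renames."""
    os.rename(os.path.join(base, "htdocs"), os.path.join(base, "htdocs.old"))
    os.rename(os.path.join(base, "htdocs.new"), os.path.join(base, "htdocs"))


def demo(n_old=50, n_new=80):
    """Return (number of files whose pathname changed, content now live)."""
    with tempfile.TemporaryDirectory() as base:
        make_tree(os.path.join(base, "htdocs"), n_old, "old")
        make_tree(os.path.join(base, "htdocs.new"), n_new, "new")
        swap_site(base)
        # Every file in both trees now sits under a different pathname.
        changed = (count_files(os.path.join(base, "htdocs"))
                   + count_files(os.path.join(base, "htdocs.old")))
        with open(os.path.join(base, "htdocs", "page0.html")) as f:
            live = f.read()
        return changed, live
```

Calling demo() with larger counts scales the number of affected pathnames linearly, while the number of rename calls stays at two.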
Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
Hi! Only case where attacker _can't_ be keeping file descriptors is newly created files in recently moved tree. But as you already create files with restrictive permissions, that's okay. Yes, you may get some -EPERM during the tree move, but AA has that problem already, see that when madly moving trees we sometimes construct path file never ever had. Exactly. You are remembering old behavior. The current AppArmor generates only correct and consistent paths. If a process has an open file descriptor to such a file, they will retain access to it, as we described here: Ok, so what I described was actually secure. Good. Under the restorecon-alike proposal, you have a HUGE open race. This post http://bugs.centos.org/view.php?id=1981 describes restorecon running for 30 minutes relabeling a file system. That is so far from acceptable that it is silly. 30 minutes during installation does not seem silly to me. And that race does not make it insecure, because of the open file descriptors. Good. Of course, this depends on the system in question, but restorecon will necessarily need to traverse whatever portions of the filesystem that have changed, which can be quite a long time indeed. Any race condition measured in minutes is a very serious issue. You seem to imply it is security related, it is not. I can have open files for hours or days. I can't think of a real world use of moving directory trees around that this would come up in as a problem. Consider this case: We've been developing a new web site for a month, and testing it on the server by putting it in a different virtual domain. We want to go live at some particular instant by doing an mv of the content into our public HTML directory. We simultaneously want to take the old web site down and archive it by moving it somewhere else. And you do that exactly how, without the race? I do not think ve have three_way_rename(name1, name2, name3) system call. 
Notice that 1) mv can already take minutes if you move across filesystems. 2) this is easily avoided by mv-ing somewhere with the same permissions, then doing quick moves when the daemon is done.

You could get restorecon to do this automatically by using inotify. But to make it as general and transparent as AA is now, you would have to run inotify on every directory in the system, with consequences for kernel memory and performance.

So you run inotify everywhere. IIRC beagle does it already.

Can anyone else see a problem with this that I'm just being foolish and missing?

It is not foolish. The label idea is so attractive that last September from discussions with Arjan we actually thought it was the preferred implementation. However, what we've been saying over and over again is that we *tried* this, and it *doesn't* work at the implementation level. There is no good answer, restorecon is an ugly kludge, and so this seductive approach turns out to be a dead end.

Talking about dead ends... "just put path-based security module into kernel" recently got a pretty strong NACK from Christoph Hellwig (see the TOMOYO Linux thread), and I believe there was a similar comment from Al Viro in the past. That seems to me as dead-endy as it gets. "mv takes 30 minutes" is a road slightly covered with bushes... compared to that.

So we can either forget about AA completely, or take a way Christoph did not NACK. restorecond is such a way, and with inotify it should be acceptable. find does _not_ take that long, not even for git trees.

[EMAIL PROTECTED]:/data/l/linux$ time find . > /dev/null
0.04user 0.37system 11.50 (0m11.504s) elapsed 3.56%CPU

(If you wanted to be super-nice, you could introduce a rename() helper into glibc that would do the re-labeling synchronously and only return when it is done. All the nice applications call glibc anyway, and all the exploits can't take advantage of it, because it is secure already.)
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
On Sat, Jun 16, 2007 at 01:39:14AM +0200, Pavel Machek wrote:
> > Pavel, please focus on the current AppArmor implementation. You're remembering a flaw with a previous version of AppArmor. The pathnames constructed with the current version of AppArmor are consistent and correct.
>
> Ok, I did not know that this got fixed. How do you do that? Hold a lock preventing renames for a whole time you walk from file to the root of filesystem?

We've improved d_path() to remove many of its previous shortcomings: eb3dfb0cb1f4a44e2d0553f89514ce9f2a9fcaf1
Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
On Fri, Jun 15, 2007 at 04:49:25PM -0700, Greg KH wrote: We have built a label-based AA prototype. It fails because there is no reasonable way to address the tree renaming problem. How does inotify not work here? You are notified that the tree is moved, your daemon goes through and relabels things as needed. In the meantime, before the re-label happens, you might have the wrong label on things, but somehow SELinux already handles this, so I think you should be fine. SELinux does not relabel files when containing directories move, so it is not a problem they've chosen to face. How well does inotify handle running attached to every directory on a typical Linux system? Under the restorecon-alike proposal, you have a HUGE open race. This post http://bugs.centos.org/view.php?id=1981 describes restorecon running for 30 minutes relabeling a file system. That is so far from acceptable that it is silly. Ok, so we fix it. Seriously, it shouldn't be that hard. If that's the only problem we have here, it isn't an issue. Restorecon traverses the filesystem from a specific down. In order to apply to an entire system (as would be necessary to try to emulate AppArmor's model using SELinux), restorecon would need to run on vast portions of the filesystem often. (mv ~/public_html ~/archived; or tar zxvf linux-*.tar.gz, etc.) I'm not sure we need to run restorecon every time rename(2) is called. Of course, this depends on the system in question, but restorecon will necessarily need to traverse whatever portions of the filesystem that have changed, which can be quite a long time indeed. Any race condition measured in minutes is a very serious issue. Agreed, so we fix that. There are ways to speed those kinds of things up quite a bit, and I imagine since the default SELinux behavior doesn't use restorecon in this kind of use-case, no one has spent the time to do the work. 
The time for restorecon is probably best imagined as a kind of 'du' that also updates extended attributes as it does its work. It'd be very difficult to improve on this.

What kernel memory and performance issues are there? Your SLED machine already has inotify running on every directory in the system today, and you don't seem to have noticed that :)

I beg to differ. :)
Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
Hi!

> > > Under the restorecon proposal, the web site would be horribly broken until restorecon finishes, as various random pages are or are not accessible to Apache.
> >
> > Usually you don't do that by doing a 'mv' otherwise you are almost guaranteed stale and mixed up content for some period of time, not to mention the issues surrounding paths that might be messed up.
>
> on the contrary, using 'mv' is by far the cleanest way to do this.
>
> mv htdocs htdocs.old; mv htdocs.new htdocs
>
> this makes two atomic changes to the filesystem, but can generate thousands to millions of permission changes as a result.

Ok, so mv gets slower for big trees... and open() gets faster for deep trees. Previously, open in the current directory was one atomic read of a directory entry; with path-based checks it has to read the directory, and its parent, and its parent's parent, and its... (Or am I wrong and getting the full path does not need to bring anything in, not even in the cache-cold case?)

So, the proposed solution has different performance tradeoffs, but should still be a win -- opens are more common than moves.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
On Fri, Jun 15, 2007 at 05:18:10PM -0700, Seth Arnold wrote: On Fri, Jun 15, 2007 at 04:49:25PM -0700, Greg KH wrote: We have built a label-based AA prototype. It fails because there is no reasonable way to address the tree renaming problem. How does inotify not work here? You are notified that the tree is moved, your daemon goes through and relabels things as needed. In the meantime, before the re-label happens, you might have the wrong label on things, but somehow SELinux already handles this, so I think you should be fine. SELinux does not relabel files when containing directories move, so it is not a problem they've chosen to face. How well does inotify handle running attached to every directory on a typical Linux system? Look at SLED and Beagle (taking the indexing logic out of the equation.) It runs good enough that a major Linux vendor is willing to stake its reputation on it :) Under the restorecon-alike proposal, you have a HUGE open race. This post http://bugs.centos.org/view.php?id=1981 describes restorecon running for 30 minutes relabeling a file system. That is so far from acceptable that it is silly. Ok, so we fix it. Seriously, it shouldn't be that hard. If that's the only problem we have here, it isn't an issue. Restorecon traverses the filesystem from a specific down. In order to apply to an entire system (as would be necessary to try to emulate AppArmor's model using SELinux), restorecon would need to run on vast portions of the filesystem often. (mv ~/public_html ~/archived; or tar zxvf linux-*.tar.gz, etc.) I'm not sure we need to run restorecon every time rename(2) is called. Ok, so we optimize it. Putting speed issues aside right now as a mere implementation details, I'm looking for logical reasons the AA model will not work in this type of system. Of course, this depends on the system in question, but restorecon will necessarily need to traverse whatever portions of the filesystem that have changed, which can be quite a long time indeed. 
Any race condition measured in minutes is a very serious issue.

Agreed, so we fix that. There are ways to speed those kinds of things up quite a bit, and I imagine since the default SELinux behavior doesn't use restorecon in this kind of use-case, no one has spent the time to do the work.

The time for restorecon is probably best imagined as a kind of 'du' that also updates extended attributes as it does its work. It'd be very difficult to improve on this.

Is that a bet? :)

What kernel memory and performance issues are there? Your SLED machine already has inotify running on every directory in the system today, and you don't seem to have noticed that :)

I beg to differ. :)

The Beagle index backend is known to slow things down at times, yes, but is that the fault of the inotify watches, or the use of mono and a big-ass database on the system at the same time? In the original inotify development, the issue was not inotify at all, unless you have some newer numbers in this regard?

And Crispin mentioned that you all already implemented this. Do you have the code around so that we can take a look at it?

thanks,

greg k-h
Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
On Fri, Jun 15, 2007 at 05:01:25PM -0700, [EMAIL PROTECTED] wrote: On Fri, 15 Jun 2007, Greg KH wrote: On Fri, Jun 15, 2007 at 04:30:44PM -0700, Crispin Cowan wrote: Greg KH wrote: On Fri, Jun 15, 2007 at 10:06:23PM +0200, Pavel Machek wrote: Only case where attacker _can't_ be keeping file descriptors is newly created files in recently moved tree. But as you already create files with restrictive permissions, that's okay. Yes, you may get some -EPERM during the tree move, but AA has that problem already, see that when madly moving trees we sometimes construct path file never ever had. Exactly. You are remembering old behavior. The current AppArmor generates only correct and consistent paths. If a process has an open file descriptor to such a file, they will retain access to it, as we described here: http://forgeftp.novell.com//apparmor/LKML_Submission-May_07/techdoc.pdf Under the restorecon-alike proposal, you have a HUGE open race. This post http://bugs.centos.org/view.php?id=1981 describes restorecon running for 30 minutes relabeling a file system. That is so far from acceptable that it is silly. Ok, so we fix it. Seriously, it shouldn't be that hard. If that's the only problem we have here, it isn't an issue. how do you 'fix' the laws of physics? the problem is that with a directory that contains lots of files below it you have to access all the files metadata to change the labels on it. it can take significant amounts of time to walk the entire three and change every file. Agreed, but you can do this in ways that are faster than others :) I can't think of a real world use of moving directory trees around that this would come up in as a problem. Consider this case: We've been developing a new web site for a month, and testing it on the server by putting it in a different virtual domain. We want to go live at some particular instant by doing an mv of the content into our public HTML directory. 
We simultaneously want to take the old web site down and archive it by moving it somewhere else. Under the restorecon proposal, the web site would be horribly broken until restorecon finishes, as various random pages are or are not accessible to Apache.

Usually you don't do that by doing a 'mv' otherwise you are almost guaranteed stale and mixed up content for some period of time, not to mention the issues surrounding paths that might be messed up.

on the contrary, using 'mv' is by far the cleanest way to do this.

mv htdocs htdocs.old; mv htdocs.new htdocs

this makes two atomic changes to the filesystem, but can generate thousands to millions of permission changes as a result.

I agree, and yet, somehow, SELinux today handles this just fine, right? :)

Let's worry about speed issues later on when a working implementation is produced. I'm still looking for the logical reason a system like this cannot work properly based on the expected AA interface to users.

thanks,

greg k-h
Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS
On Sat, Jun 16, 2007 at 12:03:06AM +0200, Florian D. wrote:
> Chris Mason wrote:
> > Well, apparently I get the silly stuff wrong an infinite number of times. Sorry, let's try again:
> >
> > diff -r 38b36731 disk-io.c
> > --- a/disk-io.c	Fri Jun 15 13:50:20 2007 -0400
> > +++ b/disk-io.c	Fri Jun 15 16:52:38 2007 -0400
> > @@ -541,6 +541,8 @@ int write_ctree_super(struct btrfs_trans
> >  	else
> >  		ret = submit_bh(WRITE, bh);
> >  	if (ret == -EOPNOTSUPP) {
> > +		get_bh(bh);
> > +		lock_buffer(bh);
> >  		set_buffer_uptodate(bh);
> >  		root->fs_info->do_barriers = 0;
> >  		ret = submit_bh(WRITE, bh);
>
> ha! it is working now. some numbers from here (with the fio-tool):

Great, I'll have a v0.3 out on Monday with that fix rolled in.

> 1. sequential read
> 2. random writes
> 3. sequential read again
>
> filesize: 300MB, bs: 4K
>
>           btrfs                   reiserfs                 ext3
>     usr% sys%   bw   sec.    usr% sys%   bw   sec.    usr% sys%   bw   sec.
> 1     5   51   68.3   4.6      1   17   67.4   4.6      5   24   68.0   4.6
> 2     0    1    0.7   431      2   21   29.8  10.5      3   18   29.0  10.8
> 3     0    1    2.3   133      1   19   70.5   4.4      5   24   68.6   4.5
>
> bw: MB/sec.
> ext3: -o data=writeback,barrier=1
> 20GB LVM2 partition on a RAID6 (4 SATA-disks)

Strange, these numbers are not quite what I was expecting ;) Could you please post your fio job files? Also, how much ram does the machine have? Only writing doesn't seem like enough to fill the ram.

-chris
Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
On Fri, 15 Jun 2007, Greg KH wrote:
> Oh great, then things like source code control systems would have no problems with new files being created under them, or renaming whole trees.

It depends -- I think we may be talking about different things.

If you're using inotify to watch for new files and kick something in userspace to relabel them, it could take a while to relabel a lot of files, although there are likely some gains to be had from parallel relabeling which we've not explored.

What I was saying is that you can use traditional SELinux labeling policy underneath that to ensure that there is always a safe label on each file before it is relabeled from userspace.

> So, so much for the "it's going to be too slow re-labeling everything" issue, as it's not even required for almost all situations :)

You could probably get an idea of the cost by running something like:

$ time find /usr/src/linux | xargs setfattr -n user.foo -v bar

On my system, it takes about 1.2 seconds to label a fully checked out kernel source tree with ~23,000 files in this manner, on a stock standard ext3 filesystem with a SATA drive.

- James
--
James Morris [EMAIL PROTECTED]
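For anyone who wants to repeat that kind of measurement from a script, here is a rough Python sketch of the same idea: walk a tree, apply a labeling function to every entry, and time it. The function names (label_tree, timed_label, xattr_labeler) are invented for this example. xattr_labeler uses os.setxattr(), which is Linux-only and only works where the filesystem accepts user.* extended attributes, so the labeling step is passed in as a parameter and can be replaced by a stand-in. Timings will of course depend on the machine and on cache state.

```python
import os
import time


def label_tree(root, set_label):
    """Call set_label(path) on root and everything below it.

    Roughly what `find ROOT | xargs setfattr ...` visits: directories
    as well as files. Returns the number of entries labeled.
    """
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        set_label(dirpath)
        count += 1
        for name in filenames:
            set_label(os.path.join(dirpath, name))
            count += 1
    return count


def xattr_labeler(path):
    """Set user.foo=bar, like `setfattr -n user.foo -v bar PATH`.

    Assumption: Linux, and a filesystem mounted with user xattr support.
    """
    os.setxattr(path, "user.foo", b"bar")


def timed_label(root, set_label):
    """Return (entries labeled, elapsed seconds) for one labeling pass."""
    start = time.perf_counter()
    n = label_tree(root, set_label)
    return n, time.perf_counter() - start
```

Something like timed_label("/usr/src/linux", xattr_labeler) would be the closest analogue of the command above; passing a no-op function instead gives the cost of the walk alone.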
Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
On Fri, 15 Jun 2007, [EMAIL PROTECTED] wrote:
> on the contrary, using 'mv' is by far the cleanest way to do this.
>
> mv htdocs htdocs.old; mv htdocs.new htdocs
>
> this makes two atomic changes to the filesystem, but can generate thousands to millions of permission changes as a result.

OTOH, you've performed your labeling up front, and don't have to effectively relabel each file each time on each access, which is what you're really doing with pathname labeling.

- James
--
James Morris [EMAIL PROTECTED]
Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
On Fri, 15 Jun 2007, Seth Arnold wrote:
> > How does inotify not work here? You are notified that the tree is moved, your daemon goes through and relabels things as needed. In the meantime, before the re-label happens, you might have the wrong label on things, but somehow SELinux already handles this, so I think you should be fine.
>
> SELinux does not relabel files when containing directories move, so it is not a problem they've chosen to face.

It's a deliberate design choice, and follows traditional Unix security logic. DAC permissions don't change on every file in the subtree when you mv directories, either.

- James
--
James Morris [EMAIL PROTECTED]
Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
On Fri, 15 Jun 2007, Seth Arnold wrote:
> The time for restorecon is probably best imagined as a kind of 'du' that also updates extended attributes as it does its work. It'd be very difficult to improve on this.

restorecon can most definitely be improved.

- James
--
James Morris [EMAIL PROTECTED]
Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
--- James Morris [EMAIL PROTECTED] wrote:
> On my system, it takes about 1.2 seconds to label a fully checked out kernel source tree with ~23,000 files in this manner

That's an eternity for that many files to be improperly labeled. If, and the if didn't originate with me, your policy is demonstrably correct (how do you do that?) for all domains, you could claim that the action is safe, if not ideal. I can't say if an evaluation team would buy the safe argument. They've been known to balk before.

Casey Schaufler
[EMAIL PROTECTED]
Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
On Fri, 15 Jun 2007, Casey Schaufler wrote:
> --- James Morris [EMAIL PROTECTED] wrote:
> > On my system, it takes about 1.2 seconds to label a fully checked out kernel source tree with ~23,000 files in this manner
>
> That's an eternity for that many files to be improperly labeled. If, and the if didn't originate with me, your policy is demonstrably correct (how do you do that?) for all domains, you could claim that the action is safe, if not ideal. I can't say if an evaluation team would buy the safe argument. They've been known to balk before.

To clarify:

We are discussing a scheme where the underlying SELinux labeling policy always ensures a safe label on a file, and then relabeling newly created files according to their pathnames.

There is no expectation that this scheme would be submitted for certification. Its purpose is merely to provide pathname-based labeling outside of the kernel.

- James
--
James Morris [EMAIL PROTECTED]
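To make the two-step scheme concrete, here is a toy Python model of it; this is only an illustration of the idea, not SELinux or restorecond code. The label names (safe_default, site_web_content, user_web_content), the RULES table and the LabelStore class are all hypothetical, and a dictionary stands in for the per-file security attributes. Step one gives every new file a safe label immediately; step two is the userspace pass that replaces it with a label derived from the pathname. A tree rename carries the existing labels along until that pass runs.

```python
import fnmatch

SAFE_DEFAULT = "safe_default"  # hypothetical restrictive label

# Hypothetical pathname rules, first match wins. Note that fnmatch's
# "*" also matches "/" characters, which is good enough for a sketch.
RULES = [
    ("/var/www/htdocs/*", "site_web_content"),
    ("/home/*/public_html/*", "user_web_content"),
]


def label_for_path(path, rules=RULES):
    """Return the label the pathname rules assign, or the safe default."""
    for pattern, label in rules:
        if fnmatch.fnmatchcase(path, pattern):
            return label
    return SAFE_DEFAULT


class LabelStore:
    """Dictionary stand-in for labels stored with each file."""

    def __init__(self):
        self.labels = {}

    def create(self, path):
        # Step one: a new file is never without a label.
        self.labels[path] = SAFE_DEFAULT

    def relabel(self, path):
        # Step two: the userspace pass applies the pathname-derived label.
        self.labels[path] = label_for_path(path)

    def rename_tree(self, old_prefix, new_prefix):
        """Move a subtree; labels travel with the files unchanged."""
        moved = {}
        for path in list(self.labels):
            if path == old_prefix or path.startswith(old_prefix + "/"):
                moved[new_prefix + path[len(old_prefix):]] = self.labels.pop(path)
        self.labels.update(moved)
        return sorted(moved)
```

In this model the window Casey is worried about is the time between rename_tree() and the relabel() calls for the moved paths; during it, each file still carries whatever label it had before the move.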
Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching
On Fri, Jun 15, 2007 at 09:21:57PM -0400, James Morris wrote:
> On Fri, 15 Jun 2007, Greg KH wrote:
> > Oh great, then things like source code control systems would have no problems with new files being created under them, or renaming whole trees.
>
> It depends -- I think we may be talking about different things.
>
> If you're using inotify to watch for new files and kick something in userspace to relabel them, it could take a while to relabel a lot of files, although there are likely some gains to be had from parallel relabeling which we've not explored.
>
> What I was saying is that you can use traditional SELinux labeling policy underneath that to ensure that there is always a safe label on each file before it is relabeled from userspace.

Ok, yes, I think we are in violent agreement here :)

> > So, so much for the "it's going to be too slow re-labeling everything" issue, as it's not even required for almost all situations :)
>
> You could probably get an idea of the cost by running something like:
>
> $ time find /usr/src/linux | xargs setfattr -n user.foo -v bar
>
> On my system, it takes about 1.2 seconds to label a fully checked out kernel source tree with ~23,000 files in this manner, on a stock standard ext3 filesystem with a SATA drive.

Yeah, that should be very reasonable. I'll wait to see Crispin's code to work off of and see if I can get it to approach that kind of speed.

thanks,

greg k-h