Re: [Patch 09/18] fs/logfs/gc.c

2007-06-15 Thread Evgeniy Polyakov
On Sun, Jun 03, 2007 at 08:46:04PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote:
 --- /dev/null 2007-03-13 19:15:28.862769062 +0100
 +++ linux-2.6.21logfs/fs/logfs/gc.c   2007-06-03 19:18:57.0 +0200

Number of bugs in case of error looks quite sad...

-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Patch 07/18] fs/logfs/dir.c

2007-06-15 Thread Evgeniy Polyakov
On Sun, Jun 03, 2007 at 08:44:29PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote:
 --- /dev/null 2007-03-13 19:15:28.862769062 +0100
 +++ linux-2.6.21logfs/fs/logfs/dir.c  2007-06-03 19:54:55.0 +0200

...

 +static int __logfs_dir_walk(struct inode *dir, struct dentry *dentry,
 +		dir_callback handler, struct logfs_disk_dentry *dd, loff_t *pos)
 +{
 +	struct qstr *name = dentry ? &dentry->d_name : NULL;
 +	int ret;
 +
 +	for (; ; (*pos)++) {
 +		ret = read_dir(dir, dd, *pos);
 +		if (ret == -EOF)
 +			return 0;
 +		if (ret == -ENODATA) {
 +			/* deleted dentry */
 +			*pos = dir_seek_data(dir, *pos);
 +			continue;
 +		}
 +		if (ret)
 +			return ret;
 +		BUG_ON(dd->namelen == 0);

This can be moved out of the loop, or even to the higher layer that this
one is called from.
There are a number of such debug checks in the tree.

...

 +static int logfs_lookup_handler(struct inode *dir, struct dentry *dentry,
 +		struct logfs_disk_dentry *dd, loff_t pos)
 +{
 +	struct inode *inode;
 +
 +	inode = iget(dir->i_sb, be64_to_cpu(dd->ino));
 +	if (!inode)
 +		return -EIO;
 +	return PTR_ERR(d_splice_alias(inode, dentry));
 +}

From a perfectionist's point of view it should return long, not int, but
frankly it is so minor that it is not even worth the time I spent writing
this sentence. ^W^W^W

 +static int __logfs_readdir(struct file *file, void *buf, filldir_t filldir)
 +{
 +	struct logfs_disk_dentry dd;
 +	struct inode *dir = file->f_dentry->d_inode;
 +	loff_t pos = file->f_pos - IMPLICIT_NODES;
 +	int err;
 +
 +	BUG_ON(pos<0);

Spaces run away.

 +static void logfs_set_name(struct logfs_disk_dentry *dd, struct qstr *name)
 +{
 +	BUG_ON(name->len > LOGFS_MAX_NAMELEN);

Hmmm, I would write here that the user is damn wrong and that his
DNA is of no interest to humanity's gene pool, instead of crashing the
machine.

 +	dd->namelen = cpu_to_be16(name->len);
 +	memcpy(dd->name, name->name, name->len);
 +}
 +}
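
One way to read the suggestion about logfs_set_name(): check the name length and hand an error back to the caller rather than calling BUG_ON(). What follows is only a userspace sketch. demo_dentry, demo_set_name and DEMO_MAX_NAMELEN are hypothetical stand-ins, not the LogFS definitions, and the big-endian conversion of the length is left out.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_NAMELEN 255	/* hypothetical limit, not the LogFS value */

/* Hypothetical stand-in for the on-disk dentry. */
struct demo_dentry {
	unsigned short namelen;
	char name[DEMO_MAX_NAMELEN];
};

/*
 * Copy a name into the dentry.  An over-long name is reported to the
 * caller as -ENAMETOOLONG instead of stopping the machine.
 */
static int demo_set_name(struct demo_dentry *dd, const char *name, size_t len)
{
	if (len > DEMO_MAX_NAMELEN)
		return -ENAMETOOLONG;

	dd->namelen = (unsigned short)len;
	memcpy(dd->name, name, len);
	return 0;
}
```

The price is that every caller now has to check the return value, which the void version in the patch avoids.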

 +static int logfs_symlink(struct inode *dir, struct dentry *dentry,
 +		const char *target)
 +{
 +	struct inode *inode;
 +	size_t destlen = strlen(target) + 1;
 +
 +	if (destlen > dir->i_sb->s_blocksize)
 +		return -ENAMETOOLONG;

Should it also include the overhead related to the name, or is the name
just placed into the data block as is?

 +	inode = logfs_new_inode(dir, S_IFLNK | S_IRWXUGO);
 +	if (IS_ERR(inode))
 +		return PTR_ERR(inode);
 +
 +	inode->i_op = &logfs_symlink_iops;
 +	inode->i_mapping->a_ops = &logfs_reg_aops;
 +
 +	return __logfs_create(dir, dentry, inode, target, destlen);
 +}

 +static int logfs_delete_dd(struct inode *dir, struct logfs_disk_dentry *dd,
 +		loff_t pos)
 +{
 +	int err;
 +
 +	err = read_dir(dir, dd, pos);
 +
 +	/*
 +	 * Getting called with pos somewhere beyond eof is either a goofup
 +	 * within this file or means someone maliciously edited the
 +	 * (crc-protected) journal.
 +	 */
 +	LOGFS_BUG_ON(err == -EOF, dir->i_sb);

Maybe just return a permanent error, remount read-only
and say something insulting instead of killing itself in pain?

 +	if (err)
 +		return err;
 +
 +	dir->i_ctime = dir->i_mtime = CURRENT_TIME;
 +	if (dd->type == DT_DIR)
 +		dir->i_nlink--;
 +	return logfs_delete(dir, pos);
 +}
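
The "permanent error plus read-only remount" idea for logfs_delete_dd() could look roughly like the userspace sketch below. All names here (demo_super, demo_fs_error, demo_delete_dd, DEMO_EOF) are hypothetical and only model the control flow; this is not how LogFS or the VFS implements a read-only remount.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

#define DEMO_EOF 256	/* hypothetical "beyond end of directory" code */

/* Hypothetical stand-in for the superblock state. */
struct demo_super {
	bool read_only;
};

/*
 * React to an inconsistency that "cannot happen": log it, stop accepting
 * writes, and return a permanent error to the caller.
 */
static int demo_fs_error(struct demo_super *sb, const char *what)
{
	fprintf(stderr, "demo fs: %s, switching to read-only\n", what);
	sb->read_only = true;
	return -EIO;
}

/* read_err models the value returned by the directory read. */
static int demo_delete_dd(struct demo_super *sb, int read_err)
{
	if (sb->read_only)
		return -EROFS;
	if (read_err == -DEMO_EOF)
		return demo_fs_error(sb, "delete position beyond end of directory");
	if (read_err)
		return read_err;
	return 0;	/* the actual deletion would happen here */
}
```

After the first inconsistency every later modification fails with -EROFS, so the machine stays up and the damage stops growing.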

 +static int logfs_rename_target(struct inode *old_dir, struct dentry *old_dentry,
 +		struct inode *new_dir, struct dentry *new_dentry)
 +{
 +	struct logfs_super *super = logfs_super(old_dir->i_sb);
 +	struct inode *old_inode = old_dentry->d_inode;
 +	struct inode *new_inode = new_dentry->d_inode;
 +	int isdir = S_ISDIR(old_inode->i_mode);
 +	struct logfs_disk_dentry dd;
 +	loff_t pos;
 +	int err;
 +
 +	BUG_ON(isdir != S_ISDIR(new_inode->i_mode));

Spaces run away.

 +	if (isdir) {
 +		if (!logfs_empty_dir(new_inode))
 +			return -ENOTEMPTY;
 +	}

One can save two lines of code by putting both logical checks in one if ().
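
A small sketch of that change, with the directory test and the emptiness test passed in as plain booleans. The function names are hypothetical; in the patch the second operand would be the logfs_empty_dir(new_inode) call.

```c
#include <errno.h>
#include <stdbool.h>

/* Two nested tests, as in the quoted patch. */
static int demo_check_nested(bool isdir, bool empty)
{
	if (isdir) {
		if (!empty)
			return -ENOTEMPTY;
	}
	return 0;
}

/* The same logic with both conditions in one if (). */
static int demo_check_combined(bool isdir, bool empty)
{
	if (isdir && !empty)
		return -ENOTEMPTY;
	return 0;
}
```

Because && short-circuits, the emptiness check is still evaluated only for directories, exactly as in the nested form.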

 +int logfs_replay_journal(struct super_block *sb)
 +{
 +	struct logfs_super *super = logfs_super(sb);
 +	struct logfs_disk_dentry dd;
 +	struct inode *inode;
 +	u64 ino, pos;
 +	int err;
 +
 +	if (super->s_victim_ino) {
 +		/* delete victim inode */
 +		ino = super->s_victim_ino;
 +		inode = iget(sb, ino);
 +		if (!inode)
 +			goto fail;
 +
 +		super->s_victim_ino = 0;
 +		err = logfs_remove_inode(inode);
 +		iput(inode);
 +		if (err) {
 +			super->s_victim_ino = ino;
 +			goto fail;
 +		}
 +	}
 +	if (super->s_rename_dir) {
 +		/* delete old dd from rename */
 +		ino = 

Re: LogFS take four

2007-06-15 Thread Evgeniy Polyakov
On Sun, Jun 03, 2007 at 08:38:46PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote:
 This round the patch is split into file-sized hunks.  There actually
 seem to be kernel developers not manly enough to digest 6000+ lines of
 code at once.  And I thought I was the only wimp around.
 
 Again, anyone giving comments in the last round is on Cc:.
 
 I'll try to respond to comments but the next round of patches may take a
 while longer, due to other responsibilities.

Hi Jorn.

Sorry for the late reply (and the wrong non-UTF letter in the name :).
I have a couple of minor nits that I will post in other mails, but in general
I think it should be included in -mm so that people can start using
it and report real bugs, rather than handwaving about possible problems.

-- 
Evgeniy Polyakov


Re: [Patch 09/18] fs/logfs/gc.c

2007-06-15 Thread Jörn Engel
On Fri, 15 June 2007 13:03:57 +0400, Evgeniy Polyakov wrote:
 On Sun, Jun 03, 2007 at 08:46:04PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote:
  --- /dev/null   2007-03-13 19:15:28.862769062 +0100
  +++ linux-2.6.21logfs/fs/logfs/gc.c 2007-06-03 19:18:57.0 +0200
 
 Number of bugs in case of error looks quite sad...

Agreed.  I've started working on error handling.  Most erase errors are
dealt with.  Write errors still need some infrastructure.

If you like I can send another round of patches for review.

Jörn

-- 
Joern's library part 12:
http://physics.nist.gov/cuu/Units/binary.html


Re: LogFS take four

2007-06-15 Thread Jörn Engel
On Fri, 15 June 2007 12:37:32 +0400, Evgeniy Polyakov wrote:
 On Sun, Jun 03, 2007 at 08:38:46PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote:
  This round the patch is split into file-sized hunks.  There actually
  seem to be kernel developers not manly enough to digest 6000+ lines of
  code at once.  And I thought I was the only wimp around.
  
  Again, anyone giving comments in the last round is on Cc:.
  
  I'll try to respond to comments but the next round of patches may take a
  while longer, due to other responsibilities.
 
 Hi Jorn.
 
 Sorry for the late reply (and the wrong non-UTF letter in the name :).

I have been called worse. :)

 I have a couple of minor nits that I will post in other mails, but in general
 I think it should be included in -mm so that people can start using
 it and report real bugs, rather than handwaving about possible problems.

Thank you for the confidence.

Jörn

-- 
Mundie uses a textbook tactic of manipulation: start with some
reasonable talk, and lead the audience to an unreasonable conclusion.
-- Bruce Perens


Re: [Patch 09/18] fs/logfs/gc.c

2007-06-15 Thread Evgeniy Polyakov
On Fri, Jun 15, 2007 at 01:14:04PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote:
 On Fri, 15 June 2007 13:03:57 +0400, Evgeniy Polyakov wrote:
  On Sun, Jun 03, 2007 at 08:46:04PM +0200, Jörn Engel ([EMAIL PROTECTED]) wrote:
   --- /dev/null 2007-03-13 19:15:28.862769062 +0100
   +++ linux-2.6.21logfs/fs/logfs/gc.c   2007-06-03 19:18:57.0 +0200
  
  Number of bugs in case of error looks quite sad...
 
 Agreed.  I've started working on error handling.  Most erase errors are
 dealt with.  Write errors still need some infrastructure.
 
 If you like I can send another round of patches for review.

Yep, send them when you think they are ready.

 Jörn
 
 -- 
 Joern's library part 12:
 http://physics.nist.gov/cuu/Units/binary.html

-- 
Evgeniy Polyakov


Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS

2007-06-15 Thread Chuck Lever

Chris Mason wrote:

On Thu, Jun 14, 2007 at 02:20:26PM -0400, Chuck Lever wrote:
NetApp happens to use the standard NDMP protocol for sending the 
flattened file system.  NetApp uses it for synchronous replication, 
volume migration, and back up to nearline storage and tape.  AFS used 
vol dump and vol restore for migration, replication, and back-up. 
ZFS has the zfs send and zfs receive commands that do basically the 
same (Eric Kustarz recently published a blog entry that described how 
these work).  And of course, all file system objects are able to be sent 
this way:  streams, xattrs, ACLs, and so on are all supported.


Note also that NFSv4 supports the idea of migrated or replicated file 
objects.  All that is needed to support it is a mechanism on the servers 
to actually move the data.


Stringing the replication together with the underlying FS would be neat.
Is there a way to deal with a master/slave setup, where the slave may be
out of date?


Among the implementations I'm aware of, there is a varying degree of 
integration into the physical file system.  In general, it depends on 
how far out of date the slave is, and how closely the slave is supposed 
to be synchronized to the master.


A hot backup file system, for example, should be data-consistent within 
a few seconds of the master.  A snapshot is used to initialize a slave, 
followed by a live stream of updates to the master being sent to slaves. 
 Such a mechanism already exists on NetApp filers because they gather 
changes in NVRAM before committing them to the local file system. 
Simply put, these changes can also be bundled and sent to a local hot 
backup filer that is attached via Infiniband, or over the network to a 
remote hot backup filer.


For AFS, replication is done by maintaining a rw and ro copy of a volume 
on the designated master server.  Changes are made to the rw copy over 
time.  When admins want to push out a new version to replicas on another 
server, the ro copy on the master is replaced with a new snapshot, then 
this is pushed to the slaves.  The replicas are always ro and are used 
mostly for load balancing; clients contact the closest or fastest server 
containing a replica of the volume they want to access.  They always 
have a complete copy of the volume (ie no COW on the slaves).


I think you have designed into btrfs a lot of opportunity to implement 
this kind of data virtualization and management... I'm excited to see 
what can be done.
begin:vcard
fn:Chuck Lever
n:Lever;Chuck
org:Oracle Corporation;Corporate Architecture: Linux Projects Group
adr:;;1015 Granger Avenue;Ann Arbor;MI;48104;USA
title:Principal Member of Staff
tel;work:+1 248 614 5091
x-mozilla-html:FALSE
url:http://oss.oracle.com/~cel/
version:2.1
end:vcard



Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-15 Thread Greg KH
On Sun, Jun 10, 2007 at 10:09:18AM -0700, Crispin Cowan wrote:
 Andreas Gruenbacher wrote:
  On Saturday 09 June 2007 02:17, Greg KH wrote:

  On Sat, Jun 09, 2007 at 12:03:57AM +0200, Andreas Gruenbacher wrote:
  
  AppArmor is meant to be relatively easy to understand, manage, and
   customize, and introducing a labels layer wouldn't help these goals.

  Woah, that describes the userspace side of AA just fine, it means
  nothing when it comes to the in-kernel implementation. There is no 
  reason that you can't implement the same functionality using some
  totally different in-kernel solution if possible.
  
  I agree that the in-kernel implementation could use different abstractions
  than user-space, provided that the underlying implementation details can be
  hidden well enough. The key phrase here is if possible, and in fact if
  possible is much too strong: very many things in software are possible,
  including user-space drivers and a stable kernel module ABI. Some things
  make sense; others are genuinely bad ideas while still possible.

 In particular, to layer AppArmor on top of SELinux, the following
 problems must be addressed:
 
 * New files: when a file is created, it is labeled according to the
   type of the creating process and the type of the parent directory.
   Applications can also use libselinux to use application logic to
   relabel the file, but that is not 'mandatory' policy, and fails in
   cases like cp and mv. AppArmor lets you create a policy that, e.g.,
   says /home/*/.plan r to permit fingerd to read everyone's .plan
   file, should it ever exist, and you cannot emulate that with SELinux.

A daemon using inotify can instantly[1] detect this and label the file
properly if it shows up.

 * Renamed Files: Renaming a file changes the policy with respect to
   that file in AA. To emulate this in SELinux, you would have to
   have a way to instantly re-label the file upon rename.

Same daemon can do the re-label.

 * Renamed Directory trees: The above problem is compounded with
   directory trees. Changing the name at the top of a large, bushy
   tree can require instant relabeling of millions of files.

Same daemon can do this.  And yes, it might take some amount of time, but
the number of times this happens in real life on a production server is
quite small, if it happens at all.

 * New Policies: The SEEdit approach of compiling AA profiles into
   SELinux labels involves computing the partition set of files, so
   that each element of the partition set is unique, and corresponds
   to all the policies that treat every file in the element
   identically. If you create a new profile that touches *some* of
   the files in such an element, then you have to split that
   synthetic label, re-compute the partition set, and re-label the
   file system.

Again, same daemon can handle this logic.

 * File Systems That Do Not Support Labels: The most important being
   NFS3 and FAT. Because they do not support labels at all, SELinux
   has to give you an all-or-nothing access control on the entire
   remote volume. AA can give you nuanced access control in these
   file systems.

SELinux already provides support for the whole mounted filesystem,
which, in real-life testing, seems to be quite sufficient.  Also, the
SELinux developers are working on some changes to make this a bit more
fine-grained.

See also Stephen's previous comments about NFSv3 client directories and
multiple views having the potential to cause a lot of havoc.

 You could support all of these features in SELinux, but only by adding
 an in-kernel file matching mechanism similar to AppArmor.

I don't think that is necessary at all, see above for why.

 It would basically load an AppArmor policy into the kernel, label
 files as they are brought from disk into the cache, and then use
 SELinux to do the access controls.

No, do the labeling in userspace with a daemon using inotify to handle
the changing of the files around.

Or has this whole idea of a daemon been disproved already with a
prototype somewhere that failed?  If not, a simple test app would not be
that hard to hack up.  Maybe I'll see if I can do it during the week of
June 24 :)

thanks,

greg k-h


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-15 Thread Stephen Smalley
On Fri, 2007-06-15 at 11:01 -0700, Casey Schaufler wrote:
 --- Greg KH [EMAIL PROTECTED] wrote:
 
 
  A daemon using inotify can instantly[1] detect this and label the file
  properly if it shows up.
 
 In our 1995 B1 evaluation of Trusted Irix we were told in no
 uncertain terms that such a solution was not acceptable under
 the TCSEC requirements. Detection and relabel on an unlocked
 object creates an obvious window for exploitation. We were told
 that such a scheme would be considered a design flaw.
 
 I understand that some of the Common Criteria labs are less
 aggressive regarding chasing down these issues than the NCSC
 teams were. It might not prevent an evaluation from completing
 today. It is still hard to explain why it's ok to have a file
 that's labeled incorrectly _even briefly_. It is the system's
 job to ensure that that does not happen.

Um, Casey, he is talking about how to emulate AppArmor behavior on a
label-based system like SELinux, not meeting B1 or LSPP or anything like
that (which AppArmor can't do regardless).  As far as the general issue
goes, if your policy is configured such that the new file gets the most
restrictive label possible at creation time and then the daemon relabels
it to a less restrictive label if appropriate, then there is no actual
window of exposure.

Also, there is such a daemon, restorecond, in SELinux (policycoreutils)
although we avoid relying on it for anything security-critical
naturally.  And one could introduce the named type transition concept
that has been discussed in this thread without much difficulty to
selinux.

-- 
Stephen Smalley
National Security Agency



Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-15 Thread Casey Schaufler

--- Greg KH [EMAIL PROTECTED] wrote:


 A daemon using inotify can instantly[1] detect this and label the file
 properly if it shows up.

In our 1995 B1 evaluation of Trusted Irix we were told in no
uncertain terms that such a solution was not acceptable under
the TCSEC requirements. Detection and relabel on an unlocked
object creates an obvious window for exploitation. We were told
that such a scheme would be considered a design flaw.

I understand that some of the Common Criteria labs are less
aggressive regarding chasing down these issues than the NCSC
teams were. It might not prevent an evaluation from completing
today. It is still hard to explain why it's ok to have a file
that's labeled incorrectly _even briefly_. It is the system's
job to ensure that that does not happen.


Casey Schaufler
[EMAIL PROTECTED]


Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS

2007-06-15 Thread Florian D.
Chris Mason wrote:
 
 is it possible to test it on top of LVM2 on RAID at this stage?
 
 Yes, I haven't done much multi-spindle testing yet, so I'm definitely
 interested in these numbers.
 
 -chris
 
 

I did not get very far:


# insmod btrfs.ko
# mkfs.btrfs /dev/brain_volume_group/btrfstest
on close 0 blocks are allocated
fs created on /dev/brain_volume_group/btrfstest blocksize 4096 blocks
4980736

(/dev/brain_volume_group/btrfstest is a 20GB logical volume on top of
RAID6)

# mount /dev/brain_volume_group/btrfstest /mnt/temp/
(this gives these kernel-msgs:
[  385.980358] btrfs: dm-6 checksum verify failed on 4
[  385.980462] btrfs: dm-6 checksum verify failed on 12
[  385.980559] btrfs: dm-6 checksum verify failed on 11
)

# touch /mnt/temp/default/testfile.txt
[  445.445638] btrfs: dm-6 checksum verify failed on 10


# umount /mnt/temp/

[  457.980372] [ cut here ]
[  457.980377] kernel BUG at fs/buffer.c:2644!
[  457.980379] invalid opcode:  [1] PREEMPT
[  457.980382] CPU 0
[  457.980384] Modules linked in: btrfs snd_seq_midi cx88_dvb
cx88_vp3054_i2c video_buf_dvb snd_ice1712 snd_ice17xx_ak4xxx
snd_ak4xxx_adda snd_cs8427 snd_ac97_codec ac97_bus snd_i2c
snd_mpu401_uart snd_rawmidi cx8800 cx8802 cx88xx ir_common tveeprom
btcx_risc video_buf uhci_hcd
[  457.980397] Pid: 6040, comm: btrfs/0 Not tainted 2.6.21.5 #50
[  457.980400] RIP: 0010:[8021996c]  [8021996c]
submit_bh+0xf/0x102
[  457.980408] RSP: 0018:81000bab7d30  EFLAGS: 00010246
[  457.980411] RAX: a829 RBX: 81000ac207b0 RCX:
81005f0458c8
[  457.980414] RDX: 0033 RSI: 81000ac207b0 RDI:
0001
[  457.980418] RBP: 0001 R08: 81000ccdd3f8 R09:
81005fe78d50
[  457.980422] R10: 025fffe0 R11: 802407c7 R12:

[  457.980426] R13: 81001c16f480 R14: 81000ccdd3f8 R15:
81000bab7d88
[  457.980430] FS:  2b7554d54050() GS:80728000()
knlGS:f7e3f6b0
[  457.980434] CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
[  457.980437] CR2: 2abaf000 CR3: 03b06000 CR4:
06e0
[  457.980441] Process btrfs/0 (pid: 6040, threadinfo
81000bab6000, task 81005dba8480)
[  457.980443] Stack:  81000ac207b0 81001c16f480
 880988bb
[  457.980450]  81001c16f480 81000bab7d80 81000fee76e0
88099eb1
[  457.980455]  0001 81001b318c10 81001c16f180
0050
[  457.980459] Call Trace:
[  457.980471]  [880988bb] :btrfs:write_ctree_super+0xd3/0x11f
[  457.980480]  [88099eb1]
:btrfs:btrfs_commit_transaction+0x43e/0x5c0
[  457.980486]  [80257e4b] cache_alloc_refill+0x2a3/0x4f7
[  457.980491]  [802873fb] autoremove_wake_function+0x0/0x2e
[  457.980501]  [8809a033]
:btrfs:btrfs_transaction_cleaner+0x0/0x141
[  457.980510]  [8809a0e0]
:btrfs:btrfs_transaction_cleaner+0xad/0x141
[  457.980515]  [8024869c] run_workqueue+0xb5/0x18e
[  457.980519]  [80245499] worker_thread+0x0/0x145
[  457.980523]  [80287256] keventd_create_kthread+0x0/0x89
[  457.980526]  [802455a8] worker_thread+0x10f/0x145
[  457.980531]  [80277d4f] default_wake_function+0x0/0xe
[  457.980535]  [80287256] keventd_create_kthread+0x0/0x89
[  457.980540]  [802302cb] kthread+0xca/0xfb
[  457.980545]  [80259318] child_rip+0xa/0x12
[  457.980549]  [80287256] keventd_create_kthread+0x0/0x89
[  457.980555]  [80230201] kthread+0x0/0xfb
[  457.980558]  [8025930e] child_rip+0x0/0x12
[  457.980561]
[  457.980562]
[  457.980563] Code: 0f 0b eb fe 8b 06 a8 20 75 04 0f 0b eb fe 48 83
7e 38 00 75
[  457.980571] RIP  [8021996c] submit_bh+0xf/0x102
[  457.980576]  RSP 81000bab7d30


Linux localhost 2.6.21.5 #51 Fri Jun 15 20:53:36 CEST 2007 x86_64 AMD
Athlon(tm) 64 Processor 3200+ AuthenticAMD GNU/Linux


Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS

2007-06-15 Thread Chris Mason
On Fri, Jun 15, 2007 at 09:08:38PM +0200, Florian D. wrote:
 Chris Mason wrote:
  
  is it possible to test it on top of LVM2 on RAID at this stage?
  
  Yes, I haven't done much multi-spindle testing yet, so I'm definitely
  interested in these numbers.
  
  -chris
  
  
 
 I did not get very far:
 
 
 # insmod btrfs.ko
 # mkfs.btrfs /dev/brain_volume_group/btrfstest
 on close 0 blocks are allocated
 fs created on /dev/brain_volume_group/btrfstest blocksize 4096 blocks
 4980736
 
 (/dev/brain_volume_group/btrfstest is a 20GB logical volume on top of
 RAID6)
 
 # mount /dev/brain_volume_group/btrfstest /mnt/temp/
 (this gives these kernel-msgs:
 [  385.980358] btrfs: dm-6 checksum verify failed on 4
 [  385.980462] btrfs: dm-6 checksum verify failed on 12
 [  385.980559] btrfs: dm-6 checksum verify failed on 11

These are normal on the first mount, the mkfs doesn't set the csums on
the blocks it creates (will fix ;)

 )
 
 # touch /mnt/temp/default/testfile.txt
 [  445.445638] btrfs: dm-6 checksum verify failed on 10
 
 
 # umount /mnt/temp/
 
 [  457.980372] [ cut here ]
 [  457.980377] kernel BUG at fs/buffer.c:2644!

Whoops.  Please try this:

diff -r 38b36731 disk-io.c
--- a/disk-io.c Fri Jun 15 13:50:20 2007 -0400
+++ b/disk-io.c Fri Jun 15 15:12:26 2007 -0400
@@ -541,6 +541,7 @@ int write_ctree_super(struct btrfs_trans
 	else
 		ret = submit_bh(WRITE, bh);
 	if (ret == -EOPNOTSUPP) {
+		lock_buffer(bh);
 		set_buffer_uptodate(bh);
 		root->fs_info->do_barriers = 0;
 		ret = submit_bh(WRITE, bh);


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-15 Thread Pavel Machek
Hi!

And before you scream races, take a look. It does not actually add
them:

   I agree that the in-kernel implementation could use different abstractions
   than user-space, provided that the underlying implementation details can be
   hidden well enough. The key phrase here is if possible, and in fact if
   possible is much too strong: very many things in software are possible,
   including user-space drivers and a stable kernel module ABI. Some things
   make sense; others are genuinely bad ideas while still possible.
 
  In particular, to layer AppArmor on top of SELinux, the following
  problems must be addressed:
  
  * New files: when a file is created, it is labeled according to the
type of the creating process and the type of the parent directory.
Applications can also use libselinux to use application logic to
relabel the file, but that is not 'mandatory' policy, and fails in
cases like cp and mv. AppArmor lets you create a policy that, e.g.,
says /home/*/.plan r to permit fingerd to read everyone's .plan
file, should it ever exist, and you cannot emulate that with SELinux.
 
 A daemon using inotify can instantly[1] detect this and label the file
 properly if it shows up.

Or just create the files with restrictive labels by default. That way
you fail closed.

  * Renamed Files: Renaming a file changes the policy with respect to
that file in AA. To emulate this in SELinux, you would have to
have a way to instantly re-label the file upon rename.
 
 Same daemon can do the re-label.

...and no, the race there is not important. An attacker may have opened the
file under the old name and be keeping an open file descriptor. So this does
not add a new race relative to AA.

  * Renamed Directory trees: The above problem is compounded with
directory trees. Changing the name at the top of a large, bushy
tree can require instant relabeling of millions of files.
 
 Same daemon can do this.  And yes, it might take some amount of time, but
 the number of times this happens in real life on a production server is
 quite small, if it happens at all.

And now, if you move a tree, there will be old labels for a while. But
this does not matter, because an attacker could be keeping file
descriptors.

The only case where an attacker _can't_ be keeping file descriptors is
newly created files in a recently moved tree. But as you already create
files with restrictive permissions, that's okay.

Yes, you may get some -EPERM during the tree move, but AA has that
problem already: when madly moving trees, we sometimes construct a path
the file never ever had.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS

2007-06-15 Thread Florian D.
Chris Mason wrote:
 # umount /mnt/temp/

 [  457.980372] [ cut here ]
 [  457.980377] kernel BUG at fs/buffer.c:2644!
 
 Whoops.  Please try this:
 
 diff -r 38b36731 disk-io.c
 --- a/disk-io.c   Fri Jun 15 13:50:20 2007 -0400
 +++ b/disk-io.c   Fri Jun 15 15:12:26 2007 -0400
 @@ -541,6 +541,7 @@ int write_ctree_super(struct btrfs_trans
  	else
  		ret = submit_bh(WRITE, bh);
  	if (ret == -EOPNOTSUPP) {
 +		lock_buffer(bh);
  		set_buffer_uptodate(bh);
  		root->fs_info->do_barriers = 0;
  		ret = submit_bh(WRITE, bh);
 
sorry, with the patch applied:

[  147.475077] BUG: at
/home/florian/system/btrfs_test/btrfs-0.2/disk-io.c:534
write_ctree_super()
[  147.475082]
[  147.475083] Call Trace:
[  147.475096]  [880957f7] :btrfs:write_ctree_super+0x70/0x140
[  147.475106]  [88096ec5]
:btrfs:btrfs_commit_transaction+0x43e/0x5c0
[  147.475112]  [8022a2a6] __writeback_single_inode+0x34f/0x361
[  147.475121]  [88096fec]
:btrfs:btrfs_commit_transaction+0x565/0x5c0
[  147.475126]  [8027b4eb] autoremove_wake_function+0x0/0x2e
[  147.475136]  [88095915] :btrfs:close_ctree+0x4e/0x191
[  147.475141]  [8022e22e] dispose_list+0xad/0xc9
[  147.475146]  [8029fd1a] invalidate_inodes+0xc3/0xd5
[  147.475155]  [8808d170] :btrfs:btrfs_put_super+0x10/0x31
[  147.475159]  [80299849] generic_shutdown_super+0x5b/0xd2
[  147.475163]  [802998e6] kill_block_super+0x26/0x3b
[  147.475167]  [80299971] deactivate_super+0x3d/0x55
[  147.475172]  [802a0e4b] sys_umount+0x1ca/0x1f1
[  147.475177]  [8021fd18] sys_newstat+0x19/0x31
[  147.475184]  [80250d5e] system_call+0x7e/0x83
[  147.475188]
[  147.476020] BUG: at
/home/florian/system/btrfs_test/btrfs-0.2/disk-io.c:534
write_ctree_super()
[  147.476023]
[  147.476024] Call Trace:
[  147.476033]  [880957f7] :btrfs:write_ctree_super+0x70/0x140
[  147.476042]  [88096ec5]
:btrfs:btrfs_commit_transaction+0x43e/0x5c0
[  147.476048]  [8022a2a6] __writeback_single_inode+0x34f/0x361
[  147.476057]  [88096fec]
:btrfs:btrfs_commit_transaction+0x565/0x5c0
[  147.476061]  [8027b4eb] autoremove_wake_function+0x0/0x2e
[  147.476066]  [802554d9] mutex_lock+0xd/0x1d
[  147.476075]  [8809592d] :btrfs:close_ctree+0x66/0x191
[  147.476080]  [8022e22e] dispose_list+0xad/0xc9
[  147.476085]  [8029fd1a] invalidate_inodes+0xc3/0xd5
[  147.476096]  [8808d170] :btrfs:btrfs_put_super+0x10/0x31
[  147.476100]  [80299849] generic_shutdown_super+0x5b/0xd2
[  147.476104]  [802998e6] kill_block_super+0x26/0x3b
[  147.476108]  [80299971] deactivate_super+0x3d/0x55
[  147.476112]  [802a0e4b] sys_umount+0x1ca/0x1f1
[  147.476118]  [8021fd18] sys_newstat+0x19/0x31
[  147.476124]  [80250d5e] system_call+0x7e/0x83
[  147.476128]
[  147.482579] BUG: at
/home/florian/system/btrfs_test/btrfs-0.2/disk-io.c:534
write_ctree_super()
[  147.482582]
[  147.482583] Call Trace:
[  147.482592]  [880957f7] :btrfs:write_ctree_super+0x70/0x140
[  147.482601]  [88095949] :btrfs:close_ctree+0x82/0x191
[  147.482605]  [8022e22e] dispose_list+0xad/0xc9
[  147.482611]  [8029fd1a] invalidate_inodes+0xc3/0xd5
[  147.482619]  [8808d170] :btrfs:btrfs_put_super+0x10/0x31
[  147.482623]  [80299849] generic_shutdown_super+0x5b/0xd2
[  147.482627]  [802998e6] kill_block_super+0x26/0x3b
[  147.482631]  [80299971] deactivate_super+0x3d/0x55
[  147.482636]  [802a0e4b] sys_umount+0x1ca/0x1f1
[  147.482641]  [8021fd18] sys_newstat+0x19/0x31
[  147.482648]  [80250d5e] system_call+0x7e/0x83
[  147.482652]
[  147.483066] VFS: brelse: Trying to free free buffer
[  147.483069] BUG: at fs/buffer.c:1164 __brelse()
[  147.483071]
[  147.483072] Call Trace:
[  147.483081]  [88095982] :btrfs:close_ctree+0xbb/0x191
[  147.483086]  [8022e22e] dispose_list+0xad/0xc9
[  147.483091]  [8029fd1a] invalidate_inodes+0xc3/0xd5
[  147.483099]  [8808d170] :btrfs:btrfs_put_super+0x10/0x31
[  147.483103]  [80299849] generic_shutdown_super+0x5b/0xd2
[  147.483107]  [802998e6] kill_block_super+0x26/0x3b
[  147.483111]  [80299971] deactivate_super+0x3d/0x55
[  147.483116]  [802a0e4b] sys_umount+0x1ca/0x1f1
[  147.483121]  [8021fd18] sys_newstat+0x19/0x31
[  147.483127]  [80250d5e] system_call+0x7e/0x83
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS

2007-06-15 Thread Chris Mason
On Fri, Jun 15, 2007 at 10:46:04PM +0200, Florian D. wrote:
 Chris Mason wrote:
  # umount /mnt/temp/
 
  [  457.980372] [ cut here ]
  [  457.980377] kernel BUG at fs/buffer.c:2644!
  
  Whoops.  Please try this:

[ bad patch ]

 sorry, with the patch applied:
 
 [  147.475077] BUG: at /home/florian/system/btrfs_test/btrfs-0.2/disk-io.c:534

Well, apparently I get the silly stuff wrong an infinite number of
times.  Sorry, let's try again:

diff -r 38b36731 disk-io.c
--- a/disk-io.c Fri Jun 15 13:50:20 2007 -0400
+++ b/disk-io.c Fri Jun 15 16:52:38 2007 -0400
@@ -541,6 +541,8 @@ int write_ctree_super(struct btrfs_trans
 	else
 		ret = submit_bh(WRITE, bh);
 	if (ret == -EOPNOTSUPP) {
+		get_bh(bh);
+		lock_buffer(bh);
 		set_buffer_uptodate(bh);
 		root->fs_info->do_barriers = 0;
 		ret = submit_bh(WRITE, bh);


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-15 Thread Greg KH
On Fri, Jun 15, 2007 at 10:06:23PM +0200, Pavel Machek wrote:
 Hi!
 
 And before you scream races, take a look. It does not actually add
 them:

Hey, I never screamed that at all, in fact, I completely agree with you
:)

I agree that the in-kernel implementation could use different abstractions
than user-space, provided that the underlying implementation details can be
hidden well enough. The key phrase here is "if possible", and in fact "if
possible" is much too strong: very many things in software are possible,
including user-space drivers and a stable kernel module ABI. Some things make
sense; others are genuinely bad ideas while still possible.
  
   In particular, to layer AppArmor on top of SELinux, the following
   problems must be addressed:
   
   * New files: when a file is created, it is labeled according to the
 type of the creating process and the type of the parent directory.
 Applications can also use libselinux to use application logic to
 relabel the file, but that is not 'mandatory' policy, and fails in
 cases like cp and mv. AppArmor lets you create a policy that, e.g.,
 says /home/*/.plan r to permit fingerd to read everyone's .plan
 file, should it ever exist, and you cannot emulate that with
 SELinux.
  
  A daemon using inotify can instantly[1] detect this and label the file
  properly if it shows up.
 
 Or just create the files with restrictive labels by default. That way
 you fail closed.

From my limited knowledge of SELinux, this is the default today so this
would happen by default.  Anyone with more SELinux experience want to
verify or fix my understanding of this?

   * Renamed Files: Renaming a file changes the policy with respect to
 that file in AA. To emulate this in SELinux, you would have to
 have a way to instantly re-label the file upon rename.
  
  Same daemon can do the re-label.
 
 ...and no, race there is not important. Attacker may have opened the
 file under old name and is keeping open file descriptor. So this does
 not add a new race relative to AA.

Agreed.

   * Renamed Directory trees: The above problem is compounded with
 directory trees. Changing the name at the top of a large, bushy
 tree can require instant relabeling of millions of files.
  
  Same daemon can do this.  And yes, it might take an amount of time, but
  the times that this happens in real-life on a production server is
  quite small, if at all.
 
 And now, if you move a tree, there will be old labels for a while. But
 this does not matter, because attacker could be keeping file
 descriptors.

Agreed.

 Only case where attacker _can't_ be keeping file descriptors is newly
 created files in recently moved tree. But as you already create files
 with restrictive permissions, that's okay.
 
 Yes, you may get some -EPERM during the tree move, but AA has that
 problem already, see that when madly moving trees we sometimes
 construct path file never ever had.

Exactly.

I can't think of a real world use of moving directory trees around
that this would come up in as a problem.  Maybe a source code control
system might have this issue for the server, but in a second or two
everything would be working again as the new files would be relabeled
correctly.

Can anyone else see a problem with this that I'm just being foolish and
missing?

thanks,

greg k-h


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-15 Thread Karl MacMillan
On Fri, 2007-06-15 at 14:14 -0700, Greg KH wrote:
 On Fri, Jun 15, 2007 at 01:43:31PM -0700, Casey Schaufler wrote:
  
  Yup, I see that once you accept the notion that it is OK for a
  file to be mislabeled for a bit and that having a fixxerupperd
  is sufficient it all falls out.
  
  My point is that there is a segment of the security community
  that had not found this acceptable, even under the conditions
  outlined. If it meets your needs, I say run with it.
 
 If that segment feels that way, then I imagine AA would not meet their
 requirements today due to file handles and other ways of passing around
 open files, right?
 
 So, would SELinux today (without this AA-like daemon) fit the
 requirements of this segment?
 

Yes - RHEL 5 is going through CC evaluations for LSPP, CAPP, and RBAC
using the features of SELinux where appropriate.

Karl





Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-15 Thread James Morris
On Fri, 15 Jun 2007, Greg KH wrote:

  Or just create the files with restrictive labels by default. That way
  you fail closed.
 
 From my limited knowledge of SELinux, this is the default today so this
 would happen by default.  Anyone with more SELinux experience want to
 verify or fix my understanding of this?

This is entirely controllable via policy.  That is, you specify that newly
created files are labeled to something safe (enforced atomically at the
kernel level), and then your userland relabeler would be invoked via 
inotify to relabel based on your userland pathname specification.

This labeling policy can be as granular as you wish, from the entire 
filesystem to a single file.  It can also be applied depending on the 
process which created the file and the directory it's created in, ranging
from all processes and all directories, to say, just those running as 
user_t in directories labeled as public_html_t (or whatever).
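
The fail-closed scheme described above can be sketched in userspace C.
This is only an illustration, not SELinux policy and not the libselinux
API: the label names, the rule table, and the function label_for_path()
are all hypothetical. A new file starts out with a restrictive default,
and a pathname pattern (matched here with POSIX fnmatch()) picks the
label a relabeling daemon would apply afterwards.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical label names, for illustration only. */
#define DEFAULT_LABEL "unlabeled_restricted_t"

struct label_rule {
	const char *pattern;	/* shell-style pathname pattern */
	const char *label;	/* label to apply when the pattern matches */
};

/* Hypothetical userland pathname specification. */
static const struct label_rule rules[] = {
	{ "/home/*/.plan",         "plan_file_t"   },
	{ "/home/*/public_html/*", "public_html_t" },
};

/*
 * Return the label the relabeler would apply to 'path'.  Anything
 * that matches no rule keeps the restrictive default (fail closed).
 */
const char *label_for_path(const char *path)
{
	size_t i;

	assert(path != NULL);
	for (i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
		/* FNM_PATHNAME: '*' does not match across '/' */
		if (fnmatch(rules[i].pattern, path, FNM_PATHNAME) == 0)
			return rules[i].label;
	}
	return DEFAULT_LABEL;
}
```

A path that matches no rule, such as /etc/passwd, simply keeps the
restrictive default until someone writes a rule for it.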



- James
-- 
James Morris
[EMAIL PROTECTED]


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-15 Thread Greg KH
On Fri, Jun 15, 2007 at 05:28:35PM -0400, Karl MacMillan wrote:
 On Fri, 2007-06-15 at 14:14 -0700, Greg KH wrote:
  On Fri, Jun 15, 2007 at 01:43:31PM -0700, Casey Schaufler wrote:
   
   Yup, I see that once you accept the notion that it is OK for a
   file to be mislabeled for a bit and that having a fixxerupperd
   is sufficient it all falls out.
   
   My point is that there is a segment of the security community
   that had not found this acceptable, even under the conditions
   outlined. If it meets your needs, I say run with it.
  
  If that segment feels that way, then I imagine AA would not meet their
  requirements today due to file handles and other ways of passing around
  open files, right?
  
  So, would SELinux today (without this AA-like daemon) fit the
  requirements of this segment?
  
 
 Yes - RHEL 5 is going through CC evaluations for LSPP, CAPP, and RBAC
 using the features of SELinux where appropriate.

Great, but is there a requirement in the CC stuff under which the kind
of delayed re-label that an AA-like daemon would need to do would keep
that model from being certified the way your SELinux implementation
is?

As I'm guessing the default label for things like this already works
properly for SELinux, I figure we should be safe, but I don't know those
requirements at all.

thanks,

greg k-h


Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS

2007-06-15 Thread Florian D.
Chris Mason wrote:
 Well, apparently I get the silly stuff wrong an infinite number of
 times.  Sorry, let's try again:
 
 diff -r 38b36731 disk-io.c
 --- a/disk-io.c   Fri Jun 15 13:50:20 2007 -0400
 +++ b/disk-io.c   Fri Jun 15 16:52:38 2007 -0400
 @@ -541,6 +541,8 @@ int write_ctree_super(struct btrfs_trans
   else
   ret = submit_bh(WRITE, bh);
   if (ret == -EOPNOTSUPP) {
 + get_bh(bh);
 + lock_buffer(bh);
   set_buffer_uptodate(bh);
   root->fs_info->do_barriers = 0;
   ret = submit_bh(WRITE, bh);
 

Ha! It is working now. Some numbers from here (with the fio tool):

1. sequential read
2. random writes
3. sequential read again

filesize:300MB, bs:4K

   btrfs                   reiserfs                ext3
   usr% sys%  bw    sec.   usr% sys%  bw    sec.   usr% sys%  bw    sec.
1  551        68.3  4.6    117        67.4  4.6    524        68.0  4.6
2  01         0.7   431    221        29.8  10.5   318        29.0  10.8
3  01         2.3   133    119        70.5  4.4    524        68.6  4.5

bw: MB/sec.
ext3: -o data=writeback,barrier=1

20GB LVM2 partition on a RAID6 (4 SATA-disks)

cheers,
florian


Versioning file system

2007-06-15 Thread Jack Stone
I hope I got the CC list right. Apologies to anyone I didn't include
and anyone I shouldn't have included.

The basic idea is to include an idea from VMS that seems to be quite
useful: version numbers for files.

The idea is that whenever you modify a file the system saves it to a
new copy leaving the old file intact. This could be a great advantage
from many view points:
1) it would be much easier to do package management as the old
version would be automatically saved for a package
management system to deal with.

2) backups would also be easier as all versions of a file
are automatically saved so it could be potentially very
useful for a company or the like.

There are probably many others but these were the two that I liked best.

Revision numbers could be specified as follows:
/path/to/file:revision_number


I think that this can be done without breaking userspace if the default
was to open the highest revision file if no revision number is
specified. The userspace tools would need to be updated to take full
advantage of the new system but if the delimiter between the path and
revision number were chosen sensibly then the changes to most of
userspace would be minimal to non-existent.
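
As a rough userspace sketch of the naming rule proposed above (this is
not an existing kernel interface): split a name at its last ':' when
everything after it is digits, and otherwise treat the name as a
request for the highest revision. The function name split_revision()
and the REV_LATEST sentinel are made up for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define REV_LATEST (-1L)	/* hypothetical: "no revision given" */

/*
 * Split "path:revision" in place.  On return 'name' holds only the
 * path part, and the revision number (or REV_LATEST) is returned.
 */
long split_revision(char *name)
{
	char *colon, *end;
	long rev;

	assert(name != NULL);
	colon = strrchr(name, ':');
	if (colon == NULL || colon[1] == '\0')
		return REV_LATEST;	/* no suffix at all */
	if (colon[1] < '0' || colon[1] > '9')
		return REV_LATEST;	/* reject signs and whitespace */

	rev = strtol(colon + 1, &end, 10);
	if (*end != '\0')
		return REV_LATEST;	/* suffix is not purely numeric */

	*colon = '\0';			/* cut the suffix off the path */
	return rev;
}
```

One thing the sketch cannot resolve: an ordinary file that really is
named notes:2 would be read as revision 2 of notes, so the choice of
delimiter matters.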

Personally, I think that the bulk of the implementation could be in the
core fs code and the modifications to individual filesystems would be
minimal. The main implementation ideas I have (however, I am no kernel
expert =) are adding an extra field to struct file and struct inode
called int revision (as version is already taken) that would hold the
number of the file revision being accessed.

Another problem could be the increased usage of disk space. However if
only deltas from the first version were stored then this could cut down
on space, or if this were too slow to open a file then the deltas could
be off every tenth revision (ie 0,10,20,30... where 0,10,20... are full
copies of the file).
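
The second scheme comes down to simple arithmetic. A minimal sketch,
assuming revisions are numbered from 0, a full copy is kept every
'interval' revisions, and every other revision is stored as a delta
against the nearest earlier full copy (the proposal leaves these
details open, and the function names are only illustrative):

```c
#include <assert.h>

/* Revision holding the full copy that revision 'rev' is stored against. */
unsigned int base_revision(unsigned int rev, unsigned int interval)
{
	assert(interval > 0);
	return rev - rev % interval;
}

/* Non-zero if 'rev' is itself stored as a full copy. */
int is_full_copy(unsigned int rev, unsigned int interval)
{
	return base_revision(rev, interval) == rev;
}
```

With an interval of 10, revision 27 is stored against the full copy at
revision 20, and revisions 0, 10, 20, ... are full copies.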

There would need to be a tool of some description to remove old
revisions but this should not be a major undertaking as it may be
something as simple as a new system call. This would have to be careful
to update any deltas that were affected by the removal of previous
revisions but that could be taken care of in kernel space.

Thanks to anyone who stuck with me this far =). I don't know how widely
useful this may be but that's the reason I posted before trying to code
anything. I would very much value any contributions even a reasoned NAK
as I'm still learning how kernel development works (and I would love any
implementation directions)

Jack


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-15 Thread Karl MacMillan
On Fri, 2007-06-15 at 14:44 -0700, Greg KH wrote:
 On Fri, Jun 15, 2007 at 05:28:35PM -0400, Karl MacMillan wrote:
  On Fri, 2007-06-15 at 14:14 -0700, Greg KH wrote:
   On Fri, Jun 15, 2007 at 01:43:31PM -0700, Casey Schaufler wrote:

Yup, I see that once you accept the notion that it is OK for a
file to be mislabeled for a bit and that having a fixxerupperd
is sufficient it all falls out.

My point is that there is a segment of the security community
that had not found this acceptable, even under the conditions
outlined. If it meets your needs, I say run with it.
   
   If that segment feels that way, then I imagine AA would not meet their
   requirements today due to file handles and other ways of passing around
   open files, right?
   
   So, would SELinux today (without this AA-like daemon) fit the
   requirements of this segment?
   
  
  Yes - RHEL 5 is going through CC evaluations for LSPP, CAPP, and RBAC
  using the features of SELinux where appropriate.
 
 Great, but is there the requirement in the CC stuff such that this type
 of delayed re-label that an AA-like daemon would need to do cause that
 model to not be able to be certified like your SELinux implementation
 is?
 

There are two things:

1) relabeling (non-tranquility) is very problematic in general because
revocation is hard (and non-solved in Linux). So you would have to
address concerns about that.

2) Whether this would pass certification depends on a lot of factors
(like the specific requirements - CC is just a process not a single set
of requirements). I don't know enough to really guess.

More to the point, though, the requirements in those documents are
outdated at best. I don't think it is worth worrying over.

 As I'm guessing the default label for things like this already work
 properly for SELinux, I figure we should be safe, but I don't know those
 requirements at all.
 

Probably not - you would likely want it to be a label that can't be read
or written by anything, only relabeled by the daemon.

Karl




Re: Versioning file system

2007-06-15 Thread H. Peter Anvin
Jack Stone wrote:
 I hope I got the CC list right. Apologies to anyone I didn't include
 and anyone I shouldn't have included.
 
 The basic idea is to include an idea from VMS that seems to be quite
 useful: version numbers for files.
 
 The idea is that whenever you modify a file the system saves it to a
 new copy leaving the old file intact. This could be a great advantage
 from many view points:
   1) it would be much easier to do package management as the old
   version would be automatically saved for a package
   management system to deal with.
 
   2) backups would also be easier as all versions of a file
   are automatically saved so it could be potentially very
   useful for a company or the like.
 

This is one of those things that seems like a good idea, but frequently
comes up short.  Part of the problem is that "whenever you modify a file"
is ill-defined, or rather, if you were to take the literal meaning of it
you'd end up with an unmanageable number of revisions.

Furthermore, it turns out that often relationships between files are
more important.

Thus, in the end it turns out that this stuff is better handled by
explicit version-control systems (which require explicit operations to
manage revisions) and atomic snapshots (for backup.)

-hpa


Re: Versioning file system

2007-06-15 Thread Chris Snook

Jack Stone wrote:

I hope I got the CC list right. Apologies to anyone I didn't include
and anyone I shouldn't have included.

The basic idea is to include an idea from VMS that seems to be quite
useful: version numbers for files.

The idea is that whenever you modify a file the system saves it to a
new copy leaving the old file intact. This could be a great advantage
from many view points:
1) it would be much easier to do package management as the old
version would be automatically saved for a package
management system to deal with.

2) backups would also be easier as all versions of a file
are automatically saved so it could be potentially very
useful for a company or the like.

There are probably many others but these were the two that I liked best.

Revision numbers could be specified as follows:
/path/to/file:revision_number


I think that this can be done without breaking userspace if the default
was to open the highest revision file if no revision number is
specified. The userspace tools would need to be updated to take full
advantage of the new system but if the delimiter between the path and
revision number were chosen sensibly then the changes to most of
userspace would be minimal to non-existent.

Personally, I think that the bulk of the implementation could be in the
core fs code and the modifications to individual filesystems would be
minimal. The main implementation ideas I have (however, I am no kernel
expert =) are adding an extra field to struct file and struct inode
called int revision (as version is already taken) that would hold the
number of the file revision being accessed.

Another problem could be the increased usage of disk space. However if
only deltas from the first version were stored then this could cut down
on space, or if this were too slow to open a file then the deltas could
be off every tenth revision (ie 0,10,20,30... where 0,10,20... are full
copies of the file).

There would need to be a tool of some description to remove old
revisions but this should not be a major undertaking as it may be
something as simple as a new system call. This would have to be careful
to update any deltas that were affected by the removal of previous
revisions but that could be taken care of in kernel space.

Thanks to anyone who stuck with me this far =). I don't know how widely
useful this may be but that's the reason I posted before trying to code
anything. I would very much value any contributions even a reasoned NAK
as I'm still learning how kernel development works (and I would love any
implementation directions)

Jack
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


The underlying internal implementation of something like this wouldn't be all 
that hard on many filesystems, but it's the interface that's the problem.  The 
':' character is a perfectly legal filename character, so doing it that way 
would break things.  I think NetApp more or less got the interface right by 
putting a .snapshot directory in each directory, with time-versioned 
subdirectories each containing snapshots of that directory's contents at those 
points in time.  It keeps the backups under the same hierarchy as the original 
files, to avoid permissions headaches, it's accessible over NFS without 
modifying the client at all, and it's hidden just enough to make it hard for 
users to do something stupid.
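
To make the layout concrete, here is a small sketch of how a tool might
build the name of a file's snapshot under such a scheme. It assumes only
what is described above (a .snapshot directory inside each directory,
with one subdirectory per point in time). The function name and the
snapshot name used in the usage note are hypothetical and are not
NetApp's actual naming.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<dir>/.snapshot/<snap>/<file>" into 'buf'.
 * Returns 0 on success, -1 if the result would not fit.
 */
int snapshot_path(char *buf, size_t len, const char *dir,
		  const char *snap, const char *file)
{
	int n;

	assert(buf != NULL && dir != NULL && snap != NULL && file != NULL);
	n = snprintf(buf, len, "%s/.snapshot/%s/%s", dir, snap, file);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For example, snapshot_path(buf, sizeof(buf), "/home/jack", "2007-06-15",
"notes.txt") fills buf with /home/jack/.snapshot/2007-06-15/notes.txt.
Because that is an ordinary path, unmodified applications and NFS
clients can open it with no new syntax.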


If you want to do something like this (and it's generally not a bad idea), make 
sure you do it in a way that's not going to change the behavior seen by existing 
applications, and that is accessible to unmodified remote clients.  Hidden 
.snapshot directories are one way, a parallel /backup filesystem could be 
another, whatever.  If you break existing apps, I won't touch it with a ten foot 
pole.


-- Chris


Re: Versioning file system

2007-06-15 Thread Kok, Auke

Jack Stone wrote:

I hope I got the CC list right. Apologies to anyone I didn't include
and anyone I shouldn't have included.

The basic idea is to include an idea from VMS that seems to be quite
useful: version numbers for files.


snip

Have you looked into ext3cow? It allows you to take snapshots of the entire ext3
fs at a single point, and roll back / extract snapshots at any time later. This
may be sufficient for you and the implementation seems to be rather stable already.


Cheers,

Auke


http://www.ext3cow.com/


Re: Versioning file system

2007-06-15 Thread alan

On Fri, 15 Jun 2007, H. Peter Anvin wrote:


alan wrote:


ZFS is the cool new thing in that space.  Too bad the license makes it
hard to incorporate it into the kernel.  (I am one of those people that
believe that Linux should support EVERY file system, no matter how old
or obscure.)



I have details on the Luxor UFD-DOS filesystem, if you'd care to
implement it.


Do you have example discs that can be mounted to test it?  If you do, I 
will consider doing it.


I have a couple of older DOS filesystems that got dropped out years ago 
that I actually need to mount disks that I may rewrite for 2.6.x.


Now all I need is the time.

And speaking of obscure information...

I have a bunch of PCMCIA spec documents from the PCMCIA standards 
association from the late 90s.  Would anyone involved in maintaining the 
PCMCIA code be interested in it?  (Especially if they are in Portland.) 
It has been a while since I have even needed to look at it and I hate for 
it to go to waste if it can be of any use.  (Bit late now, I know...)


--
ANSI C says access to the padding fields of a struct is undefined.
ANSI C also says that struct assignment is a memcpy. Therefore struct
assignment in ANSI C is a violation of ANSI C...
  - Alan Cox


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-15 Thread Crispin Cowan
Greg KH wrote:
 On Fri, Jun 15, 2007 at 10:06:23PM +0200, Pavel Machek wrote:
   
 * Renamed Directory trees: The above problem is compounded with
   directory trees. Changing the name at the top of a large, bushy
   tree can require instant relabeling of millions of files.
 
 Same daemon can do this.  And yes, it might take an amount of time, but
 the times that this happens in real-life on a production server is
 quite small, if at all.
   
 And now, if you move a tree, there will be old labels for a while. But
 this does not matter, because attacker could be keeping file
 descriptors.
 
 Agreed.
   
We have built a label-based AA prototype. It fails because there is no
reasonable way to address the tree renaming problem.

 Only case where attacker _can't_ be keeping file descriptors is newly
 created files in recently moved tree. But as you already create files
 with restrictive permissions, that's okay.

 Yes, you may get some -EPERM during the tree move, but AA has that
 problem already, see that when madly moving trees we sometimes
 construct path file never ever had.
 
 Exactly.
   
You are remembering old behavior. The current AppArmor generates only
correct and consistent paths. If a process has an open file descriptor
to such a file, they will retain access to it, as we described here:
http://forgeftp.novell.com//apparmor/LKML_Submission-May_07/techdoc.pdf

Under the restorecon-alike proposal, you have a HUGE open race. This
post http://bugs.centos.org/view.php?id=1981 describes restorecon
running for 30 minutes relabeling a file system. That is so far from
acceptable that it is silly.

Of course, this depends on the system in question, but restorecon will
necessarily need to traverse whatever portions of the filesystem that
have changed, which can be quite a long time indeed. Any race condition
measured in minutes is a very serious issue.

 I can't think of a real world use of moving directory trees around
 that this would come up in as a problem.
Consider this case: We've been developing a new web site for a month,
and testing it on the server by putting it in a different virtual
domain. We want to go live at some particular instant by doing an mv of
the content into our public HTML directory. We simultaneously want to
take the old web site down and archive it by moving it somewhere else.

Under the restorecon proposal, the web site would be horribly broken
until restorecon finishes, as various random pages are or are not
accessible to Apache.

In a smaller scale example, I want to share some files with a friend. I
can't be bothered to set up a proper access control system, so I just mv
the files to ~crispin/public_html/lookitme and in IRC say "get it now,
going away in 10 minutes" and then move it out again. Yes, you can
manually address this by running restorecon ~crispin/public_html. But
AA does this automatically without having to run any commands.

You could get restorecon to do this automatically by using inotify. But
to make it as general and transparent as AA is now, you would have to
run inotify on every directory in the system, with consequences for
kernel memory and performance.
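
For reference, a minimal sketch of what one such watch looks like with
the Linux inotify API. The helper name watch_directory() is made up for
this example; inotify_add_watch() and the IN_* flags are the real
interface. A watch covers a single directory and is not recursive, which
is why covering the whole system means one watch per directory.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/inotify.h>

/*
 * Watch one directory for entries created in it or moved into or out
 * of it.  Returns the watch descriptor, or -1 on error.  Each
 * subdirectory needs its own call, and its own watch in the kernel.
 */
int watch_directory(int inotify_fd, const char *dir)
{
	assert(dir != NULL);
	return inotify_add_watch(inotify_fd, dir,
				 IN_CREATE | IN_MOVED_TO | IN_MOVED_FROM);
}
```

The descriptor passed in comes from inotify_init(), and events are then
read from it with read().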

This problem does not exist for SELinux, because SELinux does not expect
access to change based on file names.

This problem does not exist in the proposed AA implementation, because
the patch makes the access decision based on the current name of the
file, so it doesn't have a consistency problem between the file and its
label; there is no label.

The problem is induced by trying to emulate AA on top of SELinux. They
don't fit well together. AA fits much better with LSM, which is the
reason LSM exists.

   Maybe a source code control
 system might have this issue for the server, but in a second or two
 everything would be working again as the new files would be relabeled
 correctly.
   
Try an hour or two for a large source code repository. It's linear in the
number of files, and several hundred thousand files would take a while
to relabel. A large GIT tree would be particularly painful because of
the very large number of files.

 Can anyone else see a problem with this that I'm just being foolish and
 missing?
   
It is not foolish. The label idea is so attractive that last September
from discussions with Arjan we actually thought it was the preferred
implementation. However, what we've been saying over and over again is
that we *tried* this, and it *doesn't* work at the implementation level.
There is no good answer, restorecon is an ugly kludge, and so this
seductive approach turns out to be a dead end.

Caveat: I am *not* saying that labels in general are bad, just that they
are a bad way to emulate the AppArmor model. And yes, I am working on a
model paper that is more abstract than Andreas' paper
http://forgeftp.novell.com//apparmor/LKML_Submission-May_07/techdoc.pdf,
but that takes time.

Then there's all the other problems, such as file systems that don't
support extended 

Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-15 Thread Seth Arnold
On Fri, Jun 15, 2007 at 10:06:23PM +0200, Pavel Machek wrote:
 Yes, you may get some -EPERM during the tree move, but AA has that
 problem already, see that when madly moving trees we sometimes
 construct path file never ever had.

Pavel, please focus on the current AppArmor implementation. You're
remembering a flaw with a previous version of AppArmor. The pathnames
constructed with the current version of AppArmor are consistent and
correct.

Thanks.




Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-15 Thread Pavel Machek
Hi!

  Yes, you may get some -EPERM during the tree move, but AA has that
  problem already, see that when madly moving trees we sometimes
  construct path file never ever had.
 
 Pavel, please focus on the current AppArmor implementation. You're
 remembering a flaw with a previous version of AppArmor. The pathnames
 constructed with the current version of AppArmor are consistent and
 correct.

Ok, I did not know that this got fixed.

How do you do that? Hold a lock preventing renames for the whole time
you walk from file to the root of filesystem?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-15 Thread Greg KH
On Fri, Jun 15, 2007 at 05:42:08PM -0400, James Morris wrote:
 On Fri, 15 Jun 2007, Greg KH wrote:
 
   Or just create the files with restrictive labels by default. That way
   you fail closed.
  
  From my limited knowledge of SELinux, this is the default today so this
  would happen by default.  Anyone with more SELinux experience want to
  verify or fix my understanding of this?
 
 This is entirely controllable via policy.  That is, you specify that newly
 created files are labeled to something safe (enforced atomically at the
 kernel level), and then your userland relabeler would be invoked via 
 inotify to relabel based on your userland pathname specification.
 
 This labeling policy can be as granular as you wish, from the entire 
 filesystem to a single file.  It can also be applied depending on the 
 process which created the file and the directory it's created in, ranging
 from all processes and all directories, to say, just those running as 
 user_t in directories labeled as public_html_t (or whatever).

Oh great, then things like source code control systems would have no
problems with new files being created under them, or renaming whole
trees.

So, so much for the "it's going to be too slow re-labeling everything"
issue, as it's not even required for almost all situations :)

thanks for letting us know.

greg k-h


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-15 Thread Greg KH
On Fri, Jun 15, 2007 at 04:30:44PM -0700, Crispin Cowan wrote:
 Greg KH wrote:
  On Fri, Jun 15, 2007 at 10:06:23PM +0200, Pavel Machek wrote:

  * Renamed Directory trees: The above problem is compounded with
directory trees. Changing the name at the top of a large, bushy
tree can require instant relabeling of millions of files.
  
  Same daemon can do this.  And yes, it might take some amount of time, but
  the number of times this happens in real life on a production server is
  quite small, if at all.

  And now, if you move a tree, there will be old labels for a while. But
  this does not matter, because attacker could be keeping file
  descriptors.
  
  Agreed.

 We have built a label-based AA prototype. It fails because there is no
 reasonable way to address the tree renaming problem.

How does inotify not work here?  You are notified that the tree is
moved, your daemon goes through and relabels things as needed.  In the
meantime, before the re-label happens, you might have the wrong label on
things, but somehow SELinux already handles this, so I think you
should be fine.
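
For illustration only, the daemon loop described above might look roughly
like this in C. This is a sketch and not code from anyone in this thread:
needs_relabel(), relabel_subtree() and watch_directory() are made-up names,
and the relabel step is just a stub that prints what it would do.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Events after which a pathname-derived label may be stale. */
#define RELABEL_EVENTS (IN_MOVED_TO | IN_CREATE)

/* Return 1 if the event mask says something new appeared under this name. */
int needs_relabel(uint32_t mask)
{
    return (mask & RELABEL_EVENTS) != 0;
}

/* Hypothetical hook: a real daemon would walk `name` and rewrite labels. */
static void relabel_subtree(const char *name)
{
    printf("would relabel: %s\n", name);
}

/* Watch one directory and report entries that need relabeling.
 * Returns -1 if the watch cannot be set up, 0 when reading stops. */
int watch_directory(const char *dir)
{
    char buf[4096]
        __attribute__((aligned(__alignof__(struct inotify_event))));
    int fd = inotify_init();

    if (fd < 0)
        return -1;
    if (inotify_add_watch(fd, dir, RELABEL_EVENTS) < 0) {
        close(fd);
        return -1;
    }
    for (;;) {
        ssize_t len = read(fd, buf, sizeof(buf));
        if (len <= 0)
            break;
        /* One read() may return several variable-length events. */
        for (char *p = buf; p < buf + len; ) {
            struct inotify_event *ev = (struct inotify_event *)p;
            if (ev->len && needs_relabel(ev->mask))
                relabel_subtree(ev->name);
            p += sizeof(*ev) + ev->len;
        }
    }
    close(fd);
    return 0;
}
```

Note that inotify watches are per directory and not recursive, so covering
a whole tree means adding one watch for every directory in it.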

  Only case where attacker _can't_ be keeping file descriptors is newly
  created files in recently moved tree. But as you already create files
  with restrictive permissions, that's okay.
 
  Yes, you may get some -EPERM during the tree move, but AA has that
  problem already, see that when madly moving trees we sometimes
  construct path file never ever had.
  
  Exactly.

 You are remembering old behavior. The current AppArmor generates only
 correct and consistent paths. If a process has an open file descriptor
 to such a file, they will retain access to it, as we described here:
 http://forgeftp.novell.com//apparmor/LKML_Submission-May_07/techdoc.pdf
 
 Under the restorecon-alike proposal, you have a HUGE open race. This
 post http://bugs.centos.org/view.php?id=1981 describes restorecon
 running for 30 minutes relabeling a file system. That is so far from
 acceptable that it is silly.

Ok, so we fix it.  Seriously, it shouldn't be that hard.  If that's the
only problem we have here, it isn't an issue.

 Of course, this depends on the system in question, but restorecon will
 necessarily need to traverse whatever portions of the filesystem that
 have changed, which can be quite a long time indeed. Any race condition
 measured in minutes is a very serious issue.

Agreed, so we fix that.  There are ways to speed those kinds of things
up quite a bit, and I imagine since the default SELinux behavior doesn't
use restorecon in this kind of use-case, no one has spent the time to do
the work.

  I can't think of a real world use of moving directory trees around
  that this would come up in as a problem.
 Consider this case: We've been developing a new web site for a month,
 and testing it on the server by putting it in a different virtual
 domain. We want to go live at some particular instant by doing an mv of
 the content into our public HTML directory. We simultaneously want to
 take the old web site down and archive it by moving it somewhere else.
 
 Under the restorecon proposal, the web site would be horribly broken
 until restorecon finishes, as various random pages are or are not
 accessible to Apache.

Usually you don't do that by doing a 'mv' otherwise you are almost
guaranteed stale and mixed up content for some period of time, not to
mention the issues surrounding paths that might be messed up.

 In a smaller scale example, I want to share some files with a friend. I
 can't be bothered to set up a proper access control system, so I just mv
 the files to ~crispin/public_html/lookitme and in IRC say "get it now,
 going away in 10 minutes" and then move it out again. Yes, you can
 manually address this by running restorecon ~crispin/public_html. But
 AA does this automatically without having to run any commands.

I'm saying that the daemon will automatically do it for you, you don't
have to do anything on your own.

 You could get restorecon to do this automatically by using inotify.

Yes.

 But to make it as general and transparent as AA is now, you would have
 to run inotify on every directory in the system, with consequences for
 kernel memory and performance.

What kernel memory and performance issues are there?  Your SLED
machine already has inotify running on every directory in the system
today, and you don't seem to have noticed that :)

 This problem does not exist for SELinux, because SELinux does not expect
 access to change based on file names.

Agreed.

 This problem does not exist in the proposed AA implementation, because
 the patch makes the access decision based on the current name of the
 file, so it doesn't have a consistency problem between the file and its
 label; there is no label.

No, that's not the issue here.  The issue is if we can use the model
that AA is exporting to users and apply it to the model that the kernel
uses internally to access internal data 

Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-15 Thread david

On Fri, 15 Jun 2007, Greg KH wrote:


On Fri, Jun 15, 2007 at 04:30:44PM -0700, Crispin Cowan wrote:

Greg KH wrote:

On Fri, Jun 15, 2007 at 10:06:23PM +0200, Pavel Machek wrote:

Only case where attacker _can't_ be keeping file descriptors is newly
created files in recently moved tree. But as you already create files
with restrictive permissions, that's okay.

Yes, you may get some -EPERM during the tree move, but AA has that
problem already, see that when madly moving trees we sometimes
construct path file never ever had.


Exactly.


You are remembering old behavior. The current AppArmor generates only
correct and consistent paths. If a process has an open file descriptor
to such a file, they will retain access to it, as we described here:
http://forgeftp.novell.com//apparmor/LKML_Submission-May_07/techdoc.pdf

Under the restorecon-alike proposal, you have a HUGE open race. This
post http://bugs.centos.org/view.php?id=1981 describes restorecon
running for 30 minutes relabeling a file system. That is so far from
acceptable that it is silly.


Ok, so we fix it.  Seriously, it shouldn't be that hard.  If that's the
only problem we have here, it isn't an issue.


how do you 'fix' the laws of physics?

the problem is that with a directory that contains lots of files below it 
you have to access all the files' metadata to change the labels on it. it 
can take significant amounts of time to walk the entire tree and change 
every file.



I can't think of a real world use of moving directory trees around
that this would come up in as a problem.

Consider this case: We've been developing a new web site for a month,
and testing it on the server by putting it in a different virtual
domain. We want to go live at some particular instant by doing an mv of
the content into our public HTML directory. We simultaneously want to
take the old web site down and archive it by moving it somewhere else.

Under the restorecon proposal, the web site would be horribly broken
until restorecon finishes, as various random pages are or are not
accessible to Apache.


Usually you don't do that by doing a 'mv' otherwise you are almost
guaranteed stale and mixed up content for some period of time, not to
mention the issues surrounding paths that might be messed up.


on the contrary, using 'mv' is by far the cleanest way to do this.

mv htdocs htdocs.old;mv htdocs.new htdocs

this makes two atomic changes to the filesystem, but can generate 
thousands to millions of permission changes as a result.
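
as a rough sketch of the same two-step swap done from a program (assuming 
all three paths are on one filesystem; swap_trees() is a made-up helper 
name, not an existing API):

```c
#include <stdio.h>

/*
 * Archive the live tree, then move the staged tree into its place.
 * Each rename() is atomic on its own, but the pair is not: between the
 * two calls the live name does not exist.
 * Returns 0 on success, -1 if the first rename fails, -2 if the second
 * fails (the live name is then missing until someone repairs it).
 */
int swap_trees(const char *live, const char *staged, const char *archive)
{
    if (rename(live, archive) != 0)
        return -1;
    if (rename(staged, live) != 0)
        return -2;
    return 0;
}
```

swap_trees("htdocs", "htdocs.new", "htdocs.old") would correspond to the 
two mv commands above. neither call touches any file below the renamed 
directories.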


David Lang


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-15 Thread Pavel Machek
Hi!

  Only case where attacker _can't_ be keeping file descriptors is newly
  created files in recently moved tree. But as you already create files
  with restrictive permissions, that's okay.
 
  Yes, you may get some -EPERM during the tree move, but AA has that
  problem already, see that when madly moving trees we sometimes
  construct path file never ever had.
  
  Exactly.

 You are remembering old behavior. The current AppArmor generates only
 correct and consistent paths. If a process has an open file descriptor
 to such a file, they will retain access to it, as we described here:

Ok, so what I described was actually secure. Good.

 Under the restorecon-alike proposal, you have a HUGE open race. This
 post http://bugs.centos.org/view.php?id=1981 describes restorecon
 running for 30 minutes relabeling a file system. That is so far from
 acceptable that it is silly.

30 minutes during installation does not seem silly to me.

And that race does not make it insecure, because of the open file
descriptors. Good.

 Of course, this depends on the system in question, but restorecon will
 necessarily need to traverse whatever portions of the filesystem that
 have changed, which can be quite a long time indeed. Any race condition
 measured in minutes is a very serious issue.

You seem to imply it is security related, it is not. I can have open
files for hours or days.

  I can't think of a real world use of moving directory trees around
  that this would come up in as a problem.
 Consider this case: We've been developing a new web site for a month,
 and testing it on the server by putting it in a different virtual
 domain. We want to go live at some particular instant by doing an mv of
 the content into our public HTML directory. We simultaneously want to
 take the old web site down and archive it by moving it somewhere
 else.

And you do that exactly how, without the race? I do not think we have
three_way_rename(name1, name2, name3) system call.

Notice that

1) mv can take minutes already if you move cross filesystem.

2) this is easily avoided by mv-ing somewhere with same permissions,
then doing quick moves when daemon is done.

 You could get restorecon to do this automatically by using inotify. But
 to make it as general and transparent as AA is now, you would have to
 run inotify on every directory in the system, with consequences for
 kernel memory and performance.

So you run inotify everywhere. IIRC beagle does it already.

  Can anyone else see a problem with this that I'm just being foolish and
  missing?

 It is not foolish. The label idea is so attractive that last September
 from discussions with Arjan we actually thought it was the preferred
 implementation. However, what we've been saying over and over again is
 that we *tried* this, and it *doesn't* work at the implementation level.
 There is no good answer, restorecon is an ugly kludge, and so this
 seductive approach turns out to be a dead end.

Talking about dead ends... "just put path-based security module into
kernel" recently got pretty strong NACK from Christoph Hellwig (see
TOMOYO Linux thread), and I believe there was similar comment from Al
Viro in past. That seems to me as dead-endy as it gets. "mv takes 30
minutes" is road slightly covered with bushes... compared to that.

So we can either forget about AA completely, or take a way Christoph
did not NACK. restorecond is such a way, and with inotify it should
be acceptable. find does _not_ take that long, not even for git trees.

[EMAIL PROTECTED]:/data/l/linux$ time find . > /dev/null
0.04user 0.37system 11.50 (0m11.504s) elapsed 3.56%CPU

(If you wanted to be super-nice, you could introduce rename() helper
into glibc, that would do re-labeling synchronously, and only return
when it is done. All the nice applications call glibc anyway, and all
the exploits can't take advantage of it, because it is secure
already.).

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-15 Thread Seth Arnold
On Sat, Jun 16, 2007 at 01:39:14AM +0200, Pavel Machek wrote:
  Pavel, please focus on the current AppArmor implementation. You're
  remembering a flaw with a previous version of AppArmor. The pathnames
  constructed with the current version of AppArmor are consistent and
  correct.
 
 Ok, I did not know that this got fixed.
 
 How do you do that? Hold a lock preventing renames for a whole time
 you walk from file to the root of filesystem?

We've improved d_path() to remove many of its previous shortcomings:

eb3dfb0cb1f4a44e2d0553f89514ce9f2a9fcaf1




Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-15 Thread Seth Arnold
On Fri, Jun 15, 2007 at 04:49:25PM -0700, Greg KH wrote:
  We have built a label-based AA prototype. It fails because there is no
  reasonable way to address the tree renaming problem.
 
 How does inotify not work here?  You are notified that the tree is
 moved, your daemon goes through and relabels things as needed.  In the
 meantime, before the re-label happens, you might have the wrong label on
 things, but somehow SELinux already handles this, so I think you
 should be fine.

SELinux does not relabel files when containing directories move, so it
is not a problem they've chosen to face.

How well does inotify handle running attached to every directory on a
typical Linux system?

  Under the restorecon-alike proposal, you have a HUGE open race. This
  post http://bugs.centos.org/view.php?id=1981 describes restorecon
  running for 30 minutes relabeling a file system. That is so far from
  acceptable that it is silly.
 
 Ok, so we fix it.  Seriously, it shouldn't be that hard.  If that's the
 only problem we have here, it isn't an issue.

Restorecon traverses the filesystem from a specific point down. In order to
apply to an entire system (as would be necessary to try to emulate
AppArmor's model using SELinux), restorecon would need to run on vast
portions of the filesystem often. (mv ~/public_html ~/archived; or tar
zxvf linux-*.tar.gz, etc.)

I'm not sure we need to run restorecon every time rename(2) is called.

  Of course, this depends on the system in question, but restorecon will
  necessarily need to traverse whatever portions of the filesystem that
  have changed, which can be quite a long time indeed. Any race condition
  measured in minutes is a very serious issue.
 
 Agreed, so we fix that.  There are ways to speed those kinds of things
 up quite a bit, and I imagine since the default SELinux behavior doesn't
 use restorecon in this kind of use-case, no one has spent the time to do
 the work.

The time for restorecon is probably best imagined as a kind of 'du' that
also updates extended attributes as it does its work. It'd be very
difficult to improve on this.

 What kernel memory and performance issues are there?  Your SLED
 machine already has inotify running on every directory in the system
 today, and you don't seem to have noticed that :)

I beg to differ. :)




Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-15 Thread Pavel Machek
Hi!

 Under the restorecon proposal, the web site would be horribly broken
 until restorecon finishes, as various random pages are or are not
 accessible to Apache.
 
 Usually you don't do that by doing a 'mv' otherwise you are almost
 guaranteed stale and mixed up content for some period of time, not to
 mention the issues surrounding paths that might be messed up.
 
 on the contrary, using 'mv' is by far the cleanest way to do this.
 
 mv htdocs htdocs.old;mv htdocs.new htdocs
 
 this makes two atomic changes to the filesystem, but can generate 
 thousands to millions of permission changes as a result.

Ok, so mv gets slower for big trees... and open() gets faster for deep
trees. Previously, open in current directory was one atomic read of
directory entry, now it has to read directory, and its parent, and its
parent parent, and its...

(Or am I wrong and getting full path does not need to bring anything
in, not even in cache-cold case?)

So, proposed solution has different performance tradeoffs, but should
still be a win -- opens are more common than moves.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-15 Thread Greg KH
On Fri, Jun 15, 2007 at 05:18:10PM -0700, Seth Arnold wrote:
 On Fri, Jun 15, 2007 at 04:49:25PM -0700, Greg KH wrote:
   We have built a label-based AA prototype. It fails because there is no
   reasonable way to address the tree renaming problem.
  
  How does inotify not work here?  You are notified that the tree is
  moved, your daemon goes through and relabels things as needed.  In the
  meantime, before the re-label happens, you might have the wrong label on
  things, but somehow SELinux already handles this, so I think you
  should be fine.
 
 SELinux does not relabel files when containing directories move, so it
 is not a problem they've chosen to face.
 
 How well does inotify handle running attached to every directory on a
 typical Linux system?

Look at SLED and Beagle (taking the indexing logic out of the equation.)
It runs well enough that a major Linux vendor is willing to stake its
reputation on it :)

   Under the restorecon-alike proposal, you have a HUGE open race. This
   post http://bugs.centos.org/view.php?id=1981 describes restorecon
   running for 30 minutes relabeling a file system. That is so far from
   acceptable that it is silly.
  
  Ok, so we fix it.  Seriously, it shouldn't be that hard.  If that's the
  only problem we have here, it isn't an issue.
 
 Restorecon traverses the filesystem from a specific point down. In order to
 apply to an entire system (as would be necessary to try to emulate
 AppArmor's model using SELinux), restorecon would need to run on vast
 portions of the filesystem often. (mv ~/public_html ~/archived; or tar
 zxvf linux-*.tar.gz, etc.)
 
 I'm not sure we need to run restorecon every time rename(2) is called.

Ok, so we optimize it.  Putting speed issues aside right now as mere
implementation details, I'm looking for logical reasons the AA model
will not work in this type of system.

   Of course, this depends on the system in question, but restorecon will
   necessarily need to traverse whatever portions of the filesystem that
   have changed, which can be quite a long time indeed. Any race condition
   measured in minutes is a very serious issue.
  
  Agreed, so we fix that.  There are ways to speed those kinds of things
  up quite a bit, and I imagine since the default SELinux behavior doesn't
  use restorecon in this kind of use-case, no one has spent the time to do
  the work.
 
 The time for restorecon is probably best imagined as a kind of 'du' that
 also updates extended attributes as it does its work. It'd be very
 difficult to improve on this.

Is that a bet?  :)

  What kernel memory and performance issues are there?  Your SLED
  machine already has inotify running on every directory in the system
  today, and you don't seem to have noticed that :)
 
 I beg to differ. :)

The Beagle index backend is known to slow things down at times, yes, but
is that the fault of the inotify watches, or the use of mono and a
big-ass database on the system at the same time?

In the original inotify development, the issue was not inotify at all,
unless you have some newer numbers in this regard?

And Crispin mentioned that you all already implemented this.  Do you
have the code around so that we can take a look at it?

thanks,

greg k-h


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-15 Thread Greg KH
On Fri, Jun 15, 2007 at 05:01:25PM -0700, [EMAIL PROTECTED] wrote:
  On Fri, 15 Jun 2007, Greg KH wrote:
 
  On Fri, Jun 15, 2007 at 04:30:44PM -0700, Crispin Cowan wrote:
  Greg KH wrote:
  On Fri, Jun 15, 2007 at 10:06:23PM +0200, Pavel Machek wrote:
  Only case where attacker _can't_ be keeping file descriptors is newly
  created files in recently moved tree. But as you already create files
  with restrictive permissions, that's okay.
 
  Yes, you may get some -EPERM during the tree move, but AA has that
  problem already, see that when madly moving trees we sometimes
  construct path file never ever had.
 
  Exactly.
 
  You are remembering old behavior. The current AppArmor generates only
  correct and consistent paths. If a process has an open file descriptor
  to such a file, they will retain access to it, as we described here:
  http://forgeftp.novell.com//apparmor/LKML_Submission-May_07/techdoc.pdf
 
  Under the restorecon-alike proposal, you have a HUGE open race. This
  post http://bugs.centos.org/view.php?id=1981 describes restorecon
  running for 30 minutes relabeling a file system. That is so far from
  acceptable that it is silly.
 
  Ok, so we fix it.  Seriously, it shouldn't be that hard.  If that's the
  only problem we have here, it isn't an issue.
 
  how do you 'fix' the laws of physics?
 
  the problem is that with a directory that contains lots of files below it 
  you have to access all the files' metadata to change the labels on it. it can 
  take significant amounts of time to walk the entire tree and change every 
  file.

Agreed, but you can do this in ways that are faster than others :)

  I can't think of a real world use of moving directory trees around
  that this would come up in as a problem.
  Consider this case: We've been developing a new web site for a month,
  and testing it on the server by putting it in a different virtual
  domain. We want to go live at some particular instant by doing an mv of
  the content into our public HTML directory. We simultaneously want to
  take the old web site down and archive it by moving it somewhere else.
 
  Under the restorecon proposal, the web site would be horribly broken
  until restorecon finishes, as various random pages are or are not
  accessible to Apache.
 
  Usually you don't do that by doing a 'mv' otherwise you are almost
  guaranteed stale and mixed up content for some period of time, not to
  mention the issues surrounding paths that might be messed up.
 
  on the contrary, using 'mv' is by far the cleanest way to do this.
 
  mv htdocs htdocs.old;mv htdocs.new htdocs
 
  this makes two atomic changes to the filesystem, but can generate thousands 
  to millions of permission changes as a result.

I agree, and yet, somehow, SELinux today handles this just fine, right?
:)

Let's worry about speed issues later on when a working implementation is
produced, I'm still looking for the logical reason a system like this
can not work properly based on the expected AA interface to users.

thanks,

greg k-h


Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS

2007-06-15 Thread Chris Mason
On Sat, Jun 16, 2007 at 12:03:06AM +0200, Florian D. wrote:
 Chris Mason wrote:
  Well, apparently I can get the silly stuff wrong an infinite number of
  times.  Sorry, let's try again:
  
  diff -r 38b36731 disk-io.c
  --- a/disk-io.c Fri Jun 15 13:50:20 2007 -0400
  +++ b/disk-io.c Fri Jun 15 16:52:38 2007 -0400
  @@ -541,6 +541,8 @@ int write_ctree_super(struct btrfs_trans
  else
  ret = submit_bh(WRITE, bh);
  if (ret == -EOPNOTSUPP) {
  +   get_bh(bh);
  +   lock_buffer(bh);
  set_buffer_uptodate(bh);
  root->fs_info->do_barriers = 0;
  ret = submit_bh(WRITE, bh);
  
 
 ha! it is working now. some numbers from here(with the fio-tool):

Great, I'll have a v0.3 out on Monday with that fix rolled in.

 
 1. sequential read
 2. random writes
 3. sequential read again
 
 filesize:300MB, bs:4K
 
     btrfs                     reiserfs                  ext3
     usr%  sys%  bw    sec.    usr%  sys%  bw    sec.    usr%  sys%  bw    sec.
 1   5     51    68.3  4.6     1     17    67.4  4.6     5     24    68.0  4.6
 2   0     1     0.7   431     2     21    29.8  10.5    3     18    29.0  10.8
 3   0     1     2.3   133     1     19    70.5  4.4     5     24    68.6  4.5
 
 bw: MB/sec.
 ext3: -o data=writeback,barrier=1
 
 20GB LVM2 partition on a RAID6 (4 SATA-disks)

Strange, these numbers are not quite what I was expecting ;)  Could you
please post your fio job files?  Also, how much ram does the machine
have?  Only writing doesn't seem like enough to fill the ram.

-chris



Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-15 Thread James Morris
On Fri, 15 Jun 2007, Greg KH wrote:

 Oh great, then things like source code control systems would have no
 problems with new files being created under them, or renaming whole
 trees.

It depends -- I think we may be talking about different things.

If you're using inotify to watch for new files and kick something in 
userspace to relabel them, it could take a while to relabel a lot of 
files, although there are likely some gains to be had from parallel 
relabeling which we've not explored.

What I was saying is that you can use traditional SELinux labeling policy 
underneath that to ensure that there is always a safe label on each file 
before it is relabeled from userspace.

 So, so much for the "it's going to be too slow re-labeling everything"
 issue, as it's not even required for almost all situations :)

You could probably get an idea of the cost by running something like:

$ time find /usr/src/linux | xargs setfattr -n user.foo -v bar

On my system, it takes about 1.2 seconds to label a fully checked out 
kernel source tree with ~23,000 files in this manner, on a stock standard 
ext3 filesystem with a SATA drive.



- James
-- 
James Morris
[EMAIL PROTECTED]


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-15 Thread James Morris
On Fri, 15 Jun 2007, [EMAIL PROTECTED] wrote:

 on the contrary, using 'mv' is by far the cleanest way to do this.
 
 mv htdocs htdocs.old;mv htdocs.new htdocs
 
 this makes two atomic changes to the filesystem, but can generate thousands to
 millions of permission changes as a result.

OTOH, you've performed your labeling up front, and don't have to 
effectively relabel each file each time on each access, which is what 
you're really doing with pathname labeling.



- James
-- 
James Morris
[EMAIL PROTECTED]


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-15 Thread James Morris
On Fri, 15 Jun 2007, Seth Arnold wrote:

  How does inotify not work here?  You are notified that the tree is
  moved, your daemon goes through and relabels things as needed.  In the
  meantime, before the re-label happens, you might have the wrong label on
  things, but somehow SELinux already handles this, so I think you
  should be fine.
 
 SELinux does not relabel files when containing directories move, so it
 is not a problem they've chosen to face.

It's a deliberate design choice, and follows traditional Unix security 
logic.  DAC permissions don't change on every file in the subtree when you 
mv directories, either.
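
The DAC half of that statement is easy to check from userspace. The 
following is a minimal sketch, assuming both names are on the same 
filesystem; metadata_survives_rename() is a made-up helper name.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Rename `from` to `to` and compare inode number, mode and ownership
 * before and after.
 * Returns 1 if they are unchanged, 0 if anything differs, -1 on error.
 */
int metadata_survives_rename(const char *from, const char *to)
{
    struct stat before, after;

    if (stat(from, &before) != 0)
        return -1;
    if (rename(from, to) != 0)
        return -1;
    if (stat(to, &after) != 0)
        return -1;

    return before.st_ino == after.st_ino &&
           before.st_mode == after.st_mode &&
           before.st_uid == after.st_uid &&
           before.st_gid == after.st_gid;
}
```

The rename only rewrites directory entries; the attributes live with the 
inode and move with it.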




- James
-- 
James Morris
[EMAIL PROTECTED]


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-15 Thread James Morris
On Fri, 15 Jun 2007, Seth Arnold wrote:

 The time for restorecon is probably best imagined as a kind of 'du' that
 also updates extended attributes as it does its work. It'd be very
 difficult to improve on this.

restorecon can most definitely be improved. 


- James
-- 
James Morris
[EMAIL PROTECTED]


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-15 Thread Casey Schaufler

--- James Morris [EMAIL PROTECTED] wrote:

 On my system, it takes about 1.2 seconds to label a fully checked out 
 kernel source tree with ~23,000 files in this manner

That's an eternity for that many files to be improperly labeled.
If, and the if didn't originate with me, your policy is
demonstrably correct (how do you do that?) for all domains
you could claim that the action is safe, if not ideal. 
I can't say if an evaluation team would buy the safe
argument. They've been known to balk before.


Casey Schaufler
[EMAIL PROTECTED]


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-15 Thread James Morris
On Fri, 15 Jun 2007, Casey Schaufler wrote:

 
 --- James Morris [EMAIL PROTECTED] wrote:
 
  On my system, it takes about 1.2 seconds to label a fully checked out 
  kernel source tree with ~23,000 files in this manner
 
 That's an eternity for that many files to be improperly labeled.
 If, and the if didn't originate with me, your policy is
 demonstrably correct (how do you do that?) for all domains
 you could claim that the action is safe, if not ideal. 
 I can't say if an evaluation team would buy the safe
 argument. They've been known to balk before.

To clarify:

We are discussing a scheme where the underlying SELinux labeling policy 
always ensures a safe label on a file, and then relabeling newly created 
files according to their pathnames.

There is no expectation that this scheme would be submitted for 
certification.  Its purpose is merely to provide pathname-based 
labeling outside of the kernel.



- James
-- 
James Morris
[EMAIL PROTECTED]


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-15 Thread Greg KH
On Fri, Jun 15, 2007 at 09:21:57PM -0400, James Morris wrote:
 On Fri, 15 Jun 2007, Greg KH wrote:
 
  Oh great, then things like source code control systems would have no
  problems with new files being created under them, or renaming whole
  trees.
 
 It depends -- I think we may be talking about different things.
 
 If you're using inotify to watch for new files and kick something in 
 userspace to relabel them, it could take a while to relabel a lot of 
 files, although there are likely some gains to be had from parallel 
 relabeling which we've not explored.
 
 What I was saying is that you can use traditional SELinux labeling policy 
 underneath that to ensure that there is always a safe label on each file 
 before it is relabeled from userspace.

Ok, yes, I think we are in violent agreement here :)

  So, so much for the "it's going to be too slow re-labeling everything"
  issue, as it's not even required for almost all situations :)
 
 You could probably get an idea of the cost by running something like:
 
 $ time find /usr/src/linux | xargs setfattr -n user.foo -v bar
 
 On my system, it takes about 1.2 seconds to label a fully checked out 
 kernel source tree with ~23,000 files in this manner, on a stock standard 
 ext3 filesystem with a SATA drive.

Yeah, that should be very reasonable.  I'll wait to see Crispin's code
to work off of and see if I can get it to approach that kind of speed.

thanks,

greg k-h