Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-05-29 Thread Jeremy Higdon
On Mon, May 28, 2007 at 02:48:45PM +1000, Timothy Shimmin wrote:
 I'm taking it that the FUA write will just guarantee that that
 particular write has made it to disk on i/o completion
 (and no write cache flush is done).

Correct.  It only applies to that one write command.

jeremy
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-05-29 Thread Stefan Bader

 2007/5/25, Neil Brown [EMAIL PROTECTED]:
 BIO_RW_FAILFAST: means low-level driver shouldn't do much (or no)
 error recovery. Mainly used by multipath targets to avoid long SCSI
 recovery. This should just be propagated when passing requests on.

Is it much or no?
Would it be reasonable to use this for reads from a non-degraded
raid1?  What about writes?


This depends on the device driver's implementation. AFAIK there is no
fixed rule for how to handle that flag exactly. The SCSI driver seems to
omit internal recovery procedures, but requests can still take as long
as the SCSI request time-out. I am not sure of all the internals. Maybe
some error recovery is done as long as it shouldn't take very long.
For the DASD driver on zSeries this flag will only affect situations
where the driver decides there is no other way of succeeding. Recovery
is still done.
Using this flag was intended to move error handling to an upper layer
in the device stack. For multipathing it is good to be able to map a
request to another path instead of waiting until the SCSI layer
finally would give up with one path. For a RAID1 this might cause
requests to fail which would have been recovered. This might require
more error handling in md.
The error code as it is at this time doesn't say much in detail. I
once saw patches (and there are comments about a path missing from
Jens Axboe) to pass sense data (from SCSI) in the bio. I am not sure
whether this was dropped for some reason or just is in the pipe. Jens?
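
A minimal user-space sketch of the failfast idea described above, assuming a toy model in which each path either works at once or needs slow low-level recovery. The names path_model, submit_on_path and multipath_submit are hypothetical and are not kernel interfaces; the sketch only illustrates why an upper layer that can retry elsewhere benefits from the flag, and why a layer with no alternative may see failures that recovery would have avoided.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one path to a device; not a kernel structure. */
struct path_model {
	bool healthy;      /* an I/O on this path succeeds immediately */
	bool recoverable;  /* slow low-level recovery would eventually succeed */
};

/*
 * Submit on one path.  With failfast set, the modelled driver gives up
 * at once instead of attempting its slow recovery.
 */
static int submit_on_path(const struct path_model *p, bool failfast)
{
	assert(p != NULL);
	if (p->healthy)
		return 0;
	if (!failfast && p->recoverable)
		return 0;	/* slow recovery succeeded */
	return -EIO;
}

/*
 * Upper layer (multipath-like): try each path with failfast set and move
 * on to the next one on error.  Returns the index of the path that
 * completed the request, or -EIO if every path failed.
 */
static int multipath_submit(const struct path_model *paths, size_t n)
{
	assert(paths != NULL);
	for (size_t i = 0; i < n; i++) {
		if (submit_on_path(&paths[i], true) == 0)
			return (int)i;
	}
	return -EIO;
}
```

With a single recoverable-but-unhealthy path, submit_on_path() returns -EIO when failfast is set and 0 when it is not, which is the RAID1 concern raised above.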

Stefan


Re: [PATCH] AFS: Implement file locking

2007-05-29 Thread David Howells
J. Bruce Fields [EMAIL PROTECTED] wrote:

  At the moment, yes.  Don't the POSIX and flock lock-handling routines in the
  kernel normally do that anyway?
 
 No, they'd upgrade in that case.

I just checked.  The OpenAFS server supports neither lock upgrading nor lock
downgrading.  Attempts to do either incur an abort with code 0x02f6df0a
(which I believe to be equivalent to EAGAIN).

This means that I can't practically support lock upgrading.  Lock downgrading
I can emulate by handing apparent readlocks to local processes whilst holding
a writelock on the server.

David


[1/4] 2.6.22-rc3: known regressions

2007-05-29 Thread Michal Piotrowski
Hi all,

Here is a list of some known regressions in 2.6.22-rc3.

Feel free to add new regressions/remove fixed etc.
http://kernelnewbies.org/known_regressions



Unclassified

Subject: long freezes on thinkpad t60
References : http://lkml.org/lkml/2007/5/24/100
Submitter  : Miklos Szeredi [EMAIL PROTECTED]
Handled-By : Ingo Molnar [EMAIL PROTECTED]
Status : problem is being debugged



ACPI

Subject: unable to shutdown on kernel 2.6.22-rc2
References : http://bugzilla.kernel.org/show_bug.cgi?id=8516
Submitter  : Thierry Volpiatto [EMAIL PROTECTED]
Status : Unknown



ALSA

Subject: snd-aoa causes badness in lib/kref.c:33
References : http://bugzilla.kernel.org/show_bug.cgi?id=8513
Submitter  : Ben Collins [EMAIL PROTECTED]
Status : Unknown



File systems

Subject: Oops in dentry_iput with 2.6.22-rc2 on AMD64
References : http://lkml.org/lkml/2007/5/22/4
Submitter  : Florin Iucha [EMAIL PROTECTED]
Status : Unknown



Kbuild

Subject: make M=$PWD modules_install does nothing
References : http://lkml.org/lkml/2007/5/27/190
Submitter  : Andrey Borzenkov [EMAIL PROTECTED]
Status : Unknown



Regards,
Michal

--
What I missed most was your silence.
-- Andrzej Sapkowski, Coś więcej (Something More)



Re: [1/4] 2.6.22-rc3: known regressions

2007-05-29 Thread Florin Iucha
On Tue, May 29, 2007 at 04:34:59PM +0200, Jan Kara wrote:
 On Tue 29-05-07 14:52:53, Michal Piotrowski wrote:
  Here is a list of some known regressions in 2.6.22-rc3.
  
  Subject: Oops in dentry_iput with 2.6.22-rc2 on AMD64
  References : http://lkml.org/lkml/2007/5/22/4
  Submitter  : Florin Iucha [EMAIL PROTECTED]
  Status : Unknown
   Actually, the bug seems to be unreproducible and it has probably been a
 1-bit flip. So I'd be reluctant to call it a regression...

I agree with this statement.  I'll ping Michal and Jan if the oops
resurfaces.

florin

-- 
Bruce Schneier expects the Spanish Inquisition.
  http://geekz.co.uk/schneierfacts/fact/163




[PATCH] AFS: Implement file locking [try #2]

2007-05-29 Thread David Howells
Implement file locking for AFS.

[try #2]:

 (*) Start the lock manager thread under a mutex to avoid a race.

 (*) Made the locking non-fair: New readlocks will jump pending writelocks if
 there's a readlock currently granted on a file.  This makes the behaviour
 similar to Linux's VFS locking.

Regrading of locks is not currently supported as this is not supported by the
server.  Byte-range locking is also not currently supported for the same
reason.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/afs/Makefile|1 
 fs/afs/afs.h   |8 +
 fs/afs/afs_fs.h|3 
 fs/afs/callback.c  |3 
 fs/afs/dir.c   |1 
 fs/afs/file.c  |2 
 fs/afs/flock.c |  590 
 fs/afs/fsclient.c  |  155 ++
 fs/afs/internal.h  |   30 +++
 fs/afs/main.c  |1 
 fs/afs/misc.c  |1 
 fs/afs/super.c |3 
 fs/afs/vnode.c |  130 ++-
 include/linux/fs.h |4 
 14 files changed, 917 insertions(+), 15 deletions(-)

diff --git a/fs/afs/Makefile b/fs/afs/Makefile
index 73ce561..a666710 100644
--- a/fs/afs/Makefile
+++ b/fs/afs/Makefile
@@ -8,6 +8,7 @@ kafs-objs := \
cmservice.o \
dir.o \
file.o \
+   flock.o \
fsclient.o \
inode.o \
main.o \
diff --git a/fs/afs/afs.h b/fs/afs/afs.h
index 2452579..c548aa3 100644
--- a/fs/afs/afs.h
+++ b/fs/afs/afs.h
@@ -37,6 +37,13 @@ typedef enum {
AFS_FTYPE_SYMLINK   = 3,
 } afs_file_type_t;
 
+typedef enum {
+   AFS_LOCK_READ   = 0,/* read lock request */
+   AFS_LOCK_WRITE  = 1,/* write lock request */
+} afs_lock_type_t;
+
+#define AFS_LOCKWAIT   (5 * 60) /* time until a lock times out (seconds) */
+
 /*
  * AFS file identifier
  */
@@ -120,6 +127,7 @@ struct afs_file_status {
struct afs_fid  parent; /* parent dir ID for non-dirs only */
time_t  mtime_client;   /* last time client changed data */
time_t  mtime_server;   /* last time server changed data */
+   s32 lock_count; /* file lock count (0=UNLK -1=WRLCK +ve=#RDLCK */
 };
 
 /*
diff --git a/fs/afs/afs_fs.h b/fs/afs/afs_fs.h
index a18c374..eb64732 100644
--- a/fs/afs/afs_fs.h
+++ b/fs/afs/afs_fs.h
@@ -31,6 +31,9 @@ enum AFS_FS_Operations {
FSGETVOLUMEINFO = 148,  /* AFS Get information about a volume */
FSGETVOLUMESTATUS   = 149,  /* AFS Get volume status information */
FSGETROOTVOLUME = 151,  /* AFS Get root volume name */
+   FSSETLOCK   = 156,  /* AFS Request a file lock */
+   FSEXTENDLOCK= 157,  /* AFS Extend a file lock */
+   FSRELEASELOCK   = 158,  /* AFS Release a file lock */
FSLOOKUP= 161,  /* AFS lookup file in directory */
FSFETCHDATA64   = 65537, /* AFS Fetch file data */
FSSTOREDATA64   = 65538, /* AFS Store file data */
diff --git a/fs/afs/callback.c b/fs/afs/callback.c
index bacf518..b824394 100644
--- a/fs/afs/callback.c
+++ b/fs/afs/callback.c
@@ -125,6 +125,9 @@ static void afs_break_callback(struct afs_server *server,
spin_unlock(&server->cb_lock);
 
queue_work(afs_callback_update_worker, &vnode->cb_broken_work);
+   if (list_empty(&vnode->granted_locks) &&
+   !list_empty(&vnode->pending_locks))
+   afs_lock_may_be_available(vnode);
spin_unlock(&vnode->lock);
}
 }
diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 546c595..33fe39a 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -44,6 +44,7 @@ const struct file_operations afs_dir_file_operations = {
.open   = afs_dir_open,
.release= afs_release,
.readdir= afs_readdir,
+   .lock   = afs_lock,
 };
 
 const struct inode_operations afs_dir_inode_operations = {
diff --git a/fs/afs/file.c b/fs/afs/file.c
index 1547500..8aaa233 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -35,6 +35,8 @@ const struct file_operations afs_file_operations = {
.mmap   = afs_mmap,
.sendfile   = generic_file_sendfile,
.fsync  = afs_fsync,
+   .lock   = afs_lock,
+   .flock  = afs_flock,
 };
 
 const struct inode_operations afs_file_inode_operations = {
diff --git a/fs/afs/flock.c b/fs/afs/flock.c
new file mode 100644
index 000..bb97105
--- /dev/null
+++ b/fs/afs/flock.c
@@ -0,0 +1,590 @@
+/* AFS file locking support
+ *
+ * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([EMAIL PROTECTED])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+

[PATCH] add procfs tunable to enable immediate panic when there are busy inodes after umount

2007-05-29 Thread Jeff Layton
After spending quite a bit of time tracking down a VFS: busy inodes
after unmount problem, it occurs to me that it would be nice to be
able to force a panic when that occurs. While an oops message alone is
not generally helpful for tracking down this sort of problem,
collecting and analyzing a coredump when this occurs can be.

The following patch adds a procfs tunable that allows you to force a
core when a busy inodes after umount problem occurs. It also changes
the classic error message to be something a bit less cryptic to users.

Signed-off-by: Jeff Layton [EMAIL PROTECTED]

diff --git a/fs/block_dev.c b/fs/block_dev.c
diff --git a/fs/inode.c b/fs/inode.c
index 9a012cc..0e638b0 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -327,7 +327,7 @@ static int invalidate_list(struct list_head *head, struct list_head *dispose)
count++;
continue;
}
-   busy = 1;
+   ++busy;
}
/* only unused inodes may be cached with i_count zero */
inodes_stat.nr_unused -= count;
diff --git a/fs/super.c b/fs/super.c
index 5260d62..9c2871b 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -287,6 +287,8 @@ int fsync_super(struct super_block *sb)
 void generic_shutdown_super(struct super_block *sb)
 {
const struct super_operations *sop = sb->s_op;
+   extern int umount_debug;
+   int busy;
 
if (sb->s_root) {
shrink_dcache_for_umount(sb);
@@ -303,10 +305,15 @@ void generic_shutdown_super(struct super_block *sb)
sop->put_super(sb);
 
/* Forget any remaining inodes */
-   if (invalidate_inodes(sb)) {
-   printk("VFS: Busy inodes after unmount of %s. "
-  "Self-destruct in 5 seconds.  Have a nice day...\n",
-  sb->s_id);
+   if (busy = invalidate_inodes(sb)) {
+   printk("VFS: %d busy inodes after unmount of %s. ",
+busy, sb->s_id);
+   if (umount_debug != 0) {
+   printk("Crashing host on request.\n");
+   BUG();
+   } else {
+   printk("This machine will likely crash eventually. Consider a reboot.\n");
+   }
}
 
unlock_kernel();
diff --git a/include/linux/fs.h b/include/linux/fs.h
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 47f1c53..176b984 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -818,6 +818,7 @@ enum
FS_AIO_NR=18,   /* current system-wide number of aio requests */
FS_AIO_MAX_NR=19,   /* system-wide maximum number of aio requests */
FS_INOTIFY=20,  /* inotify submenu */
+   FS_UMOUNT_DEBUG=21, /* busy inodes on umount debug switch */
FS_OCFS2=988,   /* ocfs2 */
 };
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 30ee462..8e62c34 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -156,6 +156,7 @@ extern ctl_table pty_table[];
 extern ctl_table inotify_table[];
 #endif
 
+int umount_debug;
 #ifdef HAVE_ARCH_PICK_MMAP_LAYOUT
 int sysctl_legacy_va_layout;
 #endif
@@ -962,6 +963,14 @@ static ctl_table fs_table[] = {
.mode   = 0644,
.proc_handler   = proc_dointvec,
},
+   {
+   .ctl_name   = FS_UMOUNT_DEBUG,
+   .procname   = "umount_debug",
+   .data   = &umount_debug,
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec,
+   },
 #ifdef CONFIG_DNOTIFY
{
.ctl_name   = FS_DIR_NOTIFY,


Re: [PATCH] add procfs tunable to enable immediate panic when there are busy inodes after umount

2007-05-29 Thread Alexey Dobriyan
On Tue, May 29, 2007 at 11:40:42AM -0400, Jeff Layton wrote:
 After spending quite a bit of time tracking down a VFS: busy inodes
 after unmount problem, it occurs to me that it would be nice to be
 able to force a panic when that occurs. While an oops message alone is
 not generally helpful for tracking down this sort of problem,
 collecting and analyzing a coredump when this occurs can be.

 The following patch adds a procfs tunable that allows you to force a
 core when a busy inodes after umount problem occurs. It also changes
 the classic error message to be something a bit less cryptic to users.

 @@ -303,10 +305,15 @@ void generic_shutdown_super(struct super_block *sb)
   sop->put_super(sb);

   /* Forget any remaining inodes */
 - if (invalidate_inodes(sb)) {
 - printk("VFS: Busy inodes after unmount of %s. "
 -"Self-destruct in 5 seconds.  Have a nice day...\n",
 -sb->s_id);
 + if (busy = invalidate_inodes(sb)) {
 + printk("VFS: %d busy inodes after unmount of %s. ",
 +  busy, sb->s_id);
 + if (umount_debug != 0) {
 + printk("Crashing host on request.\n");
 + BUG();
 + } else {
 + printk("This machine will likely crash eventually. Consider a reboot.\n");
 + }

You can add just BUG_ON here and do

echo 1 > /proc/sys/kernel/panic_on_oops



Re: [patch 2/2] i_version update - ext4 part

2007-05-29 Thread Mingming Cao
On Fri, 2007-05-25 at 18:25 +0200, Jean noel Cordenner wrote:
 The patch is on top of the ext4 tree:
 http://repo.or.cz/w/ext4-patch-queue.git
 
 In this part, the i_version counter is stored into 2 32bit fields of
 the ext4_inode structure osd1.linux1.l_i_version and i_version_hi.
 
 I included the ext4_expand_inode_extra_isize patch, which does part of 
 the job, checking if there is enough room for extra fields in the inode 
 (i_version_hi). The other patch increments the counter on inode 
 modifications and set it on inode creation.
 plain text document attachment (i_version_update_ext4)
 This patch is on top of i_version_update_vfs.
 The i_version field of the inode is set on inode creation and incremented when
 the inode is being modified.
 

I am a little bit confused about the two patches.

It appears that in the ext4_expand_inode_extra_isize patch by Kalpak, a
new 64-bit i_fs_version field is added to the ext4 inode structure for inode
versioning support. Reading and storing this counter are handled properly, but
the update of the inode versioning counter is missing.

But later, in the second patch by Jean Noel, the VFS inode->i_version is
re-used for ext4 inode versioning, and the counter is updated every
time the file is changed.

To me, i_fs_version and inode_version are the same thing, right?
Shouldn't we choose one (I assume inode i_version?) and combine these
two patches? How about splitting the inode versioning part out of the
ext4_expand_inode_extra_isize patch (it does multiple things, and
i_versioning doesn't belong there) and putting it together with the rest of
the i_version update patches?


BTW, how could NFS/user space access the inode version counter?

Thanks,
Mingming


 Signed-off-by: Jean Noel Cordenner [EMAIL PROTECTED]
 
 Index: linux-2.6.22-rc2-ext4-1/fs/ext4/ialloc.c
 ===
 --- linux-2.6.22-rc2-ext4-1.orig/fs/ext4/ialloc.c 2007-05-25 18:05:28.0 +0200
 +++ linux-2.6.22-rc2-ext4-1/fs/ext4/ialloc.c  2007-05-25 18:05:40.0 +0200
 @@ -565,6 +565,7 @@
   inode->i_blocks = 0;
   inode->i_mtime = inode->i_atime = inode->i_ctime = ei->i_crtime =
  ext4_current_time(inode);
 + inode->i_version = 1;
 
   memset(ei->i_data, 0, sizeof(ei->i_data));
   ei->i_dir_start_lookup = 0;
 Index: linux-2.6.22-rc2-ext4-1/fs/ext4/inode.c
 ===
 --- linux-2.6.22-rc2-ext4-1.orig/fs/ext4/inode.c  2007-05-25 18:05:28.0 +0200
 +++ linux-2.6.22-rc2-ext4-1/fs/ext4/inode.c   2007-05-25 18:05:40.0 +0200
 @@ -3082,6 +3082,7 @@
  {
   int err = 0;
 
 + inode->i_version++;
   /* the do_update_inode consumes one bh->b_count */
   get_bh(iloc->bh);
 
 Index: linux-2.6.22-rc2-ext4-1/fs/ext4/super.c
 ===
 --- linux-2.6.22-rc2-ext4-1.orig/fs/ext4/super.c  2007-05-25 18:05:28.0 +0200
 +++ linux-2.6.22-rc2-ext4-1/fs/ext4/super.c   2007-05-25 18:05:40.0 +0200
 @@ -2839,8 +2839,8 @@
   i_size_write(inode, off+len-towrite);
   EXT4_I(inode)->i_disksize = inode->i_size;
   }
 - inode->i_version++;
   inode->i_mtime = inode->i_ctime = CURRENT_TIME;
 + inode->i_version = 1;
   ext4_mark_inode_dirty(handle, inode);
   mutex_unlock(&inode->i_mutex);
   return len - towrite;



Re: [PATCH] add procfs tunable to enable immediate panic when there are busy inodes after umount

2007-05-29 Thread Jeff Layton
On Tue, 29 May 2007 23:38:13 +0400
Alexey Dobriyan [EMAIL PROTECTED] wrote:

 On Tue, May 29, 2007 at 11:40:42AM -0400, Jeff Layton wrote:
  After spending quite a bit of time tracking down a VFS: busy inodes
  after unmount problem, it occurs to me that it would be nice to be
  able to force a panic when that occurs. While an oops message alone is
  not generally helpful for tracking down this sort of problem,
  collecting and analyzing a coredump when this occurs can be.
 
  The following patch adds a procfs tunable that allows you to force a
  core when a busy inodes after umount problem occurs. It also changes
  the classic error message to be something a bit less cryptic to users.
 
  @@ -303,10 +305,15 @@ void generic_shutdown_super(struct super_block *sb)
  sop->put_super(sb);
 
  /* Forget any remaining inodes */
  -   if (invalidate_inodes(sb)) {
  -   printk("VFS: Busy inodes after unmount of %s. "
  -  "Self-destruct in 5 seconds.  Have a nice day...\n",
  -  sb->s_id);
  +   if (busy = invalidate_inodes(sb)) {
  +   printk("VFS: %d busy inodes after unmount of %s. ",
  +busy, sb->s_id);
  +   if (umount_debug != 0) {
  +   printk("Crashing host on request.\n");
  +   BUG();
  +   } else {
  +   printk("This machine will likely crash eventually. Consider a reboot.\n");
  +   }
 
 You can add just BUG_ON here and do
 
   echo 1 > /proc/sys/kernel/panic_on_oops
 

I certainly could, but the problem is that there's little point in
panicking immediately here if you can't collect a coredump. Oops
messages aren't very helpful for tracking this sort of thing down, so
I'd think we want the BUG() conditional on something more granular
than panic_on_oops.
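
The decision being argued for can be sketched in user space as below, assuming a dedicated switch that is independent of panic_on_oops. The function check_busy_inodes and the enum are hypothetical; where the patch calls BUG(), this sketch just returns a value so that it stays runnable.

```c
#include <assert.h>
#include <stdio.h>

enum umount_action { UMOUNT_OK, UMOUNT_WARN, UMOUNT_CRASH };

/*
 * Model of the check in the proposed patch.  'busy' is the number of
 * busy inodes left after unmount, 'debug' stands in for the proposed
 * umount_debug tunable, and 'id' names the filesystem.
 */
static enum umount_action check_busy_inodes(int busy, int debug,
					    const char *id)
{
	assert(busy >= 0 && id != NULL);
	if (busy == 0)
		return UMOUNT_OK;

	fprintf(stderr, "VFS: %d busy inodes after unmount of %s. ",
		busy, id);
	if (debug != 0) {
		fprintf(stderr, "Crashing host on request.\n");
		return UMOUNT_CRASH;	/* the patch calls BUG() here */
	}
	fprintf(stderr, "This machine will likely crash eventually. "
			"Consider a reboot.\n");
	return UMOUNT_WARN;
}
```

Keeping the switch separate means a machine set up to capture a dump can crash on this one condition without also panicking on every unrelated oops.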

-- 
Jeff Layton [EMAIL PROTECTED]


Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-05-29 Thread Phillip Susi

Neil Brown wrote:

 md/dm modules could keep count of requests as has been suggested
 (though that would be a fairly big change for raid0 as it currently
 doesn't know when a request completes - bi_endio goes directly to the
 filesystem). 


Are you sure?  I believe that dm handles bi_endio because it waits for 
all in progress bio to complete before switching tables.



2/ Maybe barriers provide stronger semantics than are required.

 All write requests are synchronised around a barrier write.  This is
 often more than is required and apparently can cause a measurable
 slowdown.


I'm not quite sure I understand this correctly, but the purpose of a 
barrier request is to prevent the elevator from reordering requests 
around a barrier.  Previous requests must be completed before the 
barrier, and later requests must be executed after.  That is a 
sufficiently strong guarantee for careful write or journal filesystems 
to ensure that a log block hits the disk before the actual transaction 
blocks, and then the log block is marked as complete only after the 
actual transaction.  This is a weaker guarantee than a flush, and allows 
for some reordering to improve performance.
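
The ordering rule stated above can be written down as a small checker. This is a sketch under the assumption that the rule is exactly "no request may complete on the other side of a barrier from where it was issued"; the function name and the array representation are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * issue_is_barrier[i] says whether the i-th issued request is a barrier.
 * completion[k] is the issue index of the k-th request to complete.
 * Returns true if no pair of requests was reordered across a barrier.
 */
static bool order_respects_barriers(const bool *issue_is_barrier,
				    const size_t *completion, size_t n)
{
	assert(issue_is_barrier != NULL && completion != NULL);
	for (size_t a = 0; a < n; a++) {
		for (size_t b = a + 1; b < n; b++) {
			size_t first = completion[a];   /* completed earlier */
			size_t second = completion[b];  /* completed later */

			if (second >= first)
				continue;	/* same order as issued */
			/*
			 * The pair was reordered.  That is only allowed if
			 * neither request is a barrier and no barrier was
			 * issued between them.
			 */
			for (size_t i = second; i <= first; i++)
				if (issue_is_barrier[i])
					return false;
		}
	}
	return true;
}
```

For five requests with the third one a barrier, completing them as 1,0,2,4,3 passes (reordering stays within each side), while 0,3,1,2,4 fails because request 3 completed ahead of the barrier.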



 Also the FUA for the actual commit write might not be needed.  It is
 important for consistency that the preceding writes are in safe
 storage before the commit write, but it is not so important that the
 commit write is immediately safe on storage.  That isn't needed until
 a 'sync' or 'fsync' or similar.


Right, the barrier doesn't need to be flushed right away, so the 
elevator could complete writes after the barrier if it wishes, then 
complete the ones before, and finally the barrier itself.  Not setting 
the FUA bit allows the disk to cache the barrier write so it can be 
completed sooner, but before the queue sends any more requests to the 
disk, it must be flushed to ensure that the barrier has hit the media 
before the new requests.



 One possible alternative is:
   - writes can overtake barriers, but barrier cannot overtake writes.
   - flush before the barrier, not after.

 This is considerably weaker, and hence cheaper. But I think it is
 enough for all filesystems (providing it is still an option to call
 blkdev_issue_flush on 'fsync').


Again I am not sure I quite understand what you mean here, but only 
writes issued after the barrier can complete before the barrier.  Those 
issued before the barrier can not overtake it in the queue.



 Another alternative would be to tag each bio as being in a
 particular barrier-group.  Then bio's in different groups could
 overtake each other in either direction, but a BARRIER request must
 be totally ordered w.r.t. other requests in the barrier group.
 This would require an extra bio field, and would give the filesystem
 more appearance of control.  I'm not yet sure how much it would
 really help...
 It would allow us to set FUA on all bios with a non-zero
 barrier-group.  That would mean we don't have to flush the entire
 cache, just those blocks that are critical but I'm still not sure
 it's a good idea.


This all seems unnecessary work.




Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-05-29 Thread Phillip Susi

David Chinner wrote:

Sounds good to me, but how do we test to see if the underlying
device supports barriers? Do we just assume that they do and
only change behaviour if -o nobarrier is specified in the mount
options?


The idea is that ALL block devices will support barriers; if the 
underlying driver doesn't, then the block layer will work around it.



The use of barriers in XFS assumes the commit write to be on stable
storage before it returns.  One of the ordering guarantees that we
need is that the transaction (commit write) is on disk before the
metadata block containing the change in the transaction is written
to disk and the current barrier behaviour gives us that.


Barrier != synchronous write, so if XFS relies on that block being on 
the media when the request is completed, then it is broken.  It should 
only care that the ordering of log-data-log is maintained, not exactly 
when each specific request completes.





Re: [PATCH] AFS: Implement file locking

2007-05-29 Thread J. Bruce Fields
On Tue, May 29, 2007 at 10:34:41AM +0100, David Howells wrote:
 I'll need to test the upgrade/downgrade case.  I don't know whether the AFS
 server supports that.  If it doesn't, I can emulate downgrade, but not upgrade
 - not unless I only ever ask it for exclusive locks.
 
 Lock upgrading is really, really easy to contrive deadlock for.

Any such deadlock is the user's fault.

But, right, I agree that upgrades are probably hard to use correctly.
And that implementing them shouldn't be a priority in the case of AFS.
Just as long as the implementation doesn't completely fall over when
somebody attempts an upgrade.

--b.


Re: [PATCH 08/13] NFS: Add functions to parse nfs mount options to fs/nfs/super.c

2007-05-29 Thread Karel Zak
On Mon, May 21, 2007 at 12:09:54PM -0400, Chuck Lever wrote:
 For NFSv2 and NFSv3 mount options.
 Signed-off-by: Chuck Lever [EMAIL PROTECTED]

 

 +static int nfs_parse_options(char *raw, struct nfs_mount_args *mnt)
 +{
 + char *p, *string;
 +
 + if (!raw) {
 + dprintk("NFS: mount options string was NULL.\n");
 + return 1;
 + }
 +
 + while ((p = strsep (&raw, ",")) != NULL) {
 + substring_t args[MAX_OPT_ARGS];
 + int option, token;
 +
 + if (!*p)
 + continue;
 + token = match_token(p, nfs_tokens, args);

 

 +
 + case Opt_context:
 + match_strcpy(mnt->nmd.context, args);
 + break;

 The userspace version (nfs-utils) of this code supports quoted
 context strings. For example:

context="aaa,bbb,ccc",hard

 It seems your code blindly splits the raw option string on ",".
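
To make the problem concrete, here is a minimal user-space sketch of a splitter that behaves like strsep() on "," but treats commas inside double quotes as part of the token. The function next_option is hypothetical; it is not the nfs-utils code or an in-kernel helper, and it does not handle escaped quotes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the next comma-separated token from *sp, or NULL when the
 * string is exhausted.  A comma between double quotes does not end the
 * token.  The string is modified in place, as with strsep().
 */
static char *next_option(char **sp)
{
	char *start, *p;
	int in_quotes = 0;

	assert(sp != NULL);
	start = *sp;
	if (start == NULL)
		return NULL;

	for (p = start; *p != '\0'; p++) {
		if (*p == '"') {
			in_quotes = !in_quotes;
		} else if (*p == ',' && !in_quotes) {
			*p = '\0';	/* terminate this token */
			*sp = p + 1;	/* next call resumes after the comma */
			return start;
		}
	}
	*sp = NULL;			/* this was the last token */
	return start;
}
```

Applied to the example above, the first token is the whole context="aaa,bbb,ccc" string and the second is hard, whereas a plain split on "," would yield four tokens. The quotes are left in the token; stripping them would be a separate step.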

Karel

-- 
 Karel Zak  [EMAIL PROTECTED]


Re: [PATCH 08/13] NFS: Add functions to parse nfs mount options to fs/nfs/super.c

2007-05-29 Thread Chuck Lever

Karel Zak wrote:

On Mon, May 21, 2007 at 12:09:54PM -0400, Chuck Lever wrote:

For NFSv2 and NFSv3 mount options.
Signed-off-by: Chuck Lever [EMAIL PROTECTED]


 


+static int nfs_parse_options(char *raw, struct nfs_mount_args *mnt)
+{
+   char *p, *string;
+
+   if (!raw) {
+   dprintk("NFS: mount options string was NULL.\n");
+   return 1;
+   }
+
+   while ((p = strsep (&raw, ",")) != NULL) {
+   substring_t args[MAX_OPT_ARGS];
+   int option, token;
+
+   if (!*p)
+   continue;
+   token = match_token(p, nfs_tokens, args);


 


+
+   case Opt_context:
+   match_strcpy(mnt->nmd.context, args);
+   break;


 The userspace version (nfs-utils) of this code supports quoted
 context strings. For example:

context="aaa,bbb,ccc",hard

 It seems your code blindly splits the raw option string on ",".


Karel-

I've never used the context= option, and didn't find any documentation 
describing how it was used.


Is there a clean example of how to use the in-kernel parser to handle 
quoted strings containing commas?



Re: [AppArmor 01/41] Pass struct vfsmount to the inode_create LSM hook

2007-05-29 Thread Valdis . Kletnieks
On Mon, 28 May 2007 21:54:46 EDT, Kyle Moffett said:

 Average users are not supposed to be writing security policy.  To be  
 honest, even average-level system administrators should not be  
 writing security policy.  It's OK for such sysadmins to tweak  
 existing policy to give access to additional web-docs or such, but  
 only expert sysadmin/developers or security professionals should be  
 writing security policy.  It's just too damn easy to get completely  
 wrong.

The single biggest challenge in computer security at the present time is how to
build *and deploy* servers that stay reasonably secure even when run by the
average wave-a-dead-chicken sysadmin, and desktop-class boxes that can survive
the best attempts of Joe Sixpack's Ooh shiny reflex, and Joe's kid's attempts
to evade the nannyware that Joe had somebody install.

(If you know how to build such things, don't bother replying.  If you have
actual field experience on getting significant percents of Joe Sixpacks to
switch, I need to buy you a beer or something.. ;)







Re: [PATCH] AFS: Implement file locking [try #2]

2007-05-29 Thread J. Bruce Fields
One more vague question I had while skimming the previous version--

On Tue, May 29, 2007 at 03:54:27PM +0100, David Howells wrote:
 +static void afs_grant_locks(struct afs_vnode *vnode, struct file_lock *fl)
 +{
 + struct file_lock *p, *_p;
 +
 + list_move_tail(&fl->fl_u.afs.link, &vnode->granted_locks);
 + if (fl->fl_type == F_RDLCK) {
 + list_for_each_entry_safe(p, _p, &vnode->pending_locks,
 +  fl_u.afs.link) {
 + if (p->fl_type == F_RDLCK) {
 + p->fl_u.afs.state = AFS_LOCK_GRANTED;
 + list_move_tail(&p->fl_u.afs.link,
 +&vnode->granted_locks);
 + wake_up(&p->fl_wait);
 + }
 + }
 + }
 +}

--without having tried to understand how they're actually used, these
data structures (like the pending_locks and granted_locks lists) seem to
duplicate stuff that's already kept in fs/locks.c.  Is there a reason
they're required?

--b.


Re: [dm-devel] Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-05-29 Thread Alasdair G Kergon
On Tue, May 29, 2007 at 11:25:42AM +0200, Stefan Bader wrote:
 doing a sort of suspend, issuing the
 barrier request, calling flush to all mapped devices and then wait for
 in-flight I/O to go to zero? 

Something like that is needed for some dm targets to support barriers.
(We needn't always wait for *all* in-flight I/O.)
When faced with -EOPNOTSUPP, do all callers fall back to a sync in
the places a barrier would have been used, or are there any more
sophisticated strategies attempting to optimise code without barriers?
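
The simple fallback asked about here can be sketched as follows, assuming a toy device model in which a barrier write either works or fails with -EOPNOTSUPP. All names are hypothetical, and flush_cache() stands in for "wait for outstanding writes and flush the device cache"; this is not how any particular filesystem implements it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model that just counts operations. */
struct dev_model {
	bool supports_barrier;
	int flushes;  /* full cache flushes issued */
	int writes;   /* write requests issued */
};

/* A barrier write fails outright on devices without barrier support. */
static int write_barrier(struct dev_model *d)
{
	if (!d->supports_barrier)
		return -EOPNOTSUPP;
	d->writes++;
	return 0;
}

static void flush_cache(struct dev_model *d) { d->flushes++; }
static void write_plain(struct dev_model *d) { d->writes++; }

/*
 * Ordered commit: use a barrier write when available, otherwise fall
 * back to flush, plain write, flush.
 */
static int commit_write(struct dev_model *d)
{
	int err;

	assert(d != NULL);
	err = write_barrier(d);
	if (err != -EOPNOTSUPP)
		return err;

	flush_cache(d);   /* earlier writes reach the media first */
	write_plain(d);   /* the commit block itself */
	flush_cache(d);   /* commit is on media before later writes */
	return 0;
}
```

The fallback path costs two full flushes per commit in this model, which is the overhead a more selective strategy would try to avoid.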

 I am not a hundred percent sure about
 that but I think that just passing the barrier flag on to mapped
 devices might in some (maybe they are rare) cases cause a layer above
 to think all data is on-disk while this isn't necessarily true (see my
 previous post). What do you think?

An efficient I/O barrier implementation would not normally involve
flushing AFAIK: dm surely wouldn't cause a higher layer to assume
stronger semantics than are provided.
 
Alasdair
-- 
[EMAIL PROTECTED]


Re: [patch 2/2] i_version update - ext4 part

2007-05-29 Thread Andreas Dilger
On May 29, 2007  12:44 -0700, Mingming Cao wrote:
 I am a little bit confused about the two patches. 
 
 It appears that in the ext4_expand_inode_extra_isize patch by Kalpak, a
 new 64 bit i_fs_version field is added to the ext4 inode structure for
 inode versioning support. Reads and stores of this counter are properly
 handled, but the inode versioning counter update is missing.

For the Lustre use of the inode version we don't care about the VFS changes
to i_version.  In fact - we want to be able to control the changes to
inode version ourselves so that e.g. file defragmenting or atime updates
don't change the inode version, and that recovery can restore the version
to a known state along with the rest of the metadata.

That said, since Lustre isn't in the kernel and we patch our version of
ext3 anyways it doesn't really matter what is done for NFS.  We will just
patch in our own behaviour if the final ext4 code isn't suitable in all
of the details.  Having 99% of the code the same at least makes this a
lot less work.

 But later in the second patch by Jean Noel, he re-used the VFS
 inode->i_version for ext4 inode versioning, the counter is being updated
 every time the file is being changed.

I don't know what the NFS requirements for the version are.  There may
also be some complaints from others if the i_version is 64 bits because
this contributes to generic inode growth and isn't used for other
filesystems.

 To me, i_fs_version and inode_version are the same thing, right?
 Shouldn't we choose one (I assume inode i_version?), and combine these
 two patches together? How about splitting the inode versioning part from
 the ext4_expand_inode_extra_isize patch (it does multiple things, and
 i_versioning doesn't belong there) and putting it together with the rest
 of the i_version update patches?

I don't have an objection to that, but I don't think it is required.

 BTW, how could NFS/user space access the inode version counter?

If the Bull patch uses i_version then knfsd can just access it directly.
I don't think there is any API to access it from userspace.  One option
is to add a virtual EA like user.inode_version and have the kernel fill
this in from i_version.
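
If such a virtual EA existed, userspace could read it with getxattr(2).
A minimal sketch, assuming the attribute is named user.inode_version as
suggested above and that the kernel fills it in as a decimal ASCII string.
Both are assumptions for illustration (no such EA exists today), and
read_inode_version() is a hypothetical helper name:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/xattr.h>

/*
 * Hypothetical helper: read the inode version of `path` from a virtual
 * extended attribute.  The attribute name and its decimal ASCII encoding
 * are assumptions, not an existing kernel interface.
 * Returns 0 and stores the value in *version on success, -1 otherwise.
 */
int read_inode_version(const char *path, uint64_t *version)
{
    char buf[32];
    char *end;
    ssize_t len;
    unsigned long long v;

    assert(path != NULL && version != NULL);

    /* Leave room for the terminating NUL that we add ourselves. */
    len = getxattr(path, "user.inode_version", buf, sizeof(buf) - 1);
    if (len <= 0)
        return -1;      /* no such file, no such EA, or empty value */
    buf[len] = '\0';

    errno = 0;
    v = strtoull(buf, &end, 10);
    if (errno != 0 || end == buf || *end != '\0')
        return -1;      /* value is not a plain decimal number */

    *version = (uint64_t)v;
    return 0;
}
```

On a kernel without the EA the call simply fails with -1, so a caller
could fall back to mtime/ctime comparisons.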

Lustre will manipulate the ei->i_fs_version directly.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-05-29 Thread David Chinner
On Tue, May 29, 2007 at 04:03:43PM -0400, Phillip Susi wrote:
 David Chinner wrote:
 The use of barriers in XFS assumes the commit write to be on stable
 storage before it returns.  One of the ordering guarantees that we
 need is that the transaction (commit write) is on disk before the
 metadata block containing the change in the transaction is written
 to disk and the current barrier behaviour gives us that.
 
 Barrier != synchronous write,

Of course. FYI, XFS only issues barriers on *async* writes.

But barrier semantics - as far as they've been described by everyone
but you - indicate that the barrier write is guaranteed to be on stable
storage when it returns.

 so if XFS relies on that block being on 
 the media when the request is completed, then it is broken.

XFS relies on the block being stable before any other write
goes to disk. That is the semantic that the barrier I/Os currently
have. How that is implemented in the device is irrelevant to me,
but if I issue a barrier I/O, I do not expect *any* I/O to be
reordered around it.

 It should 
 only care that the ordering of log-data-log is maintained, not exactly 
 when each specific request completes.

Yes, and that is provided to XFS by the fact that barrier I/Os are
full barriers.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-05-29 Thread david

On Wed, 30 May 2007, David Chinner wrote:
 On Tue, May 29, 2007 at 04:03:43PM -0400, Phillip Susi wrote:
  David Chinner wrote:
   The use of barriers in XFS assumes the commit write to be on stable
   storage before it returns.  One of the ordering guarantees that we
   need is that the transaction (commit write) is on disk before the
   metadata block containing the change in the transaction is written
   to disk and the current barrier behaviour gives us that.

  Barrier != synchronous write,

 Of course. FYI, XFS only issues barriers on *async* writes.

 But barrier semantics - as far as they've been described by everyone
 but you - indicate that the barrier write is guaranteed to be on stable
 storage when it returns.

this doesn't match what I have seen

with barriers it's perfectly legal to have the following sequence of 
events


1. app writes block 10 to OS
2. app writes block 4 to OS
3. app writes barrier to OS
4. app writes block 5 to OS
5. app writes block 20 to OS
6. OS writes block 4 to disk drive
7. OS writes block 10 to disk drive
8. OS writes barrier to disk drive
9. OS writes block 5 to disk drive
10. OS writes block 20 to disk drive
11. disk drive writes block 10 to platter
12. disk drive writes block 4 to platter
13. disk drive writes block 20 to platter
14. disk drive writes block 5 to platter

there is nothing that says that when the app finishes step #3 that the OS 
has even sent the data to the drive, let alone that the drive has flushed 
it to a platter


if the disk drive doesn't support barriers then step #8 becomes 'issue 
flush' and steps 11 and 12 take place before steps 9, 13 and 14
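
to make the rule concrete, here is a small userspace sketch (an
illustration only, not kernel code). it assumes the model used in the
numbered list above: a barrier is a marker between writes, blocks submitted
before it may reach the platter in any order among themselves, but none of
them may land after a block submitted behind the barrier. the names
struct req, epoch_of() and order_respects_barriers() are made up for this
example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One entry in the order the application submitted requests.
 * A barrier is modelled as a marker with no block of its own. */
struct req {
    bool is_barrier;
    int  block;         /* ignored when is_barrier is true */
};

/* Epoch of `block` = number of barriers submitted before it.
 * Returns -1 if the block was never submitted. */
static int epoch_of(const struct req *sub, size_t nsub, int block)
{
    int epoch = 0;

    for (size_t i = 0; i < nsub; i++) {
        if (sub[i].is_barrier)
            epoch++;
        else if (sub[i].block == block)
            return epoch;
    }
    return -1;
}

/* True if the platter order never writes a block from a later epoch
 * before a block from an earlier epoch.  Reordering inside one epoch
 * is allowed. */
bool order_respects_barriers(const struct req *sub, size_t nsub,
                             const int *platter, size_t nplatter)
{
    int last = 0;

    assert(sub != NULL && platter != NULL);

    for (size_t i = 0; i < nplatter; i++) {
        int e = epoch_of(sub, nsub, platter[i]);

        if (e < 0 || e < last)
            return false;   /* unknown block, or it crossed a barrier */
        last = e;
    }
    return true;
}
```

with the submission order from steps 1-5, the platter order 10, 4, 20, 5
from steps 11-14 passes this check, while an order such as 10, 5, 4, 20
does not, because block 5 would overtake block 4 across the barrier.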


David Lang


Re: [PATCH 08/13] NFS: Add functions to parse nfs mount options to fs/nfs/super.c

2007-05-29 Thread Karel Zak
On Tue, May 29, 2007 at 05:08:01PM -0400, Chuck Lever wrote:
 Karel Zak wrote:
 On Mon, May 21, 2007 at 12:09:54PM -0400, Chuck Lever wrote:
 For NFSv2 and NFSv3 mount options.
 Signed-off-by: Chuck Lever [EMAIL PROTECTED]
 
  
 
 +static int nfs_parse_options(char *raw, struct nfs_mount_args *mnt)
 +{
 +	char *p, *string;
 +
 +	if (!raw) {
 +		dprintk("NFS: mount options string was NULL.\n");
 +		return 1;
 +	}
 +
 +	while ((p = strsep (&raw, ",")) != NULL) {
 +		substring_t args[MAX_OPT_ARGS];
 +		int option, token;
 +
 +		if (!*p)
 +			continue;
 +		token = match_token(p, nfs_tokens, args);
 
  
 
 +
 +		case Opt_context:
 +			match_strcpy(mnt->nmd.context, args);
 +			break;
 
  The userspace version (nfs-utils) of this code supports quoted
  context strings. For example:
 
 context="aaa,bbb,ccc",hard
 
  It seems your code blindly splits the raw option string on ",".
 
 Karel-
 
 I've never used the context= option, and didn't find any documentation 
 describing how it was used.

 That's SELinux stuff. See original discussion:

 http://thread.gmane.org/gmane.linux.redhat.security.lspp/1002/focus=1004

 There are also fscontext, defcontext and context for normal (non-NFS)
 mounts. See the mount.8 patch (which has the basic docs):

 
http://git.kernel.org/?p=utils/util-linux-ng/util-linux-ng.git;a=blobdiff;f=mount/mount.8;h=8ed5a11b77985c8da2dcac4602a67f8785a95070;hp=4692a42b3487b8e0db6dc0b7d17cfd214e8aefc8;hb=3a620ba4bffade41d81c429560c40bb65c9b81a7;hpb=6573c985a4077fa7d50ccb993bae177526fde8ec
 
 Is there a clean example of how to use the in-kernel parser to handle 
 quoted strings containing commas?

 Not sure.

 It was introduced by [PATCH] SELinux: support mls categories for context
 mounts:

 
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3528a95322b5c1ce882ab723f175a1845430cd89

 The SELinux specific options are extracted from mount options by the
 sb_copy_data hook (fs/super.c, vfs_kern_mount()) -- that's probably
 transparent for all filesystems, maybe for your NFS options too. (I
 didn't study it in detail.)
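
 A rough sketch of the quote-aware splitting, assuming a hypothetical
 helper called next_mount_option() (nothing with this name exists in the
 kernel or in nfs-utils). It behaves like strsep(&s, ",") except that a
 comma inside double quotes does not end the token. It only splits: the
 quotes stay in the returned token, and escapes are not handled.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only: return the next
 * comma-separated option from *stringp, treating commas inside double
 * quotes as part of the option.  Like strsep(), it writes a NUL over
 * the separator and advances *stringp; *stringp becomes NULL after the
 * last option, and the next call then returns NULL.
 */
char *next_mount_option(char **stringp)
{
    char *start;
    char *p;
    int in_quotes = 0;

    assert(stringp != NULL);

    start = *stringp;
    if (start == NULL)
        return NULL;

    for (p = start; *p != '\0'; p++) {
        if (*p == '"') {
            in_quotes = !in_quotes;     /* toggle on every quote */
        } else if (*p == ',' && !in_quotes) {
            *p = '\0';                  /* terminate this option */
            *stringp = p + 1;
            return start;
        }
    }

    *stringp = NULL;                    /* this was the last option */
    return start;
}
```

 With the example string above, the first call returns
 context="aaa,bbb,ccc" and the second returns hard, so the existing
 loop could stay as it is with only the strsep() call replaced.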

Karel

-- 
 Karel Zak  [EMAIL PROTECTED]


Re: [PATCH] add procfs tunable to enable immediate panic when there are busy inodes after umount

2007-05-29 Thread David Chinner
On Tue, May 29, 2007 at 11:40:42AM -0400, Jeff Layton wrote:
 After spending quite a bit of time tracking down a VFS: busy inodes
 after unmount problem, it occurs to me that it would be nice to be
 able to force a panic when that occurs. While an oops message alone is
 not generally helpful for tracking down this sort of problem,
 collecting and analyzing a coredump when this occurs can be.

Agreed - we've found that we've had roughly 50% success in finding
the cause of these problems from crash dumps triggered immediately
like this vs ~0% from a crash that occurred some time later.

Given that this problem will always result in a crash of the kernel
at some random time in the future, why don't we just make this error
an unconditional panic and get the crash over and done with?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [AppArmor 01/41] Pass struct vfsmount to the inode_create LSM hook

2007-05-29 Thread Toshiharu Harada

2007/5/29, Kyle Moffett [EMAIL PROTECTED]:

 But writing policy with labels is a somewhat indirect way (I mean,
 we need ls -Z or ps -Z).  An indirect way can cause flaws, so we
 need a lot of work; that is what I wanted to say.

 I don't really use ls -Z or ps -Z when writing SELinux policy; I
 do that only when I actually think I mislabeled files.

 I believe what you wrote, but it may not be as easy for average
 Linux users.

As I said before, average Linux users should not be writing their own
security policy.  I have yet to meet an average Linux user who
could reliably quote for me what the file permissions on the /tmp
directory should be, or what the sticky bit was.  A small percentage
of average Linux system administrators don't get that right
consistently, and if you don't understand the sticky bit then you
should *definitely* not be controlling program permissions on a per-
syscall basis.


Thank you for your detailed and thoughtful explanation.
Things are much clearer to me now. Although your explanation was
quite persuasive, I still have some concerns.

Linux is now being used literally everywhere. As devices, technologies and
Linux itself are evolving so quickly, I'm afraid the way you showed is right
but can never meet every goal perfectly. So in some areas, including
embedded and special distros I guess, there must be solutions and help for
average-level administrators.

I think there are two ways to make secure systems.  One is just what
you wrote, the "ask the professionals" way; the other is building up practice.
You might ask how? My answer to the question is pathname-based
systems such as AppArmor and TOMOYO Linux.
They can't be compared to SELinux, but they should be considered as
supplemental tools.  At least they are helpful for analyzing how Linux works.
Tweaking SELinux policy is not easy, but writing policies for
them is relatively easy (I'm not talking about security here).

Not everybody can be a professional administrator, but he/she can be a
professional administrator of his/her own system.  I believe there must be
solutions for non-professional administrators.  That's why we developed
TOMOYO Linux (http://tomoyo.sourceforge.jp/), and I guess the same goes
for AppArmor.  You might laugh, but we are doing this because we want to
contribute to Linux and its community. :)

Thanks,
Toshiharu Harada


Re: + fs-introduce-write_begin-write_end-and-perform_write-aops.patch added to -mm tree

2007-05-29 Thread Nick Piggin
On Tue, May 29, 2007 at 02:19:55PM -0700, Andrew Morton wrote:
 
 The patch titled
  fs: introduce write_begin, write_end, and perform_write aops
 has been added to the -mm tree.  Its filename is
  fs-introduce-write_begin-write_end-and-perform_write-aops.patch
 
 *** Remember to use Documentation/SubmitChecklist when testing your code ***
 
 See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
 out what to do about this
 
 --
 Subject: fs: introduce write_begin, write_end, and perform_write aops
 From: Nick Piggin [EMAIL PROTECTED]
 
 These are intended to replace prepare_write and commit_write with more
 flexible alternatives that are also able to avoid the buffered write
 deadlock problems efficiently (which prepare_write is unable to do).

OK, well now Andrew's merged a significant chunk of this work, I
would like to try getting the clustered filesystem patches back
in too (Steven, the last GFS2 patch you sent had rejects against this
tree, so I dropped it... hope it isn't too much work to bring it back
uptodate?).

The cluster filesystems aren't 100% happy with the backward-compat
code, because pagecache_write_end cannot handle AOP_TRUNCATED_PAGE from
->commit_write... so if you were to try using loop over GFS2, it might
go BUG. This is a bit bad of me, however the compat code would have been
a whole lot uglier to support that, and I figure the cluster filesystems
want to convert to the new aops ASAP anyway.

I doubt anybody but the filesystem developers would be using -mm in such
a way, but even so I hope we can fix this before long.

Meanwhile, I'll look at redoing the rest of the filesystems that got
left behind.



Re: + fs-introduce-write_begin-write_end-and-perform_write-aops.patch added to -mm tree

2007-05-29 Thread Andrew Morton
On Wed, 30 May 2007 05:13:54 +0200 Nick Piggin [EMAIL PROTECTED] wrote:

 On Tue, May 29, 2007 at 02:19:55PM -0700, Andrew Morton wrote:
  
  The patch titled
   fs: introduce write_begin, write_end, and perform_write aops
  has been added to the -mm tree.  Its filename is
   fs-introduce-write_begin-write_end-and-perform_write-aops.patch
  
  *** Remember to use Documentation/SubmitChecklist when testing your code ***
  
  See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
  out what to do about this
  
  --
  Subject: fs: introduce write_begin, write_end, and perform_write aops
  From: Nick Piggin [EMAIL PROTECTED]
  
  These are intended to replace prepare_write and commit_write with more
  flexible alternatives that are also able to avoid the buffered write
  deadlock problems efficiently (which prepare_write is unable to do).
 
 OK, well now Andrew's merged a significant chunk of this work, I
 would like to try getting the clustered filesystem patches back
 in too (Steven, the last GFS2 patch you sent had rejects against this
 tree, so I dropped it... hope it isn't too much work to bring it back
 uptodate?).

 The cluster filesystems aren't 100% happy with the backward-compat
 code, because pagecache_write_end cannot handle AOP_TRUNCATED_PAGE from
 ->commit_write... so if you were to try using loop over GFS2, it might
 go BUG. This is a bit bad of me, however the compat code would have been
 a whole lot uglier to support that, and I figure the cluster filesystems
 want to convert to the new aops ASAP anyway.
 
 I doubt anybody but the filesystem developers would be using -mm in such
 a way, but even so I hope we can fix this before long.
 
 Meanwhile, I'll look at redoing the rest of the filesystems that got
 left behind.

hm, I suppose that means I need to undrop git-ocfs2.patch.  It has a mild
disagreement with the fault-vs-invalidate patches which I didn't feel like
fixing.



Re: [1/4] 2.6.22-rc3: known regressions

2007-05-29 Thread Sam Ravnborg
On Tue, May 29, 2007 at 02:52:53PM +0200, Michal Piotrowski wrote:
 Hi all,
 
 Here is a list of some known regressions in 2.6.22-rc3.
 
 
 Kbuild
 
 Subject: make M=$PWD modules_install does nothing
 References : http://lkml.org/lkml/2007/5/27/190
 Submitter  : Andrey Borzenkov [EMAIL PROTECTED]
 Status : Unknown
Closed - see http://lkml.org/lkml/2007/5/29/497

Sam


Re: [AppArmor 01/41] Pass struct vfsmount to the inode_create LSM hook

2007-05-29 Thread Crispin Cowan
[EMAIL PROTECTED] wrote:
 On Mon, 28 May 2007 21:54:46 EDT, Kyle Moffett said:
   
 Average users are not supposed to be writing security policy.  To be  
 honest, even average-level system administrators should not be  
 writing security policy.
That explains so much! SELinux: you're too dumb to use it, so just keep
your hands in your pockets. :-)

AppArmor was designed to allow your average sys admin to write a
security policy. It makes different design choices than SELinux to
achieve that goal. As a result, AppArmor is an utter failure when
compared to SELinux's goals, and SELinux in turn is an utter failure
when compared to AppArmor's goals.

Which is why we have LSM: so we don't have to have this argument here,
again.

   It's OK for such sysadmins to tweak  
 existing policy to give access to additional web-docs or such, but  
 only expert sysadmin/developers or security professionals should be  
 writing security policy.  It's just too damn easy to get completely  
 wrong.
 
 The single biggest challenge in computer security at the present time is how
 to build *and deploy* servers that stay reasonably secure even when run by the
 average wave-a-dead-chicken sysadmin, and desktop-class boxes that can survive
 the best attempts of Joe Sixpack's "Ooh shiny" reflex, and Joe's kid's
 attempts to evade the nannyware that Joe had somebody install.
   
That is a tall order. You can mostly achieve it by not giving the user
the root password, but I'm not sure you would like the result :-)

Both SELinux and AppArmor can be configured so tightly that you are not
going to get to install malware, by preventing the user from installing
software. This isn't what users want, so they invariably bypass security
and install shiny things if they own the box. SELinux and AppArmor can't
help but fail if you put them in that kind of harm's way.

Crispin

-- 
Crispin Cowan, Ph.D.   http://crispincowan.com/~crispin/
Director of Software Engineering   http://novell.com
   Security: It's not linear
