Re: 2.6.23-mm1: BUG in reiserfs_delete_xattrs
On Mon, Oct 15, 2007 at 12:34:58AM +0200, Laurent Riffard wrote: reiserfs_delete_xattrs reiserfs_delete_inode generic_delete_inode generic_drop_inode iput do_unlinkat sys_unlink sys_enter_past_esp I reported a similar BUG in 2.6.22-rc8-mm2 (see http://lkml.org/lkml/2007/9/27/235). Dave Hansen sent a patch for it, I tested it and it was OK for 2.6.22-rc8-mm2. I tried this patch on 2.6.23-mm1, and it fixed the BUGs here too. The delete path is a similar case as the one Dave fixed, also cause by a NULL vfsmount passed to dentry_open, but through a different code-path. Untested fix for this problem below: Index: linux-2.6.23-rc8/fs/reiserfs/xattr.c === --- linux-2.6.23-rc8.orig/fs/reiserfs/xattr.c 2007-09-30 14:13:46.0 +0200 +++ linux-2.6.23-rc8/fs/reiserfs/xattr.c2007-09-30 14:18:30.0 +0200 @@ -207,9 +207,8 @@ static struct dentry *get_xa_file_dentry * we're called with i_mutex held, so there are no worries about the directory * changing underneath us. */ -static int __xattr_readdir(struct file *filp, void *dirent, filldir_t filldir) +static int __xattr_readdir(struct inode *inode, void *dirent, filldir_t filldir) { - struct inode *inode = filp-f_path.dentry-d_inode; struct cpu_key pos_key; /* key of current position in the directory (key of directory entry) */ INITIALIZE_PATH(path_to_entry); struct buffer_head *bh; @@ -352,24 +351,19 @@ static int __xattr_readdir(struct file * * this is stolen from vfs_readdir * */ -static -int xattr_readdir(struct file *file, filldir_t filler, void *buf) +static int xattr_readdir(struct inode *inode, filldir_t filler, void *buf) { - struct inode *inode = file-f_path.dentry-d_inode; int res = -ENOTDIR; - if (!file-f_op || !file-f_op-readdir) - goto out; + mutex_lock_nested(inode-i_mutex, I_MUTEX_XATTR); -//down(inode-i_zombie); res = -ENOENT; if (!IS_DEADDIR(inode)) { lock_kernel(); - res = __xattr_readdir(file, buf, filler); + res = __xattr_readdir(inode, buf, filler); unlock_kernel(); } -//up(inode-i_zombie); mutex_unlock(inode-i_mutex); - out: + return res; } @@ -721,7 +715,6 @@ reiserfs_delete_xattrs_filler(void *buf, /* This is called w/ inode-i_mutex downed */ int reiserfs_delete_xattrs(struct inode *inode) { - struct file *fp; struct dentry *dir, *root; int err = 0; @@ -742,15 +735,8 @@ int reiserfs_delete_xattrs(struct inode return 0; } - fp = dentry_open(dir, NULL, O_RDWR); - if (IS_ERR(fp)) { - err = PTR_ERR(fp); - /* dentry_open dputs the dentry if it fails */ - goto out; - } - lock_kernel(); - err = xattr_readdir(fp, reiserfs_delete_xattrs_filler, dir); + err = xattr_readdir(dir-d_inode, reiserfs_delete_xattrs_filler, dir); if (err) { unlock_kernel(); goto out_dir; @@ -770,7 +756,7 @@ int reiserfs_delete_xattrs(struct inode unlock_kernel(); out_dir: - fput(fp); + dput(dir); out: if (!err) @@ -812,7 +798,6 @@ reiserfs_chown_xattrs_filler(void *buf, int reiserfs_chown_xattrs(struct inode *inode, struct iattr *attrs) { - struct file *fp; struct dentry *dir; int err = 0; struct reiserfs_chown_buf buf; @@ -836,13 +821,6 @@ int reiserfs_chown_xattrs(struct inode * goto out; } - fp = dentry_open(dir, NULL, O_RDWR); - if (IS_ERR(fp)) { - err = PTR_ERR(fp); - /* dentry_open dputs the dentry if it fails */ - goto out; - } - lock_kernel(); attrs-ia_valid = (ATTR_UID | ATTR_GID | ATTR_CTIME); @@ -850,7 +828,7 @@ int reiserfs_chown_xattrs(struct inode * buf.attrs = attrs; buf.inode = inode; - err = xattr_readdir(fp, reiserfs_chown_xattrs_filler, buf); + err = xattr_readdir(dir-d_inode, reiserfs_chown_xattrs_filler, buf); if (err) { unlock_kernel(); goto out_dir; @@ -860,7 +838,7 @@ int reiserfs_chown_xattrs(struct inode * unlock_kernel(); out_dir: - fput(fp); + dput(dir); out: attrs-ia_valid = ia_valid; @@ -1008,7 +986,6 @@ reiserfs_listxattr_filler(void *buf, con */ ssize_t reiserfs_listxattr(struct dentry * dentry, char *buffer, size_t size) { - struct file *fp; struct dentry *dir; int err = 0; struct reiserfs_listxattr_buf buf; @@ -1031,13 +1008,6 @@ ssize_t reiserfs_listxattr(struct dentry goto out; } - fp = dentry_open(dir, NULL, O_RDWR); - if (IS_ERR(fp)) { - err = PTR_ERR(fp); - /*
Re: msync(2) bug(?), returns AOP_WRITEPAGE_ACTIVATE to userland
Hi, On 10/15/07, Erez Zadok [EMAIL PROTECTED] wrote: Pekka, with a small change to your patch (to handle time-based cache coherency), your patch worked well and passed all my tests. Thanks. So now I wonder if we still need the patch to prevent AOP_WRITEPAGE_ACTIVATE from being returned to userland. I guess we still need it, b/c even with your patch, generic_writepages() can return AOP_WRITEPAGE_ACTIVATE back to the VFS and we need to ensure that doesn't leak outside the kernel. I wonder whether _not setting_ BDI_CAP_NO_WRITEBACK implies that -writepage() will never return AOP_WRITEPAGE_ACTIVATE for !wbc-for_reclaim case which would explain why we haven't hit this bug before. Hugh, Andrew? And btw, I think we need to fix ecryptfs too. Pekka - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: XFS regression?
On Mon, Oct 15, 2007 at 03:28:34PM +0530, Bhagi rathi wrote: Thanks Dave for the response. Thinking futher, why is that xfs_iunpin has to mark the inode dirty? Because the inode has been modified, and instead of sprinkling mark_inode_dirty_sync() all over the code, we can do it in a single spot that catches all inode modifications. We don't have to think about it by doing this - inodes in transactions get marked dirty for free All transactions generally modify one time or other, xfs_ichgtime takes care of marking inode as dirty. Sure, but there's plenty of other transactions that don't have such a convenient hook. Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Correct behavior on O_DIRECT sparse file writes
Florian Weimer wrote: * Andrew Morton: I don't think it's a bug. Sure, O_DIRECT is synchronous, but that's because it is, err, direct. Not because it provides extra data-integrity guarantees. If you want those guarantees, use O_SYNC as well. This needs to be prominently documented. Right now, it's far from clear that you need both O_DIRECT and O_SYNC. It's certainly not a requirement for NFS. O_DIRECT on NFS forces data to the server, which always updates a file's metadata on each write, including indirect blocks. begin:vcard fn:Chuck Lever n:Lever;Chuck org:Oracle Corporation;Corporate Architecture: Linux Projects Group adr:;;1015 Granger Avenue;Ann Arbor;MI;48104;USA title:Principal Member of Staff tel;work:+1 248 614 5091 x-mozilla-html:FALSE version:2.1 end:vcard
Re: Correct behavior on O_DIRECT sparse file writes
The test below creates a sparse file and then fills a hole with O_DIRECT. As far as I can tell from reading generic_osync_inode, the filesystem metadata is only forced to disk if i_size changes during the file write. I've tested ext3, xfs and reiserfs and they all skip the commit when filling holes. I would argue that filling holes via O_DIRECT is supposed to commit the metadata required to find those file blocks later. At least on ext3, O_SYNC does force a commit on fill holes (haven't tested others). I don't think it's a bug. Sure, O_DIRECT is synchronous, but that's because it is, err, direct. Not because it provides extra data-integrity guarantees. If you want those guarantees, use O_SYNC as well. That makes sense, but how do you explain the committing of the size change without O_SYNC? That seems wrong to me. This does need to be documented carefully, because a person could easily believe, even subconsciously, that O_DIRECT makes the entire file write direct, and sloppy documentation might actually use words to that effect. -- Bryan Henderson IBM Almaden Research Center San Jose CA Filesystems - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.23-mm1: BUG in reiserfs_delete_xattrs
Christoph Hellwig wrote: On Mon, Oct 15, 2007 at 12:34:58AM +0200, Laurent Riffard wrote: reiserfs_delete_xattrs reiserfs_delete_inode generic_delete_inode generic_drop_inode iput do_unlinkat sys_unlink sys_enter_past_esp I reported a similar BUG in 2.6.22-rc8-mm2 (see http://lkml.org/lkml/2007/9/27/235). Dave Hansen sent a patch for it, I tested it and it was OK for 2.6.22-rc8-mm2. I tried this patch on 2.6.23-mm1, and it fixed the BUGs here too. The delete path is a similar case as the one Dave fixed, also cause by a NULL vfsmount passed to dentry_open, but through a different code-path. Untested fix for this problem below: Here's a patch I worked up the other night that kills off struct file completely from the xattr code. I've tested it locally. After several posts and bug reports regarding interaction with the NULL nameidata, here's a patch to clean up the mess with struct file in the reiserfs xattr code. As observed in several of the posts, there's really no need for struct file to exist in the xattr code. It was really only passed around due to the f_op-readdir() and a_ops-{prepare,commit}_write prototypes requiring it. reiserfs_prepare_write() and reiserfs_commit_write() don't actually use the struct file passed to it, and the xattr code uses a private version of reiserfs_readdir() to enumerate the xattr directories. I do have patches in my queue to convert the xattrs to use reiserfs_readdir(), but I guess I'll just have to rework those. This is pretty close to the patch by Dave Hansen for -mm, but I didn't notice it until after I wrote this up. Signed-off-by: Jeff Mahoney [EMAIL PROTECTED] --- fs/reiserfs/xattr.c | 111 ++-- 1 file changed, 31 insertions(+), 80 deletions(-) --- a/fs/reiserfs/xattr.c 2007-08-27 14:03:39.0 -0400 +++ b/fs/reiserfs/xattr.c 2007-10-14 22:11:05.0 -0400 @@ -191,28 +191,11 @@ static struct dentry *get_xa_file_dentry dput(xadir); if (err) xafile = ERR_PTR(err); - return xafile; -} - -/* Opens a file pointer to the attribute associated with inode */ -static struct file *open_xa_file(const struct inode *inode, const char *name, -int flags) -{ - struct dentry *xafile; - struct file *fp; - - xafile = get_xa_file_dentry(inode, name, flags); - if (IS_ERR(xafile)) - return ERR_PTR(PTR_ERR(xafile)); else if (!xafile-d_inode) { dput(xafile); - return ERR_PTR(-ENODATA); + xafile = ERR_PTR(-ENODATA); } - - fp = dentry_open(xafile, NULL, O_RDWR); - /* dentry_open dputs the dentry if it fails */ - - return fp; + return xafile; } /* @@ -228,9 +211,8 @@ static struct file *open_xa_file(const s * we're called with i_mutex held, so there are no worries about the directory * changing underneath us. */ -static int __xattr_readdir(struct file *filp, void *dirent, filldir_t filldir) +static int __xattr_readdir(struct inode *inode, void *dirent, filldir_t filldir) { - struct inode *inode = filp-f_path.dentry-d_inode; struct cpu_key pos_key; /* key of current position in the directory (key of directory entry) */ INITIALIZE_PATH(path_to_entry); struct buffer_head *bh; @@ -374,23 +356,16 @@ static int __xattr_readdir(struct file * * */ static -int xattr_readdir(struct file *file, filldir_t filler, void *buf) +int xattr_readdir(struct inode *inode, filldir_t filler, void *buf) { - struct inode *inode = file-f_path.dentry-d_inode; - int res = -ENOTDIR; - if (!file-f_op || !file-f_op-readdir) - goto out; + int res = -ENOENT; mutex_lock_nested(inode-i_mutex, I_MUTEX_XATTR); -//down(inode-i_zombie); - res = -ENOENT; if (!IS_DEADDIR(inode)) { lock_kernel(); - res = __xattr_readdir(file, buf, filler); + res = __xattr_readdir(inode, buf, filler); unlock_kernel(); } -//up(inode-i_zombie); mutex_unlock(inode-i_mutex); - out: return res; } @@ -436,7 +411,7 @@ reiserfs_xattr_set(struct inode *inode, size_t buffer_size, int flags) { int err = 0; - struct file *fp; + struct dentry *dentry; struct page *page; char *data; struct address_space *mapping; @@ -454,18 +429,18 @@ reiserfs_xattr_set(struct inode *inode, xahash = xattr_hash(buffer, buffer_size); open_file: - fp = open_xa_file(inode, name, flags); - if (IS_ERR(fp)) { - err = PTR_ERR(fp); + dentry = get_xa_file_dentry(inode, name, flags); + if (IS_ERR(dentry)) { + err = PTR_ERR(dentry); goto out; } - xinode = fp-f_path.dentry-d_inode; + xinode =
Re: 2.6.23-mm1: BUG in reiserfs_delete_xattrs
Le 15.10.2007 10:40, Christoph Hellwig a écrit : On Mon, Oct 15, 2007 at 12:34:58AM +0200, Laurent Riffard wrote: reiserfs_delete_xattrs reiserfs_delete_inode generic_delete_inode generic_drop_inode iput do_unlinkat sys_unlink sys_enter_past_esp I reported a similar BUG in 2.6.22-rc8-mm2 (see http://lkml.org/lkml/2007/9/27/235). Dave Hansen sent a patch for it, I tested it and it was OK for 2.6.22-rc8-mm2. I tried this patch on 2.6.23-mm1, and it fixed the BUGs here too. The delete path is a similar case as the one Dave fixed, also cause by a NULL vfsmount passed to dentry_open, but through a different code-path. Untested fix for this problem below: Does work fine, thanks. Tested-by: Laurent Riffard [EMAIL PROTECTED] Index: linux-2.6.23-rc8/fs/reiserfs/xattr.c === --- linux-2.6.23-rc8.orig/fs/reiserfs/xattr.c 2007-09-30 14:13:46.0 +0200 +++ linux-2.6.23-rc8/fs/reiserfs/xattr.c 2007-09-30 14:18:30.0 +0200 @@ -207,9 +207,8 @@ static struct dentry *get_xa_file_dentry * we're called with i_mutex held, so there are no worries about the directory * changing underneath us. */ -static int __xattr_readdir(struct file *filp, void *dirent, filldir_t filldir) +static int __xattr_readdir(struct inode *inode, void *dirent, filldir_t filldir) { - struct inode *inode = filp-f_path.dentry-d_inode; struct cpu_key pos_key; /* key of current position in the directory (key of directory entry) */ INITIALIZE_PATH(path_to_entry); struct buffer_head *bh; @@ -352,24 +351,19 @@ static int __xattr_readdir(struct file * * this is stolen from vfs_readdir * */ -static -int xattr_readdir(struct file *file, filldir_t filler, void *buf) +static int xattr_readdir(struct inode *inode, filldir_t filler, void *buf) { - struct inode *inode = file-f_path.dentry-d_inode; int res = -ENOTDIR; - if (!file-f_op || !file-f_op-readdir) - goto out; + mutex_lock_nested(inode-i_mutex, I_MUTEX_XATTR); -//down(inode-i_zombie); res = -ENOENT; if (!IS_DEADDIR(inode)) { lock_kernel(); - res = __xattr_readdir(file, buf, filler); + res = __xattr_readdir(inode, buf, filler); unlock_kernel(); } -//up(inode-i_zombie); mutex_unlock(inode-i_mutex); - out: + return res; } @@ -721,7 +715,6 @@ reiserfs_delete_xattrs_filler(void *buf, /* This is called w/ inode-i_mutex downed */ int reiserfs_delete_xattrs(struct inode *inode) { - struct file *fp; struct dentry *dir, *root; int err = 0; @@ -742,15 +735,8 @@ int reiserfs_delete_xattrs(struct inode return 0; } - fp = dentry_open(dir, NULL, O_RDWR); - if (IS_ERR(fp)) { - err = PTR_ERR(fp); - /* dentry_open dputs the dentry if it fails */ - goto out; - } - lock_kernel(); - err = xattr_readdir(fp, reiserfs_delete_xattrs_filler, dir); + err = xattr_readdir(dir-d_inode, reiserfs_delete_xattrs_filler, dir); if (err) { unlock_kernel(); goto out_dir; @@ -770,7 +756,7 @@ int reiserfs_delete_xattrs(struct inode unlock_kernel(); out_dir: - fput(fp); + dput(dir); out: if (!err) @@ -812,7 +798,6 @@ reiserfs_chown_xattrs_filler(void *buf, int reiserfs_chown_xattrs(struct inode *inode, struct iattr *attrs) { - struct file *fp; struct dentry *dir; int err = 0; struct reiserfs_chown_buf buf; @@ -836,13 +821,6 @@ int reiserfs_chown_xattrs(struct inode * goto out; } - fp = dentry_open(dir, NULL, O_RDWR); - if (IS_ERR(fp)) { - err = PTR_ERR(fp); - /* dentry_open dputs the dentry if it fails */ - goto out; - } - lock_kernel(); attrs-ia_valid = (ATTR_UID | ATTR_GID | ATTR_CTIME); @@ -850,7 +828,7 @@ int reiserfs_chown_xattrs(struct inode * buf.attrs = attrs; buf.inode = inode; - err = xattr_readdir(fp, reiserfs_chown_xattrs_filler, buf); + err = xattr_readdir(dir-d_inode, reiserfs_chown_xattrs_filler, buf); if (err) { unlock_kernel(); goto out_dir; @@ -860,7 +838,7 @@ int reiserfs_chown_xattrs(struct inode * unlock_kernel(); out_dir: - fput(fp); + dput(dir); out: attrs-ia_valid = ia_valid; @@ -1008,7 +986,6 @@ reiserfs_listxattr_filler(void *buf, con */ ssize_t reiserfs_listxattr(struct dentry * dentry, char *buffer, size_t size) { - struct file *fp; struct dentry *dir; int err = 0; struct reiserfs_listxattr_buf buf; @@ -1031,13 +1008,6 @@ ssize_t reiserfs_listxattr(struct dentry goto out; } - fp
Re: 2.6.23-mm1: BUG in reiserfs_delete_xattrs
Le 15.10.2007 20:31, Jeff Mahoney a écrit : Christoph Hellwig wrote: On Mon, Oct 15, 2007 at 12:34:58AM +0200, Laurent Riffard wrote: reiserfs_delete_xattrs reiserfs_delete_inode generic_delete_inode generic_drop_inode iput do_unlinkat sys_unlink sys_enter_past_esp I reported a similar BUG in 2.6.22-rc8-mm2 (see http://lkml.org/lkml/2007/9/27/235). Dave Hansen sent a patch for it, I tested it and it was OK for 2.6.22-rc8-mm2. I tried this patch on 2.6.23-mm1, and it fixed the BUGs here too. The delete path is a similar case as the one Dave fixed, also cause by a NULL vfsmount passed to dentry_open, but through a different code-path. Untested fix for this problem below: Here's a patch I worked up the other night that kills off struct file completely from the xattr code. I've tested it locally. Sorry Jeff, your patch does not apply on 2.6.23-mm1. The 'struct file' removal from reiserfs_xattr_ function is already in -mm (make-reiserfs-stop-using-struct-file-for-internal.patch). The Dave's patch I was refering to is this one: BEGIN = The bug is caused by reiserfs creating a special 'struct file' with a NULL vfsmount. /* Opens a file pointer to the attribute associated with inode */ static struct file *open_xa_file(const struct inode *inode, const char *name, int flags) { ... fp = dentry_open(xafile, NULL, O_RDWR); /* dentry_open dputs the dentry if it fails */ As Christoph just said, this is somewhat of a bandaid. But, it shouldn't hurt anything. --- lxc-dave/fs/file_table.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff -puN fs/open.c~fix-reiserfs-oops fs/open.c diff -puN fs/file_table.c~fix-reiserfs-oops fs/file_table.c --- lxc/fs/file_table.c~fix-reiserfs-oops 2007-09-27 13:32:20.0 -0700 +++ lxc-dave/fs/file_table.c2007-09-27 13:33:11.0 -0700 @@ -236,7 +236,7 @@ void fastcall __fput(struct file *file) fops_put(file-f_op); if (file-f_mode FMODE_WRITE) { put_write_access(inode); - if (!special_file(inode-i_mode)) + if (!special_file(inode-i_mode) mnt) mnt_drop_write(mnt); } put_pid(file-f_owner.pid); diff -puN include/linux/mount.h~fix-reiserfs-oops include/linux/mount.h END Dave sent it privately to me... I guess this bandaid is no longer needed now, is it? ~~ laurent - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.23-mm1: BUG in reiserfs_delete_xattrs
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Laurent Riffard wrote: Le 15.10.2007 20:31, Jeff Mahoney a écrit : Christoph Hellwig wrote: On Mon, Oct 15, 2007 at 12:34:58AM +0200, Laurent Riffard wrote: reiserfs_delete_xattrs reiserfs_delete_inode generic_delete_inode generic_drop_inode iput do_unlinkat sys_unlink sys_enter_past_esp I reported a similar BUG in 2.6.22-rc8-mm2 (see http://lkml.org/lkml/2007/9/27/235). Dave Hansen sent a patch for it, I tested it and it was OK for 2.6.22-rc8-mm2. I tried this patch on 2.6.23-mm1, and it fixed the BUGs here too. The delete path is a similar case as the one Dave fixed, also cause by a NULL vfsmount passed to dentry_open, but through a different code-path. Untested fix for this problem below: Here's a patch I worked up the other night that kills off struct file completely from the xattr code. I've tested it locally. Sorry Jeff, your patch does not apply on 2.6.23-mm1. The 'struct file' removal from reiserfs_xattr_ function is already in -mm (make-reiserfs-stop-using-struct-file-for-internal.patch). The Dave's patch I was refering to is this one: I'd guess not. This patch was actually against mainline. I should've specified. I can work up one against -mm later today if it's needed. - -Jeff - -- Jeff Mahoney SUSE Labs -BEGIN PGP SIGNATURE- Version: GnuPG v2.0.4-svn0 (GNU/Linux) Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org iD8DBQFHE8wyLPWxlyuTD7IRAiJrAJ4nC6gwH1cFjWx6BI04O5fDIRftmACcD2wb whyXThHlIBK2phnZ6Pf8Pb8= =Kx6k -END PGP SIGNATURE- - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
More Large blocksize benchmarks
Hello everyone, I'm stealing the cc list and reviving and old thread because I've finally got some numbers to go along with the Btrfs variable blocksize feature. The basic idea is to create a read/write interface to map a range of bytes on the address space, and use it in Btrfs for all metadata operations (file operations have always been extent based). So, instead of casting buffer_head-b_data to some structure, I read and write at offsets in a struct extent_buffer. The extent buffer is very small and backed by an address space, and I get large block sizes the same way file_write gets to write to 16k at a time, by finding the appropriate page in the addess space. This is an over simplification since I try to cache these mapping decisions to avoid using too much CPU, but hopefully you get the idea. The advantage to this approach is the changes are all inside Btrfs. No extra kernel patches were required. Dave reported that XFS saw much higher write throughput with large blocksizes, but so far I'm seeing the most benefits during reads. The next step is a bunch more benchmarks. I've done the first round and posted it here: http://oss.oracle.com/~mason/blocksizes/ The Btrfs code makes it relatively easy to experiment, and so this may be a good step toward figuring out if some automagic solution is worth it in general. I can even use different sizes for nodes and leaves, although I haven't done much testing at all there yet. -chris - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] docbook: fix filesystems content
From: Randy Dunlap [EMAIL PROTECTED] Fix filesystems docbook warnings. Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'name' Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'mode' Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'parent' Warning(linux-2.6.23-git8//fs/debugfs/file.c:241): No description found for parameter 'value' Warning(linux-2.6.23-git8//include/linux/jbd.h:404): No description found for parameter 'h_lockdep_map' Signed-off-by: Randy Dunlap [EMAIL PROTECTED] --- fs/debugfs/file.c | 41 +++-- include/linux/jbd.h |1 + 2 files changed, 36 insertions(+), 6 deletions(-) --- linux-2.6.23-git8.orig/fs/debugfs/file.c +++ linux-2.6.23-git8/fs/debugfs/file.c @@ -227,15 +227,24 @@ DEFINE_SIMPLE_ATTRIBUTE(fops_x16, debugf DEFINE_SIMPLE_ATTRIBUTE(fops_x32, debugfs_u32_get, debugfs_u32_set, 0x%08llx\n); -/** - * debugfs_create_x8 - create a debugfs file that is used to read and write an unsigned 8-bit value - * debugfs_create_x16 - create a debugfs file that is used to read and write an unsigned 16-bit value - * debugfs_create_x32 - create a debugfs file that is used to read and write an unsigned 32-bit value +/* + * debugfs_create_x{8,16,32} - create a debugfs file that is used to read and write an unsigned {8,16,32}-bit value * - * These functions are exactly the same as the above functions, (but use a hex - * output for the decimal challenged) for details look at the above unsigned + * These functions are exactly the same as the above functions (but use a hex + * output for the decimal challenged). For details look at the above unsigned * decimal functions. */ + +/** + * debugfs_create_x8 - create a debugfs file that is used to read and write an unsigned 8-bit value + * @name: a pointer to a string containing the name of the file to create. + * @mode: the permission that the file should have + * @parent: a pointer to the parent dentry for this file. This should be a + * directory dentry if set. If this parameter is %NULL, then the + * file will be created in the root of the debugfs filesystem. + * @value: a pointer to the variable that the file should read to and write + * from. + */ struct dentry *debugfs_create_x8(const char *name, mode_t mode, struct dentry *parent, u8 *value) { @@ -243,6 +252,16 @@ struct dentry *debugfs_create_x8(const c } EXPORT_SYMBOL_GPL(debugfs_create_x8); +/** + * debugfs_create_x16 - create a debugfs file that is used to read and write an unsigned 16-bit value + * @name: a pointer to a string containing the name of the file to create. + * @mode: the permission that the file should have + * @parent: a pointer to the parent dentry for this file. This should be a + * directory dentry if set. If this parameter is %NULL, then the + * file will be created in the root of the debugfs filesystem. + * @value: a pointer to the variable that the file should read to and write + * from. + */ struct dentry *debugfs_create_x16(const char *name, mode_t mode, struct dentry *parent, u16 *value) { @@ -250,6 +269,16 @@ struct dentry *debugfs_create_x16(const } EXPORT_SYMBOL_GPL(debugfs_create_x16); +/** + * debugfs_create_x32 - create a debugfs file that is used to read and write an unsigned 32-bit value + * @name: a pointer to a string containing the name of the file to create. + * @mode: the permission that the file should have + * @parent: a pointer to the parent dentry for this file. This should be a + * directory dentry if set. If this parameter is %NULL, then the + * file will be created in the root of the debugfs filesystem. + * @value: a pointer to the variable that the file should read to and write + * from. + */ struct dentry *debugfs_create_x32(const char *name, mode_t mode, struct dentry *parent, u32 *value) { --- linux-2.6.23-git8.orig/include/linux/jbd.h +++ linux-2.6.23-git8/include/linux/jbd.h @@ -372,6 +372,7 @@ struct jbd_revoke_table_s; * @h_sync: flag for sync-on-close * @h_jdata: flag to force data journaling * @h_aborted: flag indicating fatal error on handle + * @h_lockdep_map: lockdep info for debugging lock problems **/ /* Docbook can't yet cope with the bit fields, but will leave the documentation - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: More Large blocksize benchmarks
On Mon, 15 Oct 2007, Chris Mason wrote: Dave reported that XFS saw much higher write throughput with large blocksizes, but so far I'm seeing the most benefits during reads. Dave's tests were done with an early large blocksize patchset that had issues with readahead. More recent versions have the fixes by Fengguang that address the issue. - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: More Large blocksize benchmarks
On Mon, Oct 15, 2007 at 08:22:31PM -0400, Chris Mason wrote: Hello everyone, I'm stealing the cc list and reviving and old thread because I've finally got some numbers to go along with the Btrfs variable blocksize feature. The basic idea is to create a read/write interface to map a range of bytes on the address space, and use it in Btrfs for all metadata operations (file operations have always been extent based). So, instead of casting buffer_head-b_data to some structure, I read and write at offsets in a struct extent_buffer. The extent buffer is very small and backed by an address space, and I get large block sizes the same way file_write gets to write to 16k at a time, by finding the appropriate page in the addess space. This is an over simplification since I try to cache these mapping decisions to avoid using too much CPU, but hopefully you get the idea. The advantage to this approach is the changes are all inside Btrfs. No extra kernel patches were required. Dave reported that XFS saw much higher write throughput with large blocksizes, but so far I'm seeing the most benefits during reads. Apples to oranges, Chris ;) btrfs linearises writes due to it's COW behaviour and this is trades off read speed. i.e. we take more seeks to read data so we can keep the write speed high. By using large blocks, you're reducing the number of seeks needed to find anything, and hence the read speed will increase. Write speed will be pretty much unchanged because btrfs does linear writes no matter the block size. XFS doesn't linearise writes and optimises it's layout for a large number of disks and a low number of seeks on reads - the opposite of btrfs. Hence large block sizes reduce the number of writes XFS needs to write a given set of data+metadata and hence write speed increases much more than the read speed (until you get to large tree traversals). The basic conclusion is that different filesystems will benefit in different ways with large block sizes Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html