[PATCH] Coda: fix for coda_rmdir

2000-11-09 Thread Miklos Szeredi
After rmdir, the inode of the directory isn't cleared. This is because the i_nlink field of an empty directory is 2, and rmdir decreases this by one leaving i_nlink = 1, which is incorrect. The following patch fixes this, and also removes the superfluous d_delete(), which is also called in
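To make the link-count arithmetic concrete, here is a toy user-space model. It is not the Coda code: `toy_inode` and both functions are hypothetical names made up for illustration. It assumes the usual convention that an empty directory has a link count of 2 (its entry in the parent plus its own "." entry). How the actual patch clears the count is not visible in this excerpt, so the "fixed" variant is only one plausible approach.

```c
#include <assert.h>

/* Hypothetical stand-in for the link count kept in an inode. */
struct toy_inode {
    unsigned int nlink;
};

/* Buggy variant: treats the directory like a file and drops one link. */
void toy_rmdir_buggy(struct toy_inode *dir)
{
    assert(dir->nlink == 2);   /* model only covers an empty directory */
    dir->nlink--;              /* leaves 1, so the inode never looks unused */
}

/* Assumed fix: drop both links so the count reaches zero. */
void toy_rmdir_fixed(struct toy_inode *dir)
{
    assert(dir->nlink == 2);
    dir->nlink = 0;
}
```

With a count of zero, code that frees an inode once its last link is gone has a chance to run; with a count of 1 it never would.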

Re: [PATCH] Coda: fix for coda_rmdir

2000-11-10 Thread Miklos Szeredi
After rmdir, the inode of the directory isn't cleared. This is because the i_nlink field of an empty directory is 2, and rmdir decreases this by one leaving i_nlink = 1, which is incorrect. The following patch fixes this, and also removes the superfluous d_delete(), which is also called

[RFC] FUSE permission modell (Was: fuse review bits)

2005-04-11 Thread Miklos Szeredi
We're having a bit of a disagreement with Christoph Hellwig about the way FUSE does (or should do) permission checking. Comments (either way) are appreciated. Here's my side of the story: FUSE (filesystem in userspace) is designed to allow mounting an FS by non-privileged users (it can also be

Re: [RFC] FUSE permission modell (Was: fuse review bits)

2005-04-11 Thread Miklos Szeredi
3) No other user should have access to files under the mount, not even root[5] [5] Obviously root cannot be restricted, but accidental access to private data is still a good idea. E.g. root squashing by NFS servers has a similar effect. Could you explain a little more? I

Re: [RFC] FUSE permission modell (Was: fuse review bits)

2005-04-11 Thread Miklos Szeredi
Root squashing is actually a much less obnoxious restriction. It means that local uid 0 doesn't automatically correspond to remote uid 0. I don't agree that it's less obnoxious. Root squashing and a restricted directory (-rwx--) would have exactly the same effect: root is

Re: [RFC] FUSE permission modell (Was: fuse review bits)

2005-04-12 Thread Miklos Szeredi
I think that would be _much_ nicer implemented as a mount which is invisible to other users, rather than one which causes the admin's scripts to spew error messages. Spew is a strong word. It'll get a single EACCES at the mountpoint. The same is true for directories not accessible by 'nobody'

Re: [RFC] FUSE permission modell (Was: fuse review bits)

2005-04-12 Thread Miklos Szeredi
If the user wants to edit a read-only file in a tgz owned by himself, why can he not _chmod_ the file and _then_ edit it? That said, I would _usually_ prefer that when I enter a tgz, that I see all component files having the same uid/gid/permissions as the tgz file itself - the same as I'd

Re: [RFC] FUSE permission modell (Was: fuse review bits)

2005-04-12 Thread Miklos Szeredi
With that, the desire for virtual filesystems which cannot be read by your sysadmin (by accident) is easy to satisfy - and that kind of mechanism would probably be acceptable to all. The problem is that this way the responsibility goes to the userspace program, which can't be

Re: [RFC] FUSE permission modell (Was: fuse review bits)

2005-04-12 Thread Miklos Szeredi
For 1) your proposal makes sense, however for 2) it's useless, since now the user doesn't want the hiding. I realize that the idea _could_ be used to drop 'allow_root' mount option from the kernel. Since 'allow_root' doesn't add any security over 'allow_other' it's safe to do it in userspace.

Re: [RFC] FUSE permission modell (Was: fuse review bits)

2005-04-12 Thread Miklos Szeredi
Note that NFS checks the permissions on _both_ the client and server, for a reason. Does it? If I read the code correctly the client checks credentials supplied by the server (or cached). But the server does the actual checking of permissions. Am I missing something? Yes,

Re: [RFC] FUSE permission modell (Was: fuse review bits)

2005-04-12 Thread Miklos Szeredi
And for either version of NFS, if the uid and gid are non-zero, and the permission bits indicate that an access is permitted, then the client does not consult the server for permission. Where's that? I see no such check. /* * Trust UNIX mode bits except:

Re: [RFC] FUSE permission modell (Was: fuse review bits)

2005-04-12 Thread Miklos Szeredi
There was a thread a few months ago where file-as-directory was discussed extensively, after Namesys implemented it. That's where the conversation on detachable mount points originated AFAIR. It will probably happen at some point. A nice implementation of it in FUSE could push it along a

Re: [RFC] FUSE permission modell (Was: fuse review bits)

2005-04-12 Thread Miklos Szeredi
Still can't find it :) Which kernel? Which file? I'm looking at linux-2.4.30/fs/nfs/dir.c. Ahh, OK. nfs_permission() in 2.6 looks quite a bit different. And permission bits are not used if ->access() is available. Miklos - To unsubscribe from this list: send the line unsubscribe

Re: [RFC] FUSE permission modell (Was: fuse review bits)

2005-04-13 Thread Miklos Szeredi
Aren't there some assumptions in VFS that currently make this impossible? I believe it's OK with VFS, but applications would be confused to death. Well, there really is one issue -- dentries have exactly one parent, so what do you do when opening a file with hardlinks as a directory? (In

Re: [RFC] FUSE permission modell (Was: fuse review bits)

2005-04-13 Thread Miklos Szeredi
Look up the rather large linux-kernel linux-fsdevel thread "silent semantic changes with reiser4" and its follow-up threads, from last year. Wow, it's 700+ messages. I got through the first 40, and already feel dizzy :) It's already been tried. You will also find sensible ideas on what

Re: [RFC] FUSE permission modell (Was: fuse review bits)

2005-04-13 Thread Miklos Szeredi
Yet, the results from stat() don't distinguish the number spaces, and ls doesn't map the numbers to names properly in the wrong space. Well you can use ls -n. It's up to the tools to present the information you want in the way you want it. If a tool can't do that, tough, but

Re: [RFC] FUSE permission modell (Was: fuse review bits)

2005-04-17 Thread Miklos Szeredi
1) Only allow mount over a directory for which the user has write access (and is not sticky) 2) Use nosuid,nodev mount options [ parts deleted ] Do these solve all the security concerns with unprivileged mounts, or are there other barriers/concerns? Should there be

Re: [RFC][2.6 patch] Allow creation of new namespaces during mount system call

2005-04-20 Thread Miklos Szeredi
Reading through the thread I assume the requirement is: 1) A User being able to create his own VFS-mount environment 2) being able to use the same VFS-mount environment from multiple login sessions. 3) Being able to switch some processes to some other VFS-mount

Re: [RFC][2.6 patch] Allow creation of new namespaces during mount system call

2005-04-20 Thread Miklos Szeredi
(Please don't post separately to different recipients, that makes replying quite awkward. Always reply to all, it's the Right Thing) I disagree with this, I think there are plenty of situations where I may want to have several different namespaces for several different sessions. Once you

Re: [RFC][2.6 patch] Allow creation of new namespaces during mount system call

2005-04-20 Thread Miklos Szeredi
For the issues being discussed here, I don't think that's materially different from what we started with; it has the same issue concerning whether a user should be allowed to change his namespace and whether a process' namespace should change automatically when another process does

Re: [RFC][2.6 patch] Allow creation of new namespaces during mount system call

2005-04-21 Thread Miklos Szeredi
OK, I overlooked the problem of having to add commands to the shell and everything else. While there's plenty of precedent for this style (current directory, ulimits, umask), I wouldn't like to extend it, even to adding a command to Bash. But it could follow the 'nice' and 'renice'

Re: [RFC PATCH 1/8] share/private/slave a subtree

2005-07-08 Thread Miklos Szeredi
This patch adds the shared/private/slave support for VFS trees. [...] -struct vfsmount *lookup_mnt(struct vfsmount *mnt, struct dentry *dentry) +struct vfsmount *lookup_mnt(struct vfsmount *mnt, struct dentry *dentry, struct dentry *root) { How about changing it to inline and calling it

Re: [RFC PATCH 1/8] share/private/slave a subtree

2005-07-08 Thread Miklos Szeredi
+ * recursively change the type of the mountpoint. + */ +static int do_change_type(struct nameidata *nd, int flag) +{ + struct vfsmount *m, *mnt; + struct vfspnode *old_pnode = NULL; + int err; + + if (!(flag & MS_SHARED) && !(flag & MS_PRIVATE) + && !(flag

Re: [RFC PATCH 1/8] share/private/slave a subtree

2005-07-08 Thread Miklos Szeredi
The reason why I implemented it that way is to confuse the user less and provide more flexibility. With my implementation, we have the ability to share any part of the tree, without bothering if it is a mountpoint or not. The side effect of this operation is, it ends up creating a vfsmount if

Re: [PATCH 3/7] shared subtree

2005-07-27 Thread Miklos Szeredi
@@ -54,7 +55,7 @@ static inline unsigned long hash(struct struct vfsmount *alloc_vfsmnt(const char *name) { - struct vfsmount *mnt = kmem_cache_alloc(mnt_cache, GFP_KERNEL); + struct vfsmount *mnt = kmem_cache_alloc(mnt_cache, GFP_KERNEL); if (mnt) {

Re: [PATCH 1/7] shared subtree

2005-07-27 Thread Miklos Szeredi
+static int do_make_shared(struct vfsmount *mnt) +{ + int err=0; + struct vfspnode *old_pnode = NULL; + /* + * if the mount is already a slave mount, + * allocate a new pnode and make it + * a slave pnode of the original pnode. + */ + if

Re: [PATCH 3/7] shared subtree

2005-07-28 Thread Miklos Szeredi
Yes, we agreed on returning EINVAL when a directory is attempted to be made shared/private/slave/unclonable. But this is a different case. Let's say /mnt is a mountpoint having a vfsmount (say A). Now if you run mount --bind /mnt/a /tmp this operation succeeds currently. now

Re: mount behavior question.

2005-07-28 Thread Miklos Szeredi
Here is a scenario with shared subtree. Sorry it is complex. mount --bind /mnt /mnt mount --make-shared /mnt mkdir -p /mnt/p mount --bind /usr /mnt/1 mount --bind /mnt /mnt/2 At this stage the mount at /mnt/2 and /mnt belong to the same pnode which means mounts under them propagate

Re: mount behavior question.

2005-07-28 Thread Miklos Szeredi
step 1: mount --bind /mnt /mnt a new mount 'A' is created at /mnt step 2: mount --make-shared /mnt mounts under 'A' are made shared. But in this case there are no other mounts. So only 'A' will be made shared. step 3: mkdir -p /mnt/1

Re: mount behavior question.

2005-07-28 Thread Miklos Szeredi
I think the issue is what does mount F over directory D mean? Does it mean to mount F immediately over D, in spite of anything that might be stacked above D right now? Or does it mean to throw F onto the stack which is currently sitting over D? Your analysis assumes it's the former,

Re: mount behavior question.

2005-07-28 Thread Miklos Szeredi
I am not surprised when mounts on /mnt/1 do not propagate to /mnt/2/1 This is expected, and I am perfectly happy. Because the mount is attempted on 'B' and 'B' has nobody to propagate to. when mount on /mnt/2/1 (i.e on C at dentry 1) is attempted, I expect to see a new mount 'E' at that

Re: mount behavior question.

2005-07-28 Thread Miklos Szeredi
Does it mean to mount F immediately over D, in spite of anything that might be stacked above D right now? Or does it mean to throw F onto the stack which is currently sitting over D? Your analysis assumes it's the former, whereas what Linux does is consistent with the latter.

Re: [PATCH 1/7] shared subtree

2005-07-29 Thread Miklos Szeredi
static struct vfsmount *propagation_next(struct vfsmount *p, struct vfsmount *base) { /* first iterate over the slaves */ if (!list_empty(&p->mnt_slave_list)) return first_slave(p); I think this code should be if

Re: [PATCH 1/7] shared subtree

2005-07-31 Thread Miklos Szeredi
Ok. I have started implementing your idea. But the implementation is not simple. It becomes a complex mess. At least in the case of the pnode data structure implementation, the propagation was all abstracted and concentrated in the pnode data structure. Here is a sample implementation of

Re: readdir behaviour

2005-08-02 Thread Miklos Szeredi
First of all I would like to know what exactly is the meaning of the 'offset' parameter of filldir and whether it is used somewhere? The user visible use of offset, is when you do a telldir(), store the returned offset, and later do a seekdir(). Also you can directly use dentry->d_off as an
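The telldir()/seekdir() round trip described in this reply can be sketched in user space as follows. This is a minimal illustration, not code from the thread; the function name `reread_first_entry` is made up, and the 256-byte buffer assumes names no longer than 255 bytes.

```c
#define _XOPEN_SOURCE 700   /* for telldir() and seekdir() */
#include <assert.h>
#include <dirent.h>
#include <string.h>

/*
 * Store the offset of the first entry, read it, seek back, and read
 * again. Returns 1 if both reads yield the same name, 0 if not, and
 * -1 if the directory cannot be opened or has no readable entry.
 */
int reread_first_entry(const char *path)
{
    char first[256];
    struct dirent *ent;
    long pos;
    int same;
    DIR *dir;

    assert(path != NULL);
    dir = opendir(path);
    if (dir == NULL)
        return -1;

    pos = telldir(dir);            /* offset before the first entry */
    ent = readdir(dir);
    if (ent == NULL) {
        closedir(dir);
        return -1;
    }
    strncpy(first, ent->d_name, sizeof(first) - 1);
    first[sizeof(first) - 1] = '\0';

    seekdir(dir, pos);             /* return to the stored offset */
    ent = readdir(dir);
    same = (ent != NULL && strcmp(first, ent->d_name) == 0);

    closedir(dir);
    return same;
}
```

The stored value is opaque: it is only meaningful when handed back to seekdir() on the same directory stream, which is why a filesystem has to keep its offsets stable across calls.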

Re: Inode EIO flag?

2005-08-03 Thread Miklos Szeredi
Is there any inode flag (or anything equivalent) indicating that writing that particular inode to the device failed because of an IO error? For the data, there's AS_EIO flag in inode->i_mapping->flags. Miklos

[no subject]

2005-08-07 Thread Miklos Szeredi
Christoph Hellwig wrote: I'd rather forbid binds to the foreign namespace, though. Bind is a directional operation. TO a foreign namespace is already forbidden, FROM a foreign namespace it's not. Is that logical? Not too much, I agree. Which is better? a) removing restrictions from bind

[RFC] atomic open(..., O_CREAT | ...)

2005-08-08 Thread Miklos Szeredi
I'd like to make my filesystem be able to do file creation and opening atomically. This is needed for filesystems which cannot separate checking open permission from the actual open operation. Usually any filesystem served from userspace by an unprivileged (no CAP_DAC_OVERRIDE) process will be

Re: [RFC] atomic open(..., O_CREAT | ...)

2005-08-09 Thread Miklos Szeredi
We've already got a patch that does this, and that I'm queueing up for inclusion. Cool! http://client.linux-nfs.org/Linux-2.6.x/2.6.12/linux-2.6.12-63-open_file_intents.dif Comments: /* + * Open intents have to release any file pointer that was allocated + * but not used by the VFS. +

Re: [RFC] atomic open(..., O_CREAT | ...)

2005-08-09 Thread Miklos Szeredi
Intents are meant as optimisations, not replacements for existing operations. I'm therefore not really comfortable about having them return errors at all. In my case they are not an optimization, rather the only way to correctly perform an open with O_CREAT. + nd->intent.open.file =

Re: [RFC] atomic open(..., O_CREAT | ...)

2005-08-09 Thread Miklos Szeredi
+ nd->intent.open.file = NULL; Why is this NULL assignment needed? nd will not be used after this. + } + path_release(nd); +} + It could be dropped. The reason for putting it in is that some parts of the VFS may restart a

Re: [RFC] atomic open(..., O_CREAT | ...)

2005-08-09 Thread Miklos Szeredi
There is quite a bit of code out there that assumes it is free to stuff things into nd->mnt and nd->dentry. Some of it is Al Viro's code, some of it is from other people. For instance, the ESTALE handling will just save nd->mnt/nd->dentry before calling __link_path_walk(), then restore

Re: [RFC] atomic open(..., O_CREAT | ...)

2005-08-09 Thread Miklos Szeredi
Really? static int __emul_lookup_dentry(const char *name, struct nameidata *nd) { . if (path_walk(name, nd) == 0) { if (nd->dentry->d_inode) { dput(old_dentry);

Re: IS_NOCMTIME and setting of ctime and mtime on remote servers

2005-08-29 Thread Miklos Szeredi
NFS is the only place that sets NOCMTIME on inodes in its fhget routine IIRC. FUSE too. What is the exact intent of this? Does it stay set (so mtime and ctime updates are never sent to the server) or does it get reset somewhere (I did not see where nfs turned it off so presumably even

Re: [EMAIL PROTECTED]: Re: [Linux-cluster] Re: [PATCH 1/3] dlm: use configfs]

2005-08-29 Thread Miklos Szeredi
Fair enough, where in /sys should such things go? /proc/fs is a well-known place, but there is no /sys/fs :-) It's pretty easy to create. I had a patch: http://marc.theaimsgroup.com/?l=linux-fsdevel&m=110099238515110&w=2 to which Greg had a comment:

Re: [PATCH] ia_attr_flags - time to die

2005-09-02 Thread Miklos Szeredi
Already dead ;) 2.6.13-mm1: remove-ia_attr_flags.patch Miklos

Re: FUSE merging?

2005-09-03 Thread Miklos Szeredi
While FUSE doesn't handle it directly, doesn't it have to punt it to its network file systems? How do sshfs and whatnot handle this sort of mapping? Sshfs handles it by not handling it. In this case it is neither possible, nor needed to be able to correctly map the id space. Yes, it may

Re: FUSE merging?

2005-09-03 Thread Miklos Szeredi
Yes, it may confuse the user. It may even confuse the kernel for sticky directories(*). But basically it just works, and is very simple. In principle, Plan 9 file servers handle permission checking server-side, so we could likewise punt -- but it seemed a good idea to have some

Re: Finding hardlinks

2006-12-20 Thread Miklos Szeredi
I've come across this problem: how can a userspace program (such as for example cp -a) tell that two files form a hardlink? Comparing inode number will break on filesystems that can have more than 2^32 files (NFS3, OCFS, SpadFS; kernel developers already implemented iget5_locked for the

Re: Finding hardlinks

2006-12-20 Thread Miklos Szeredi
I've come across this problem: how can a userspace program (such as for example cp -a) tell that two files form a hardlink? Comparing inode number will break on filesystems that can have more than 2^32 files (NFS3, OCFS, SpadFS; kernel developers already implemented iget5_locked for the

Re: Finding hardlinks

2006-12-28 Thread Miklos Szeredi
It seems like the posix idea of unique st_dev, st_ino doesn't hold water for modern file systems are you really sure? Well Jan's example was of Coda that uses 128-bit internal file ids. and if so, why don't we fix *THAT* instead Hmm, sometimes you can't fix the world,

Re: Finding hardlinks

2007-01-02 Thread Miklos Szeredi
It seems like the posix idea of unique st_dev, st_ino doesn't hold water for modern file systems are you really sure? Well Jan's example was of Coda that uses 128-bit internal file ids. and if so, why don't we fix *THAT* instead Hmm, sometimes you can't fix

Re: Finding hardlinks

2007-01-03 Thread Miklos Szeredi
the use of a good hash function. The chance of an accidental collision is infinitesimally small. For a set of 100 files: 0.03% 1,000,000 files: 0.03% I do not think we want to play with probability like this. I mean... imagine 4G files,
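The size of the collision risk under discussion can be estimated with the birthday bound. The sketch below is a generic calculation, not taken from the thread; the function name `collision_bound` is made up, and a uniformly distributed hash is assumed.

```c
#include <assert.h>

/*
 * Birthday-bound estimate: with n items hashed uniformly into 2^bits
 * values, the chance of at least one collision is at most
 * n*(n-1)/2 divided by 2^bits. The result is clamped to 1.0.
 */
double collision_bound(double n, int bits)
{
    double pairs, space = 1.0;
    int i;

    assert(n >= 0.0 && bits >= 0);
    pairs = n * (n - 1.0) / 2.0;
    for (i = 0; i < bits; i++)
        space *= 2.0;              /* powers of two are exact in a double */

    return pairs / space > 1.0 ? 1.0 : pairs / space;
}
```

For one million files and a 64-bit value, the bound is roughly 2.7e-8; it grows with the square of the file count, which is why the "imagine 4G files" objection matters.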

Re: Finding hardlinks

2007-01-05 Thread Miklos Szeredi
High probability is all you have. Cosmic radiation hitting your computer will more likely cause problems, than colliding 64bit inode numbers ;) Some of us have machines designed to cope with cosmic rays, and would be unimpressed with a decrease in reliability. With the

Re: Finding hardlinks

2007-01-05 Thread Miklos Szeredi
And does it matter? If you rename a file, tar might skip it regardless of hardlink detection (if readdir races with rename, you can read none of the names of file, one or both --- all these are possible). If you have dir1/a hardlinked to dir1/b and while tar runs you delete both a and b

Re: Finding hardlinks

2007-01-05 Thread Miklos Szeredi
And does it matter? If you rename a file, tar might skip it regardless of hardlink detection (if readdir races with rename, you can read none of the names of file, one or both --- all these are possible). If you have dir1/a hardlinked to dir1/b and while tar runs you delete both a

Re: Finding hardlinks

2007-01-08 Thread Miklos Szeredi
No one guarantees you sane result of tar or cp -a while changing the tree. I don't see how is_samefile() could make it worse. There are several cases where changing the tree doesn't affect the correctness of the tar or cp -a result. In some of these cases using samefile() instead of

Re: Finding hardlinks

2007-01-08 Thread Miklos Szeredi
There's really no point trying to push for such an inferior interface when the problems which samefile is trying to address are purely theoretical. Oh yes, there is. st_ino is powerful, *but impossible to implement* on many filesystems. You mean POSIX compliance is impossible? So what?

Re: Finding hardlinks

2007-01-08 Thread Miklos Szeredi
You mean POSIX compliance is impossible? So what? It is possible to implement an approximation that is _at least_ as good as samefile(). One really dumb way is to set st_ino to the 'struct inode' pointer for example. That will sure as hell fit into 64bits and will give a unique (alas

[PATCH] fix quadratic behavior of shrink_dcache_parent()

2007-02-09 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] The time shrink_dcache_parent() takes, grows quadratically with the depth of the tree under 'parent'. This starts to get noticeable at about 10,000. These kinds of depths don't occur normally, and filesystems which invoke shrink_dcache_parent() via

Re: [PATCH] fix quadratic behavior of shrink_dcache_parent()

2007-02-10 Thread Miklos Szeredi
The file system mounted on /tmp/z in the example contains 2^50 directories. heh. I do wonder how realistic this problem is in real life. That's a fair concern, although I was trying this as part of evaluating how much someone could hose a system if we let them mount arbitrary FUSE

Re: [PATCH] fix quadratic behavior of shrink_dcache_parent()

2007-02-11 Thread Miklos Szeredi
Unfortunately this patch doesn't completely solve this problem, since the system will still be hosed due to all memory being used up by dentries. And I bet the OOM killer won't find the real target (du) but will kill anything before that. So the second part of the problem is to

[RFC PATCH] add filesystem subtype support

2007-02-12 Thread Miklos Szeredi
There's a slight problem with filesystem type representation in fuse based filesystems. From the kernel's view, there are just two filesystem types: fuse and fuseblk. From the user's view there are lots of different filesystem types. The user is not even much concerned if the filesystem is fuse

Re: [RFC PATCH] add filesystem subtype support

2007-02-12 Thread Miklos Szeredi
There's a slight problem with filesystem type representation in fuse based filesystems. From the kernel's view, there are just two filesystem types: fuse and fuseblk. From the user's view there are lots of different filesystem types. The user is not even much concerned if the

Re: [RFC PATCH] add filesystem subtype support

2007-02-12 Thread Miklos Szeredi
-static struct file_system_type **find_filesystem(const char *name) +static struct file_system_type **find_filesystem(const char *name, unsigned len) { struct file_system_type **p; for (p=&file_systems; *p; p=&(*p)->next) -if (strcmp((*p)->name,name) == 0) +
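The replacement line of this hunk is cut off in the excerpt, so it is not reproduced here. As a rough user-space sketch of what a length-limited name lookup can look like, the following matches only the first `len` characters of the requested name and requires the registered name to end there. All `toy_` names are hypothetical, and the use of a dot to separate type from subtype (as in "fuse.sshfs") is an assumption not stated in the excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical registry entry, loosely modelled on a linked list of types. */
struct toy_fs_type {
    const char *name;
    struct toy_fs_type *next;
};

struct toy_fs_type toy_fuseblk = { "fuseblk", NULL };
struct toy_fs_type toy_fuse = { "fuse", &toy_fuseblk };
struct toy_fs_type *toy_file_systems = &toy_fuse;

/* Match exactly the first len characters of name against each entry. */
struct toy_fs_type *toy_find_filesystem(const char *name, size_t len)
{
    struct toy_fs_type *p;

    assert(name != NULL && len <= strlen(name));
    for (p = toy_file_systems; p != NULL; p = p->next)
        if (strncmp(p->name, name, len) == 0 && p->name[len] == '\0')
            return p;
    return NULL;
}

/* Assumed convention: "type.subtype"; only the part before the dot is looked up. */
struct toy_fs_type *toy_lookup(const char *full)
{
    const char *dot = strchr(full, '.');
    size_t len = dot ? (size_t)(dot - full) : strlen(full);

    return toy_find_filesystem(full, len);
}
```

The extra check on `p->name[len]` keeps "fus" from matching "fuse": a prefix comparison alone would accept any shorter request.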

[PATCH] consolidate generic_writepages and mpage_writepages

2007-02-16 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Clean up massive code duplication between mpage_writepages() and generic_writepages(). The new generic function, write_cache_pages() takes a function pointer argument, which will be called for each page to be written. Maybe cifs_writepages() too can use
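The consolidation idea, one generic walker that calls a caller-supplied function for each page, can be sketched in plain C as below. This is a toy model rather than the kernel interface: pages are represented by integer indices, and `page_writer_t`, `toy_write_cache_pages` and `count_page` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-page callback: returns 0 on success, nonzero to stop. */
typedef int (*page_writer_t)(int page_index, void *data);

/*
 * Walk the dirty pages once and hand each to the caller's function.
 * Stops at the first error and returns it.
 */
int toy_write_cache_pages(const int *dirty, size_t count,
                          page_writer_t writepage, void *data)
{
    size_t i;

    assert(writepage != NULL);
    for (i = 0; i < count; i++) {
        int err = writepage(dirty[i], data);
        if (err)
            return err;
    }
    return 0;
}

/* Example callback: just counts how many pages it was given. */
int count_page(int page_index, void *data)
{
    (void)page_index;
    ++*(int *)data;
    return 0;
}
```

Two callers that differ only in what they do with each page can then share the walking loop and pass different callbacks, which is the duplication the patch description says it removes.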

Re: [Fwd: [PATCH] consolidate generic_writepages and mpage_writepages]

2007-02-17 Thread Miklos Szeredi
Maybe cifs_writepages() too can use this infrastructure, but I'm not touching that with a ten-foot pole. The cifs case ought to be one of the simpler ones, pseudo-code is pretty easy, the hard part is all of the stuff unrelated to cifs: Ideally if there were generic functions to help

Re: Accessing file-offset info for fds in /proc?

2007-02-20 Thread Miklos Szeredi
On Tue, 2007-02-20 at 02:31 -0500, Hank Leininger wrote: Is there anything provided by the kernel that would let you see the current offset of an existing filehandle? Sometimes when processing a very large file (grepping a log, bzip2'ing or gpg'ing a file, or whatever), I'd really like

[PATCH] update ctime and mtime for mmaped write

2007-02-21 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] This patch makes writing to shared memory mappings update st_ctime and st_mtime as defined by SUSv3: The st_ctime and st_mtime fields of a file that is mapped with MAP_SHARED and PROT_WRITE shall be marked for update at some point in the interval

Re: [PATCH] update ctime and mtime for mmaped write

2007-02-21 Thread Miklos Szeredi
Inspired by Peter Staubach's patch and the resulting comments. An updated version of the original patch was submitted to LKML yesterday... :-) Strange coincidence :) file = vma->vm_file; start = vma->vm_end; + mapping_update_time(file);

Re: [PATCH] update ctime and mtime for mmaped write

2007-02-21 Thread Miklos Szeredi
This flag is checked in msync() and __fput(), and if set, the file times are updated and the flag is cleared Why not also check inside vfs_getattr? This is the minimum, that the standard asks for. Note, your proposal would touch the times in vfs_getattr(), which means, that the

Re: [PATCH] update ctime and mtime for mmaped write

2007-02-21 Thread Miklos Szeredi
This flag is checked in msync() and __fput(), and if set, the file times are updated and the flag is cleared Why not also check inside vfs_getattr? This is the minimum, that the standard asks for. Note, your proposal would touch the times in vfs_getattr(), which means,

Re: [PATCH] update ctime and mtime for mmaped write

2007-02-21 Thread Miklos Szeredi
Inspired by Peter Staubach's patch and the resulting comments. An updated version of the original patch was submitted to LKML yesterday... :-) Strange coincidence :) file = vma->vm_file; start = vma->vm_end; +

Re: [PATCH] update ctime and mtime for mmaped write

2007-02-21 Thread Miklos Szeredi
On Wed, 21 Feb 2007 18:51:52 +0100 Miklos Szeredi [EMAIL PROTECTED] wrote: This patch makes writing to shared memory mappings update st_ctime and st_mtime as defined by SUSv3: The st_ctime and st_mtime fields of a file that is mapped with MAP_SHARED and PROT_WRITE shall

Re: [PATCH] update ctime and mtime for mmaped write

2007-02-22 Thread Miklos Szeredi
Why is the flag checked in __fput()? It's because of this bit in the standard: If there is no such call and if the underlying file is modified as a result of a write reference, then these fields shall be marked for update at some time after the write reference.

Re: [PATCH] update ctime and mtime for mmaped write

2007-02-22 Thread Miklos Szeredi
+int set_page_dirty_mapping(struct page *page); This aspect of the design seems intrusive to me. I didn't see a strong reason to introduce new versions of many of the routines just to handle these semantics. What motivated this part of your design? Why the new

Re: [PATCH] update ctime and mtime for mmaped write

2007-02-22 Thread Miklos Szeredi
Take this example: fd = open() addr = mmap(.., fd) write(fd, ...) close(fd) sleep(100) msync(addr,...) munmap(addr) The file times will be updated in write(), but with your patch, the bit in the mapping will also be set. Then in msync() the

Re: [PATCH] update ctime and mtime for mmaped write

2007-02-22 Thread Miklos Szeredi
__fput() will be called when there are no more references to 'file', then it will update the time if the flag is set. This applies to regular files as well as devices. I suspect that you will find that, for a block device, the wrong inode gets updated. That's where the

Re: [PATCH] update ctime and mtime for mmaped write

2007-02-22 Thread Miklos Szeredi
This still does not address the situation where a file is 'permanently' mmap'd, does it? So? If application doesn't do msync, then the file times won't be updated. That's allowed by the standard, and so portable applications will have to call msync. It is allowed, but it is

[patch 02/22] fix quadratic behavior of shrink_dcache_parent()

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Changes: o dput already checks dentry == NULL, so remove check from prune_one_dentry() The time shrink_dcache_parent() takes, grows quadratically with the depth of the tree under 'parent'. This starts to get noticeable at about 10,000. These kinds

[patch 00/22] misc VFS/VM patches and fuse writable shared mapping support

2007-02-27 Thread Miklos Szeredi
The first part of this series (1-7) contains miscellaneous patches, some of which are needed for fuse writable mmap to work correctly. Some of these are resends of patches already in -mm, with minor updates. The rest of the series adds shared writable mapping support to fuse, with some write

[patch 06/22] consolidate generic_writepages and mpage_writepages

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Changes: o fix theoretical NULL pointer dereference in __mpage_writepage o merge Andrew Morton's cleanups Clean up code duplication between mpage_writepages() and generic_writepages(). The new generic function, write_cache_pages() takes a function

[patch 04/22] fix deadlock in throttle_vm_writeout

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] This deadlock is similar to the one in balance_dirty_pages, but instead of waiting in balance_dirty_pages after submitting a write request, it happens during a memory allocation for filesystem B before submitting a write request. It is easy to reproduce

[patch 07/22] add filesystem subtype support

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] There's a slight problem with filesystem type representation in fuse based filesystems. From the kernel's view, there are just two filesystem types: fuse and fuseblk. From the user's view there are lots of different filesystem types. The user is not even

[patch 08/22] fuse: update backing_dev_info congestion state

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Set the read and write congestion state if the request queue is close to blocking, and clear it when it's not. This prevents unnecessary blocking in readahead and writeback. Signed-off-by: Miklos Szeredi [EMAIL PROTECTED] --- Index: linux/fs/fuse/dev.c

[patch 05/22] balance dirty pages from loop device

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] The function do_lo_send_aops() should call balance_dirty_pages_ratelimited() after each page similarly to generic_file_buffered_write(). Without this, writing the loop device directly (not through a filesystem) is very slow, and also slows the whole system

[patch 01/22] update ctime and mtime for mmaped write

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Changes: o moved check from __fput() to remove_vma(), which is more logical o changed set_page_dirty() to set_page_dirty_mapping in hugetlb.c o cleaned up #ifdef CONFIG_BLOCK mess This patch makes writing to shared memory mappings update st_ctime

[patch 15/22] add non-owner variant of down_read_trylock()

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Needed by fuse writepage. Signed-off-by: Miklos Szeredi [EMAIL PROTECTED] --- Index: linux/include/linux/rwsem.h === --- linux.orig/include/linux/rwsem.h2007-02-27 14:40:55.0 +0100

[patch 12/22] fuse: fix page invalidation

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Other than truncate, there are two cases, when fuse tries to get rid of cached pages: a) in open, if KEEP_CACHE flag is not set b) in getattr, if file size changed spontaneously Until now invalidate_mapping_pages() was used, which didn't get rid

[patch 13/22] fuse: add list of writable files to fuse_inode

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Each WRITE request must carry a valid file descriptor. When a page is written back from a memory mapping, the file through which the page was dirtied is not available, so a new mechanism is needed to find a suitable file in ->writepage(s). A list

[patch 21/22] fuse: limit dirty pages

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Add a per-filesystem limit for the number of dirty pages. If half the limit is reached, background writeback is started. If the limit is reached, then start some writeback and wait until the number goes below the limit again. The dirty limit
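The two thresholds in this description can be summarised as a small decision function. This is a sketch of the stated policy only, not the fuse code; the enum and function names are hypothetical, and "reached" is read as "greater than or equal to".

```c
#include <assert.h>

/* Hypothetical outcome of checking the dirty page count against the limit. */
enum toy_dirty_action {
    TOY_DIRTY_NONE,        /* below half the limit: nothing to do */
    TOY_DIRTY_BACKGROUND,  /* half the limit reached: start background writeback */
    TOY_DIRTY_THROTTLE     /* limit reached: write back and wait */
};

enum toy_dirty_action toy_dirty_check(unsigned long nr_dirty,
                                      unsigned long limit)
{
    assert(limit > 0);
    if (nr_dirty >= limit)
        return TOY_DIRTY_THROTTLE;
    if (nr_dirty >= limit / 2)
        return TOY_DIRTY_BACKGROUND;
    return TOY_DIRTY_NONE;
}
```

With a limit of 100 pages, for instance, a writer is left alone up to 49 dirty pages, triggers background writeback from 50, and is made to wait from 100.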

[patch 22/22] fuse: allow big write requests

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Up to now, file writes were split into page size WRITE requests. This is inefficient, since there are two context switches per request. So allow bigger writes, but still do it synchronously. Asynchronous writeback would be even better, but is very

[patch 20/22] fuse: make dirty stats available

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Make per-filesystem statistics about dirty and under-writeback pages available through the fuse control filesystem. Signed-off-by: Miklos Szeredi [EMAIL PROTECTED] --- Index: linux/fs/fuse/control.c

[patch 16/22] fuse: add fuse_writepage() function

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Implement the ->writepage address space operation. Be careful not to block if the wbc->nonblocking flag is set. Acquire the read-write truncation semaphore for read when allocating the request. Use the _non_owner variants, since the semaphore is held until

[patch 18/22] fuse: add fuse_writepages() function

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Implement the ->writepages address space operation. This is very similar to fuse_writepage(), but batches multiple pages into a single request. It reuses the fuse_fill_data structure currently used by fuse_readpages(). Signed-off-by: Miklos Szeredi [EMAIL

[patch 19/22] export sync_sb() to modules

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Create a function sync_sb() and export it to modules. This is the generic interface for writing back dirty data from a single superblock. Signed-off-by: Miklos Szeredi [EMAIL PROTECTED] --- Index: linux/fs/fs-writeback.c

[patch 14/22] fuse: add helper for asynchronous writes

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] This patch adds a new helper function fuse_write_fill() which makes it possible to send WRITE requests asynchronously. A new flag for WRITE requests is also added which indicates that this is a write from the page cache, and not a normal file write. Signed

[patch 10/22] fuse: add reference counting to fuse_file

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Make lifetime of 'struct fuse_file' independent from 'struct file' by adding a reference counter and destructor. This will enable asynchronous page writeback, where it cannot be guaranteed, that the file is not released while a request with this file handle
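The pattern described here, an object that outlives its opener because in-flight requests hold their own references, can be sketched in user space as follows. This is a toy, single-threaded model with hypothetical `toy_file` names; kernel code would need an atomic counter, and the `released` pointer exists only so the destructor's effect can be observed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical handle whose lifetime is controlled by a counter. */
struct toy_file {
    int count;
    int *released;   /* set to 1 by the destructor, for demonstration */
};

struct toy_file *toy_file_alloc(int *released)
{
    struct toy_file *ff = malloc(sizeof(*ff));

    if (ff != NULL) {
        ff->count = 1;           /* reference held by the opener */
        ff->released = released;
    }
    return ff;
}

/* Take an extra reference, e.g. for a request that is still in flight. */
struct toy_file *toy_file_get(struct toy_file *ff)
{
    assert(ff->count > 0);
    ff->count++;
    return ff;
}

/* Drop one reference; the last put runs the destructor. */
void toy_file_put(struct toy_file *ff)
{
    assert(ff->count > 0);
    if (--ff->count == 0) {
        *ff->released = 1;
        free(ff);
    }
}
```

If a pending request takes a reference with toy_file_get(), the opener's toy_file_put() on close no longer frees the object; it is freed only when the request completes and drops its own reference.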

[patch 11/22] fuse: add truncation semaphore

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Add a new semaphore to prevent asynchronous page writeback during the TRUNCATE request. Using i_alloc_sem would almost work, but it has to be released before invalidating the truncated pages, so it's easier to define a separate one. Signed-off-by: Miklos

[patch 09/22] fuse: fix reserved request wake up

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED] Use wake_up_all instead of wake_up in put_reserved_req(), otherwise it is possible that the right task is not woken up. Also create a separate reserved_req_waitq in addition to the blocked_waitq, since they fulfill totally separate functions. Signed-off
