[rfc patch] how to show propagation state for mounts
> If you get down to it, the thing is about delegating control over part
> of a namespace to somebody, without letting them control, see, etc. the
> rest of it. So I'd rather be very conservative about extra information
> we allow to piggyback on that.
>
> I don't know... perhaps with stable peer group IDs it would be OK to
> show, for each of (our) vfsmounts, its peer group ID + the peer group ID
> of its master + the peer group ID of the nearest dominating group that
> has an intersection with our namespace. Then we don't leak information
> (AFAICS), we get full propagation information between our vfsmounts, and
> cooperating tasks in different namespaces can figure things out as much
> as possible without leaking 3rd-party information to either.

Here's a patch against current -mm implementing this (with some cleanups
thrown in). I've done some testing on it as well; it wasn't entirely
trivial to figure out a setup where propagation goes out of the namespace
first, then comes back in:

	mount --bind /mnt1 /mnt1
	mount --make-shared /mnt1
	mount --bind /mnt2 /mnt2
	mount --make-shared /mnt2
	newns
	mount --make-slave /mnt1

old ns:
	mount --make-slave /mnt2
	mount --bind /mnt1/tmp /mnt1/tmp

new ns:
	mount --make-shared /mnt1/tmp
	mount --bind /mnt1/tmp /mnt2/tmp

Voila.
Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
---
Index: linux/fs/pnode.c
===================================================================
--- linux.orig/fs/pnode.c	2008-02-22 15:27:23.000000000 +0100
+++ linux/fs/pnode.c	2008-02-22 15:27:26.000000000 +0100
@@ -9,8 +9,12 @@
 #include <linux/mnt_namespace.h>
 #include <linux/mount.h>
 #include <linux/fs.h>
+#include <linux/idr.h>
 #include "pnode.h"
 
+static DEFINE_SPINLOCK(mnt_pgid_lock);
+static DEFINE_IDA(mnt_pgid_ida);
+
 /* return the next shared peer mount of @p */
 static inline struct vfsmount *next_peer(struct vfsmount *p)
 {
@@ -27,36 +31,90 @@ static inline struct vfsmount *next_slav
 	return list_entry(p->mnt_slave.next, struct vfsmount, mnt_slave);
 }
 
-static int __peer_group_id(struct vfsmount *mnt)
+static void __set_mnt_shared(struct vfsmount *mnt)
 {
-	struct vfsmount *m;
-	int id = mnt->mnt_id;
+	mnt->mnt_flags &= ~MNT_PNODE_MASK;
+	mnt->mnt_flags |= MNT_SHARED;
+}
+
+void set_mnt_shared(struct vfsmount *mnt)
+{
+	int res;
 
-	for (m = next_peer(mnt); m != mnt; m = next_peer(m))
-		id = min(id, m->mnt_id);
+ retry:
+	spin_lock(&mnt_pgid_lock);
+	if (IS_MNT_SHARED(mnt)) {
+		spin_unlock(&mnt_pgid_lock);
+		return;
+	}
 
-	return id;
+	res = ida_get_new(&mnt_pgid_ida, &mnt->mnt_pgid);
+	spin_unlock(&mnt_pgid_lock);
+	if (res == -EAGAIN) {
+		if (ida_pre_get(&mnt_pgid_ida, GFP_KERNEL))
+			goto retry;
+	}
+	__set_mnt_shared(mnt);
+}
+
+void clear_mnt_shared(struct vfsmount *mnt)
+{
+	if (IS_MNT_SHARED(mnt)) {
+		mnt->mnt_flags &= ~MNT_SHARED;
+		mnt->mnt_pgid = -1;
+	}
+}
+
+void make_mnt_peer(struct vfsmount *old, struct vfsmount *mnt)
+{
+	mnt->mnt_pgid = old->mnt_pgid;
+	list_add(&mnt->mnt_share, &old->mnt_share);
+	__set_mnt_shared(mnt);
 }
 
-/* return the smallest ID within the peer group */
 int get_peer_group_id(struct vfsmount *mnt)
 {
+	return mnt->mnt_pgid;
+}
+
+int get_master_id(struct vfsmount *mnt)
+{
 	int id;
 
 	spin_lock(&vfsmount_lock);
-	id = __peer_group_id(mnt);
+	id = get_peer_group_id(mnt->mnt_master);
 	spin_unlock(&vfsmount_lock);
 
 	return id;
 }
 
-/* return the smallest ID within the master's peer group */
-int get_master_id(struct vfsmount *mnt)
+static struct vfsmount *get_peer_in_ns(struct vfsmount *mnt,
+				       struct mnt_namespace *ns)
 {
-	int id;
+	struct vfsmount *m = mnt;
+
+	do {
+		if (m->mnt_ns == ns)
+			return m;
+		m = next_peer(m);
+	} while (m != mnt);
+
+	return NULL;
+}
+
+int get_dominator_id_same_ns(struct vfsmount *mnt)
+{
+	int id = -1;
+	struct vfsmount *m;
 
 	spin_lock(&vfsmount_lock);
-	id = __peer_group_id(mnt->mnt_master);
+	for (m = mnt->mnt_master; m != NULL; m = m->mnt_master) {
+		struct vfsmount *d = get_peer_in_ns(m, mnt->mnt_ns);
+		if (d) {
+			id = d->mnt_pgid;
+			break;
+		}
+	}
 	spin_unlock(&vfsmount_lock);
 
 	return id;
@@ -80,7 +138,13 @@ static int do_make_slave(struct vfsmount
 		if (peer_mnt == mnt)
 			peer_mnt = NULL;
 	}
-	list_del_init(&mnt->mnt_share);
+	if (!list_empty(&mnt->mnt_share))
+		list_del_init(&mnt->mnt_share);
+	else if (IS_MNT_SHARED(mnt)) {
+		spin_lock(&mnt_pgid_lock);
+		ida_remove(&mnt_pgid_ida, mnt->mnt_pgid);
+
how to show propagation state for mounts
> mountinfo - IMO needs a sane discussion of what and how should be shown
> wrt propagation state

Here's my take on the matter.

The propagation tree can be represented either

 1) from root to leaf, listing members of peer groups and their slaves
    explicitly,

 2) or from leaf to root, by identifying each peer group and then for each
    mount showing the id of its own group and the id of the group's
    master.

2) can have two variants:

 2a) the id of a peer group is constant in time,

 2b) the id of a peer group may change.

The current patch does 2b). Having a fixed id for each peer group would
mean introducing a new object to anchor the peer group into, which would
add complexity to the whole thing.

All of these are implementable, we just need to decide which one we want.

Miklos
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Re: how to show propagation state for mounts
On Wed, Feb 20, 2008 at 04:39:15PM +0100, Miklos Szeredi wrote:
> > mountinfo - IMO needs a sane discussion of what and how should be
> > shown wrt propagation state
>
> Here's my take on the matter.
>
> The propagation tree can be represented either
>
>  1) from root to leaf, listing members of peer groups and their slaves
>     explicitly,
>
>  2) or from leaf to root, by identifying each peer group and then for
>     each mount showing the id of its own group and the id of the group's
>     master.
>
> 2) can have two variants:
>
>  2a) the id of a peer group is constant in time,
>
>  2b) the id of a peer group may change.
>
> The current patch does 2b). Having a fixed id for each peer group would
> mean introducing a new object to anchor the peer group into, which would
> add complexity to the whole thing.
>
> All of these are implementable, just need to decide which one we want.

Eh... Much more interesting question: since the propagation tree spans
multiple namespaces in a lot of normal uses, how do we deal with
reconstructing propagation through the parts that are not present in our
namespace? Moreover, what should and what should not be kept private to a
namespace? Full exposure of mount trees is definitely over the top (it
shows potentially sensitive information), so we probably want less than
that.

FWIW, my gut feeling is that for each peer group that intersects with our
namespace we ought to expose in some form
 * all vfsmounts belonging to that intersection
 * the nearest dominating peer group (== master (of master ...) of) that
   also has a non-empty intersection with our namespace

It's less about the form of representation (after all, we generate poll
events when the contents of that sucker change, so one *can* get a
consistent snapshot of the entire thing) and more about having it
self-contained when we have namespaces in play. IOW, the data in there
should give answers to questions that make sense. "Do events get
propagated from this vfsmount I have to that vfsmount I have?" is a
meaningful one; ditto for "are events here propagated to somewhere I
don't see?" or "are events getting propagated here from somewhere I don't
see?". Dumping pieces of raw graph, with IDs of nodes we can't see and
without any way to connect those pieces, OTOH, doesn't make much sense.
Re: how to show propagation state for mounts
> On Wed, Feb 20, 2008 at 04:39:15PM +0100, Miklos Szeredi wrote:
> > > mountinfo - IMO needs a sane discussion of what and how should be
> > > shown wrt propagation state
> >
> > Here's my take on the matter.
> >
> > The propagation tree can be represented either
> >
> >  1) from root to leaf, listing members of peer groups and their
> >     slaves explicitly,
> >
> >  2) or from leaf to root, by identifying each peer group and then for
> >     each mount showing the id of its own group and the id of the
> >     group's master.
> >
> > 2) can have two variants:
> >
> >  2a) the id of a peer group is constant in time,
> >
> >  2b) the id of a peer group may change.
> >
> > The current patch does 2b). Having a fixed id for each peer group
> > would mean introducing a new object to anchor the peer group into,
> > which would add complexity to the whole thing.
> >
> > All of these are implementable, just need to decide which one we
> > want.
>
> Eh... Much more interesting question: since the propagation tree spans
> multiple namespaces in a lot of normal uses, how do we deal with
> reconstructing propagation through the parts that are not present in
> our namespace? Moreover, what should and what should not be kept
> private to a namespace? Full exposure of mount trees is definitely over
> the top (it shows potentially sensitive information), so we probably
> want less than that.
>
> FWIW, my gut feeling is that for each peer group that intersects with
> our namespace we ought to expose in some form
>  * all vfsmounts belonging to that intersection
>  * the nearest dominating peer group (== master (of master ...) of)
>    that also has a non-empty intersection with our namespace
>
> It's less about the form of representation (after all, we generate poll
> events when the contents of that sucker change, so one *can* get a
> consistent snapshot of the entire thing) and more about having it
> self-contained when we have namespaces in play. IOW, the data in there
> should give answers to questions that make sense. "Do events get
> propagated from this vfsmount I have to that vfsmount I have?" is a
> meaningful one; ditto for "are events here propagated to somewhere I
> don't see?" or "are events getting propagated here from somewhere I
> don't see?".

Well, assuming you see only one namespace. When I'm experimenting with
namespaces and propagations, I see both (each in a separate xterm), and I
do want to know how propagation between them happens. Your suggestion
doesn't deal with that problem.

Otherwise, yes, it makes sense to have a consistent view of the tree
shown for each namespace. Perhaps the solution is to restrict viewing the
whole tree to privileged processes.

Miklos
Re: how to show propagation state for mounts
On Wed, Feb 20, 2008 at 04:04:22PM +0000, Al Viro wrote:
> It's less about the form of representation (after all, we generate poll
> events when the contents of that sucker change, so one *can* get a
> consistent snapshot of the entire thing) and more about having it
> self-contained when we have namespaces in play. IOW, the data in there
> should give answers to questions that make sense. "Do events get
> propagated from this vfsmount I have to that vfsmount I have?" is a
> meaningful one; ditto for "are events here propagated to somewhere I
> don't see?" or "are events getting propagated here from somewhere I
> don't see?".

Why do those last two questions deserve an answer? How will a person's or
application's behaviour be affected by whether a change will propagate to
something they don't know about and can't see?

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such a
retrograde step."
Re: how to show propagation state for mounts
On Wed, 2008-02-20 at 09:31 -0700, Matthew Wilcox wrote:
> On Wed, Feb 20, 2008 at 04:04:22PM +0000, Al Viro wrote:
> > It's less about the form of representation (after all, we generate
> > poll events when the contents of that sucker change, so one *can* get
> > a consistent snapshot of the entire thing) and more about having it
> > self-contained when we have namespaces in play. IOW, the data in
> > there should give answers to questions that make sense. "Do events
> > get propagated from this vfsmount I have to that vfsmount I have?" is
> > a meaningful one; ditto for "are events here propagated to somewhere
> > I don't see?" or "are events getting propagated here from somewhere I
> > don't see?".
>
> Why do those last two questions deserve an answer? How will a person's
> or application's behaviour be affected by whether a change will
> propagate to something they don't know about and can't see?

Well, I do not want to be surprised to see a mount suddenly show up in my
namespace because of some action by some other user in some other
namespace. It's going to happen anyway if the namespace is forked off a
namespace that had shared mounts in it. However, I would rather prefer to
know in advance the spots (mounts) where such surprises can happen. Also,
I would prefer to know how my actions will affect mounts in other
namespaces.

RP
Re: how to show propagation state for mounts
On Wed, 2008-02-20 at 17:27 +0100, Miklos Szeredi wrote:
> > On Wed, Feb 20, 2008 at 04:39:15PM +0100, Miklos Szeredi wrote:
> > > mountinfo - IMO needs a sane discussion of what and how should be
> > > shown wrt propagation state
> > >
> > > Here's my take on the matter.
> > >
> > > The propagation tree can be represented either
> > >
> > >  1) from root to leaf, listing members of peer groups and their
> > >     slaves explicitly,
> > >
> > >  2) or from leaf to root, by identifying each peer group and then
> > >     for each mount showing the id of its own group and the id of
> > >     the group's master.
> > >
> > > 2) can have two variants:
> > >
> > >  2a) the id of a peer group is constant in time,
> > >
> > >  2b) the id of a peer group may change.
> > >
> > > The current patch does 2b). Having a fixed id for each peer group
> > > would mean introducing a new object to anchor the peer group into,
> > > which would add complexity to the whole thing.
> > >
> > > All of these are implementable, just need to decide which one we
> > > want.
> >
> > Eh... Much more interesting question: since the propagation tree
> > spans multiple namespaces in a lot of normal uses, how do we deal
> > with reconstructing propagation through the parts that are not
> > present in our namespace? Moreover, what should and what should not
> > be kept private to a namespace? Full exposure of mount trees is
> > definitely over the top (it shows potentially sensitive information),
> > so we probably want less than that.
> >
> > FWIW, my gut feeling is that for each peer group that intersects with
> > our namespace we ought to expose in some form
> >  * all vfsmounts belonging to that intersection
> >  * the nearest dominating peer group (== master (of master ...) of)
> >    that also has a non-empty intersection with our namespace
> >
> > It's less about the form of representation (after all, we generate
> > poll events when the contents of that sucker change, so one *can* get
> > a consistent snapshot of the entire thing) and more about having it
> > self-contained when we have namespaces in play. IOW, the data in
> > there should give answers to questions that make sense. "Do events
> > get propagated from this vfsmount I have to that vfsmount I have?" is
> > a meaningful one; ditto for "are events here propagated to somewhere
> > I don't see?" or "are events getting propagated here from somewhere I
> > don't see?".
>
> Well, assuming you see only one namespace. When I'm experimenting with
> namespaces and propagations, I see both (each in a separate xterm), and
> I do want to know how propagation between them happens. Your suggestion
> doesn't deal with that problem.
>
> Otherwise, yes, it makes sense to have a consistent view of the tree
> shown for each namespace. Perhaps the solution is to restrict viewing
> the whole tree to privileged processes.

I wonder, what is wrong in reporting mounts in other namespaces that
either receive propagation from or send propagation to mounts in our
namespace? If we take that approach, we will report **only** the mounts
in other namespaces which have a counterpart in our namespace. After all,
the filesystems backing the mounts here and there are the same (otherwise
they wouldn't have propagated). And any mounts contained outside our
namespace, having no propagation relation to any mounts in our namespace,
will remain hidden.

RP
Re: how to show propagation state for mounts
On Wed, Feb 20, 2008 at 11:29:13AM -0800, Ram Pai wrote:
> I wonder, what is wrong in reporting mounts in other namespaces that
> either receive propagation from or send propagation to mounts in our
> namespace?

A plenty. E.g. if foo entrusts control over /var/blah to bar, it's not
obvious that foo has any business knowing if bar gets it from somebody
else in turn. And I'm not sure that bar has any business knowing that foo
has the damn thing attached in five places instead of just one, let alone
_where_ it has been attached.

If you get down to it, the thing is about delegating control over part of
a namespace to somebody, without letting them control, see, etc. the rest
of it. So I'd rather be very conservative about extra information we
allow to piggyback on that.

I don't know... perhaps with stable peer group IDs it would be OK to
show, for each of (our) vfsmounts, its peer group ID + the peer group ID
of its master + the peer group ID of the nearest dominating group that
has an intersection with our namespace. Then we don't leak information
(AFAICS), we get full propagation information between our vfsmounts, and
cooperating tasks in different namespaces can figure things out as much
as possible without leaking 3rd-party information to either.
Re: how to show propagation state for mounts
> > I wonder, what is wrong in reporting mounts in other namespaces that
> > either receive propagation from or send propagation to mounts in our
> > namespace?
>
> A plenty. E.g. if foo entrusts control over /var/blah to bar, it's not
> obvious that foo has any business knowing if bar gets it from somebody
> else in turn. And I'm not sure that bar has any business knowing that
> foo has the damn thing attached in five places instead of just one, let
> alone _where_ it has been attached.
>
> If you get down to it, the thing is about delegating control over part
> of a namespace to somebody, without letting them control, see, etc. the
> rest of it. So I'd rather be very conservative about extra information
> we allow to piggyback on that.
>
> I don't know... perhaps with stable peer group IDs it would be OK to
> show, for each of (our) vfsmounts, its peer group ID + the peer group
> ID of its master + the peer group ID of the nearest dominating group
> that has an intersection with our namespace. Then we don't leak
> information (AFAICS), we get full propagation information between our
> vfsmounts, and cooperating tasks in different namespaces can figure
> things out as much as possible without leaking 3rd-party information to
> either.

This sounds fine. I'll have a look at implementing a stable peer group ID
(it doesn't need a separate object, I realized that now).

Miklos