[Devel] [RFC PATCH 0/31] An introduction and A path for merging network namespace work

2007-01-25 Thread Eric W. Biederman
The idea of a network namespace is fundamentally quite simple. We create a mechanism that from the user's perspective allows creation of separate instances of the network stack. When combined with mechanisms like chroot this results in a much more complete isolation. When seen in the context of

[Devel] [PATCH RFC 13/31] net: Make device event notification network namespace safe

2007-01-25 Thread Eric W. Biederman
From: Eric W. Biederman [EMAIL PROTECTED] - unquoted Every user of the network device notifiers is either a protocol stack or a pseudo device. If a protocol stack that does not have support for multiple network namespaces receives an event for a device that is not in the initial network

[Devel] [PATCH RFC 4/31] net: Add a network namespace tag to struct net_device

2007-01-25 Thread Eric W. Biederman
From: Eric W. Biederman [EMAIL PROTECTED] - unquoted Please note that network devices do not increase the count on the network namespace. They are inside the network namespace and so the network namespace tag is in the nature of a back pointer and so getting and putting the network

[Devel] [PATCH RFC 6/31] net: Add a helper to get a reference to the initial network namespace.

2007-01-25 Thread Eric W. Biederman
From: Eric W. Biederman [EMAIL PROTECTED] - unquoted The initial network namespace is special and we need to use it for various things. Probably the biggest initial use will be to ensure code that can't cope with multiple namespaces only sees the initial network namespace. For that reason

[Devel] [PATCH RFC 27/31] net: Make the xfrm sysctls per network namespace.

2007-01-25 Thread Eric W. Biederman
From: Eric W. Biederman [EMAIL PROTECTED] - unquoted In particular I moved: /proc/sys/net/core/xfrm_aevent_etime /proc/sys/net/core/xfrm_aevent_rseqth Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] --- include/net/xfrm.h |4 ++-- net/core/sysctl_net_core.c | 37

[Devel] [PATCH RFC 17/31] net: Factor out __dev_alloc_name from dev_alloc_name

2007-01-25 Thread Eric W. Biederman
From: Eric W. Biederman [EMAIL PROTECTED] - unquoted When forcibly changing the network namespace of a device I need something that can generate a name for the device in the new namespace without overwriting the old name. __dev_alloc_name provides me that functionality. Signed-off-by: Eric W

[Devel] [PATCH RFC 2/31] net: Implement a place holder network namespace

2007-01-25 Thread Eric W. Biederman
From: Eric W. Biederman [EMAIL PROTECTED] - unquoted Many of the changes to the network stack will simply be adding a network namespace parameter to function calls or moving variables from globals to being per network namespace. When those variables have initializers that cannot statically

[Devel] [PATCH RFC 18/31] net: Implment network device movement between namespaces

2007-01-25 Thread Eric W. Biederman
From: Eric W. Biederman [EMAIL PROTECTED] - unquoted This patch introduces NETIF_F_NETNS_LOCAL a flag to indicate a network device is local to a single network namespace and should never be moved. Useful for pseudo devices that we need an instance in each network namespace (like the loopback

[Devel] [PATCH RFC 14/31] net: Support multiple network namespaces with netlink

2007-01-25 Thread Eric W. Biederman
From: Eric W. Biederman [EMAIL PROTECTED] - unquoted Each netlink socket will live in exactly one network namespace, this includes the controlling kernel sockets. This patch updates all of the existing netlink protocols to only support the initial network namespace. Request by clients in other

[Devel] [PATCH RFC 20/31] net: Implement CONFIG_NET_NS

2007-01-25 Thread Eric W. Biederman
From: Eric W. Biederman [EMAIL PROTECTED] - unquoted Add the config option to enable multiple network namespaces. Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] --- net/Kconfig |7 +++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/net/Kconfig b/net/Kconfig index

[Devel] [PATCH RFC 12/31] net: Make packet reception network namespace safe

2007-01-25 Thread Eric W. Biederman
From: Eric W. Biederman [EMAIL PROTECTED] - unquoted This patch modifies every packet receive function registered with dev_add_pack() to drop packets if they are not from the initial network namespace, in addition to ensure consistency of argument passing the unnecessary device parameter

[Devel] Re: + user-ns-implement-user-ns-unshare-remove-config_user_ns.patch added to -mm tree

2007-01-25 Thread Eric W. Biederman
Serge E. Hallyn [EMAIL PROTECTED] writes: As it sits right now using the user namespace instead of being an enhancement of security as it should feels like security loophole 101. That's a bit of a callous exaggeration, don't you think? It takes what used to be one big pool and partitions

[Devel] Re: [PATCH RFC 1/31] net: Add net_namespace_type.h to allow for per network namespace variables.

2007-01-25 Thread Eric W. Biederman
Stephen Hemminger [EMAIL PROTECTED] writes: Can all this be a nop if a CONFIG option is not selected? That is exactly what this infrastructure supports. What you see is the version that comes into effect when the CONFIG option is not selected. From using an empty structure to replace a pointer

[Devel] Re: [IPC]: Logical refcount loop in ipc ns - massive leakage

2007-02-04 Thread Eric W. Biederman
Kirill Korotaev [EMAIL PROTECTED] writes: Guys, Though I have no patch in the hands for mainstream, I feel a responsibility to report one major problem related to IPC namespace design. The problem is about refcounting scheme which is used. There is a leak in IPC namespace due to

[Devel] [PATCH] Fix SAK_work workqueue initialization.

2007-02-13 Thread Eric W. Biederman
PREPARE_WORK calls that are now gone. If we call schedule_work again before it has processed it has generated the first SAK it will simply ignore the duplicate schedule_work request. Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] --- drivers/char/keyboard.c |1 - drivers/char/sysrq.c

[Devel] Re: [patch 0/1] [RFC][net namespace] veth ioctl management

2007-02-19 Thread Eric W. Biederman
Daniel Lezcano [EMAIL PROTECTED] writes: The following patch is an upgrade proposition for the veth pass-through driver. The temporary proc interface has been replaced by an ioctl. The device is a misc device. The major number is 10 and the minor number is dynamically allocated. The minor

[Devel] Re: [patch 0/1] [RFC][net namespace] veth ioctl management

2007-02-19 Thread Eric W. Biederman
Dmitry Mishin [EMAIL PROTECTED] writes: Fully agree. But as I can see, your code arises no more comments, than ours. So, we need to find other ways. Do you have more ideas? Yes. To some extent we should probably compare notes and see which parts of the various implementations are good/bad.

[Devel] Re: [patch 0/1] [RFC][net namespace] veth ioctl management

2007-02-19 Thread Eric W. Biederman
Kirill Korotaev [EMAIL PROTECTED] writes: Eric, Mostly it is six of one half a dozen of the other as far as the actual implementation is concerned. The practical difference is etun is not tied in any way shape or form to namespaces, whereas veth appears to be. veth is not tied to

[Devel] Re: [PATCH 0/7] containers (V7): Generic Process Containers

2007-02-20 Thread Eric W. Biederman
Paul Menage [EMAIL PROTECTED] writes: What are you defining here as everything? If you mean all things that could be applied to a segregated group of processes such as a virtual server, then container seems like a good name for my patches, since it allows you to aggregate namespaces, resource

[Devel] Re: [RFC] ns containers (v2): namespace entering

2007-02-22 Thread Eric W. Biederman
Serge E. Hallyn [EMAIL PROTECTED] writes: How about we solve both this and the general ugliness of using ptrace with a new hijack_and_clone(struct task_struct *tsk, int fd) Which takes tsk, clones it, and execs the contents of fd? That is roughly what I was thinking. Although

[Devel] Re: [RFC] ns containers (v2): namespace entering

2007-02-22 Thread Eric W. Biederman
Paul Menage [EMAIL PROTECTED] writes: When I implemented a virtual server solution at my previous job, we solved the problem of leaking capabilities into the virtualized environment by allowing a process to enter the virtual server in a privileged mode, in which it didn't appear in the

[Devel] Re: [PATCH] Use task_pgrp() and task_session in binfmt

2007-02-22 Thread Eric W. Biederman
[EMAIL PROTECTED] writes: From: Sukadev Bhattiprolu [EMAIL PROTECTED] Subject: [PATCH] Use task_pgrp() and task_session in binfmt Use container friendly interfaces task_pgrp() and task_session() in binfmt files Why? Unless we intend to kill process_session and process_group this doesn't

[Devel] Re: [PATCH] pidspace rocket driver

2007-02-22 Thread Eric W. Biederman
[EMAIL PROTECTED] writes: From: Sukadev Bhattiprolu [EMAIL PROTECTED] Subject: [PATCH] pidspace rocket driver The process_session() and process_group() values are not really used by the driver. Looks reasonable. I'd prefer a summary like. Kill unused session and group values in rocket

[Devel] Re: [PATCH] Use struct pid parameter in copy_process()

2007-02-22 Thread Eric W. Biederman
[EMAIL PROTECTED] writes: From: Sukadev Bhattiprolu [EMAIL PROTECTED] Subject: [PATCH] Use struct pid parameter in copy_process() Modify copy_process() to take a struct pid * parameter instead of a pid_t. This simplifies the code a bit and also avoids having to call find_pid() to convert

[Devel] Re: [PATCH] Use task_pgrp() in solaris procids

2007-02-22 Thread Eric W. Biederman
[EMAIL PROTECTED] writes: From: Sukadev Bhattiprolu [EMAIL PROTECTED] Subject: [PATCH] Use task_pgrp() in solaris procids Use task_pgrp() in solaris procids code. Please no more changes like this. It's just noise. process_group and process_session are not inherently evil. When used with the

Re: [Devel] [PATCH 4/4] Use task_pgrp() in autofs/autofs4

2007-02-28 Thread Eric W. Biederman
Cedric Le Goater [EMAIL PROTECTED] writes: [EMAIL PROTECTED] wrote: From: Sukadev Bhattiprolu [EMAIL PROTECTED] Subject: [PATCH 4/4] Use task_pgrp() in autofs/autofs4 Replace process_group(tsk) with pid_nr(task_pgrp(tsk)) in autofs and autofs4. you will need EXPORT_SYMBOL_GPL for pid_nr()

[Devel] Re: [PATCH RFC 22/31] net: Add network namespace clone support.

2007-02-28 Thread Eric W. Biederman
Daniel Lezcano [EMAIL PROTECTED] writes: + +mutex_lock(net_mutex); +err = setup_net(new_net); +if (err) +goto out_unlock; Should we net_free in case of error ? Oops. Yes we should. Thanks. +net_lock(); +net_list_append(new_net); +net_unlock(); +
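The review point above (free the new namespace when setup fails) follows the usual goto-based cleanup pattern. The following is a minimal user-space sketch of that pattern only, not the patch itself: `struct net` here is an empty stand-in, every `*_stub`/`*_sketch` name is hypothetical, and the locking around `setup_net` is deliberately omitted.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
struct net { int id; };

static int live_nets;   /* namespaces allocated and not yet freed */
static int fail_setup;  /* knob: non-zero makes setup_net_stub() fail */

static struct net *net_alloc_stub(void)
{
    struct net *n = calloc(1, sizeof(*n));
    if (n)
        live_nets++;
    return n;
}

static void net_free_stub(struct net *n)
{
    if (n) {
        free(n);
        live_nets--;
    }
}

static int setup_net_stub(struct net *n)
{
    (void)n;
    return fail_setup ? -1 : 0;
}

/* Allocate and set up a namespace; on setup failure, free it before returning. */
static struct net *copy_net_sketch(int *err)
{
    struct net *new_net = net_alloc_stub();

    if (!new_net) {
        *err = -1;
        return NULL;
    }

    *err = setup_net_stub(new_net);
    if (*err)
        goto out_free;  /* the path the review asked about */

    return new_net;

out_free:
    net_free_stub(new_net);
    return NULL;
}
```

Without the `out_free` label the failed allocation would simply leak, which is the kind of mistake the review comment points at.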

[Devel] Re: [PATCH RFC 18/31] net: Implment network device movement between namespaces

2007-02-28 Thread Eric W. Biederman
Daniel Lezcano [EMAIL PROTECTED] writes: Eric W. Biederman wrote: From: Eric W. Biederman [EMAIL PROTECTED] - unquoted This patch introduces NETIF_F_NETNS_LOCAL a flag to indicate a network device is local to a single network namespace and should never be moved. Useful for pseudo devices

[Devel] Re: [RFC PATCH 0/31] An introduction and A path for merging network namespace work

2007-02-28 Thread Eric W. Biederman
Daniel Lezcano [EMAIL PROTECTED] writes: Hi Eric, Do you plan to propose to merge into mainline your patchset ? I'm hung up at the moment in the sysfs support. Network device renaming is broken in 2.6.21-rc2 at the moment. Then I would like to see the best of etun/veth merged. After that

[Devel] Re: [RFC PATCH 0/31] An introduction and A path for merging network namespace work

2007-03-06 Thread Eric W. Biederman
Daniel Lezcano [EMAIL PROTECTED] writes: Eric W. Biederman wrote: [ cut ] Dmitry? Daniel? What do you think. Hi Eric, I agree with all the points you presented but I am still 50/50 for both approaches. The major argument in favor of the explicit parameter is that it allows to keep

Re: [Devel] [PATCH 4/4] Use task_pgrp() in autofs/autofs4

2007-03-07 Thread Eric W. Biederman
[EMAIL PROTECTED] writes: | | | I have largely given up on reviewing this patch set until that is fixed. | | I am sending out the patches with the noise cancelled :-) Would like | to send them out to akpm in a few days. | | Unfortunately I won't have a chance to do anything until

[Devel] Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Eric W. Biederman
Paul Menage [EMAIL PROTECTED] writes: On 3/7/07, Sam Vilain [EMAIL PROTECTED] wrote: But namespace has well-established historical semantics too - a way of changing the mappings of local * to global objects. This accurately describes things like resource controllers, cpusets, resource

[Devel] Re: [PATCH 1/2] rcfs core patch

2007-03-07 Thread Eric W. Biederman
Srivatsa Vaddagiri [EMAIL PROTECTED] writes: Heavily based on Paul Menage's (in turn cpuset) work. The big difference is that the patch uses task->nsproxy to group tasks for resource control purpose (instead of task->containers). The patch retains the same user interface as Paul Menage's

[Devel] Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Eric W. Biederman
Sam Vilain [EMAIL PROTECTED] writes: And do we bother changing IPC namespaces or let that one slide? ipc namespaces works (if you worry about tiny details like we put the resource limits for the sysv ipc objects inside the namespace). Probably the most instructive example of this is that you

[Devel] Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Eric W. Biederman
Matt Helsley [EMAIL PROTECTED] writes: On Thu, 2007-03-08 at 16:32 +1300, Sam Vilain wrote: <snip> > Kirill, 06032418:36+03: > > I propose to use "namespace" naming. > > 1. This is already used in fs. > > 2. This is what IMHO suites at least

[Devel] Re: Pid namespace patchsets review

2007-03-10 Thread Eric W. Biederman
Herbert Poetzl [EMAIL PROTECTED] writes: IMHO not the best idea, mainly because both OpenVZ and Linux-VServer will end up either duplicating the pid code or using the incomplete (broken) version which probably gives the pid space a bad start ... I'd prefer to focus on fixing up the

[Devel] Re: [RFC][PATCH 0/6] Allow unsharing pid namespace

2007-03-11 Thread Eric W. Biederman
[EMAIL PROTECTED] writes: From: Sukadev Bhattiprolu [EMAIL PROTECTED] Subject: [RFC][PATCH 0/6] Allow unsharing pid namespace. This patchset defines a struct pid_nr and uses this to allow processes to unshare their pid namespace. struct pid_nr will hold [pid value, namespace] pair for each
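The "[pid value, namespace] pair" idea can be pictured as a small list attached to each process, with one entry per namespace in which the process is visible. The sketch below is an assumption-laden illustration, not the patchset's actual layout: the type names, the list representation, and the `pid_nr_in_ns` helper are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for a pid namespace. */
struct pid_ns_sketch { int level; };

/* One [pid value, namespace] pair; a process may carry several. */
struct pid_nr_sketch {
    int nr;                            /* pid value as seen inside `ns` */
    const struct pid_ns_sketch *ns;    /* namespace this value belongs to */
    struct pid_nr_sketch *next;
};

/* Return the pid value visible in `ns`, or 0 if the process has no entry there. */
static int pid_nr_in_ns(const struct pid_nr_sketch *list,
                        const struct pid_ns_sketch *ns)
{
    for (; list; list = list->next)
        if (list->ns == ns)
            return list->nr;
    return 0;
}
```

With such a list, the same process can be, for example, 1234 in the initial namespace and 1 in a child namespace, and a lookup from an unrelated namespace finds nothing.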

[Devel] Re: [RFC][PATCH 4/6] Initialize struct pid_nr for swapper

2007-03-11 Thread Eric W. Biederman
Herbert Poetzl [EMAIL PROTECTED] writes: On Fri, Mar 09, 2007 at 07:59:24PM -0800, [EMAIL PROTECTED] wrote: From: Sukadev Bhattiprolu [EMAIL PROTECTED] Subject: [RFC][PATCH 4/6] Initialize struct pid_nr for swapper. Statically initialize a struct pid_nr for the swapper process. does

[Devel] Re: [RFC][PATCH 1/6] Add struct pid_nr

2007-03-11 Thread Eric W. Biederman
[EMAIL PROTECTED] writes: From: Cedric Le Goater [EMAIL PROTECTED] Subject: [RFC][PATCH 1/6] Add struct pid_nr Define struct pid_nr and some helper functions that will be used in subsequent patches. Changelog: - [Serge Hallyn comment]: Remove (!pid_nr) check in free_pid_nr()

[Devel] Re: [RFC][PATCH 3/6] pid namespace : use struct pid_nr

2007-03-11 Thread Eric W. Biederman
[EMAIL PROTECTED] writes: From: Cedric Le Goater [EMAIL PROTECTED] Subject: [RFC][PATCH 3/6] pid namespace : use struct pid_nr Allocate and attach a struct pid nr to the struct pid. When freeing the pid, free the attached struct pid nrs. Changelog: - [Serge Hallyn's comment]: Add

[Devel] Re: [RFC][PATCH 5/6] Define helper functions to unshare pid namespace

2007-03-11 Thread Eric W. Biederman
[EMAIL PROTECTED] writes: From: Sukadev Bhattiprolu [EMAIL PROTECTED] Subject: [RFC][PATCH 5/6] Define helper functions to unshare pid namespace Define clone_pid_ns() and unshare_pid_ns() functions that will be used in the next patch to unshare pid namespace. Changelog: - Rewrite

[Devel] Re: [RFC][PATCH 3/5] Use pid namespace from struct pid_nrs list

2007-03-11 Thread Eric W. Biederman
[EMAIL PROTECTED] writes: From: Sukadev Bhattiprolu [EMAIL PROTECTED] Subject: [RFC][PATCH 3/5] Use pid namespace from struct pid_nrs list Stop using task->nsproxy->pid_ns. Use pid_namespace from pid->pid_nrs list instead. To simplify error handling, this patch moves processing of

[Devel] Re: Pid namespace patchsets review

2007-03-11 Thread Eric W. Biederman
Serge E. Hallyn [EMAIL PROTECTED] writes: If we're going to put the resource stuff in, then I agree let's rename. If we stick to this being a namespace proxy (my preference) then calling it nsproxy is more accurate. Sounds like a reasonable criteria. (I can't keep up with that thread so

[Devel] Re: [RFC][PATCH 3/6] pid namespace : use struct pid_nr

2007-03-13 Thread Eric W. Biederman
[EMAIL PROTECTED] writes: I thought about keeping track of parent pid namespace, but did not find the need for it yet. When do we expect to walk parent/ancestor pid namespaces ? When cloning we can walk the parent pid->pid_nrs list and duplicate them for the child struct pid. If CLONE_NEWPID

[Devel] Re: [RFC][PATCH 6/6]: Enable unsharing pid namespace.

2007-03-13 Thread Eric W. Biederman
Cedric Le Goater [EMAIL PROTECTED] writes: | Index: lx26-20-mm2b/kernel/nsproxy.c | === | --- lx26-20-mm2b.orig/kernel/nsproxy.c 2007-03-09 14:56:12.0 -0800 | +++ lx26-20-mm2b/kernel/nsproxy.c2007-03-09

[Devel] Re: [RFC][PATCH 4/7] RSS accounting hooks over the code

2007-03-13 Thread Eric W. Biederman
Herbert Poetzl [EMAIL PROTECTED] writes: On Mon, Mar 12, 2007 at 09:50:08AM -0700, Dave Hansen wrote: On Mon, 2007-03-12 at 19:23 +0300, Kirill Korotaev wrote: For these you essentially need per-container page->_mapcount counter, otherwise you can't detect whether rss group still has the

[Devel] Re: [RFC][PATCH 4/7] RSS accounting hooks over the code

2007-03-13 Thread Eric W. Biederman
Nick Piggin [EMAIL PROTECTED] writes: Eric W. Biederman wrote: First touch page ownership does not guarantee me anything useful for knowing if I can run my application or not. Because of page sharing my application might run inside the rss limit only because I got lucky and happened

[Devel] Re: [RFC][PATCH 1/7] Resource counters

2007-03-15 Thread Eric W. Biederman
Pavel Emelianov [EMAIL PROTECTED] writes: Srivatsa Vaddagiri wrote: On Tue, Mar 13, 2007 at 06:41:05PM +0300, Pavel Emelianov wrote: right, but atomic ops have much less impact on most architectures than locks :) Right. But atomic_add_unless() is slower as it is essentially a loop. See my
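The remark that atomic_add_unless() is "essentially a loop" refers to the compare-and-swap retry pattern. Below is a user-space sketch of that pattern using C11 atomics; it is not the kernel's implementation, and the function name `add_unless` is an illustrative assumption.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Add `a` to *v unless *v currently equals `u`.
 * Returns true if the addition was performed.
 * User-space illustration of the retry loop, not kernel code.
 */
static bool add_unless(atomic_int *v, int a, int u)
{
    int old = atomic_load(v);

    while (old != u) {
        /* On failure `old` is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak(v, &old, old + a))
            return true;
    }
    return false;
}
```

A plain atomic add completes in one step, whereas this form may retry under contention, which is the cost being discussed in the thread.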

[Devel] Re: [RFC][PATCH 2/7] RSS controller core

2007-03-15 Thread Eric W. Biederman
Alan Cox [EMAIL PROTECTED] writes: stuff is happening by comparing page->count and page->_mapcount, but it certainly wouldn't be conclusive. But, does this kind of nonsense even happen in practice? Is it useful for me as a bad guy to make it happen ? To create a DOS attack. - Allocate

[Devel] Re: [RFC] kernel/pid.c pid allocation wierdness

2007-03-16 Thread Eric W. Biederman
at 10:54:07AM -0600, Eric W. Biederman wrote: Possibly. We aren't that sparsely populated when it comes to pids. Hash tables aren't good at saving space either, and when they are space efficient they are on the edge of long hash chains so they are on the edge of performance problems

[Devel] Re: + remove-the-likelypid-check-in-copy_process.patch added to -mm tree

2007-03-16 Thread Eric W. Biederman
Oleg Nesterov [EMAIL PROTECTED] writes: Sukadev Bhattiprolu wrote: This means that idle threads (except swapper) are visible to for_each_process() and do_each_thread(). Looks dangerous and somewhat strange to me. Could you explain this change? Good catch. I've been so busy pounding

[Devel] Re: [RFC] kernel/pid.c pid allocation wierdness

2007-03-16 Thread Eric W. Biederman
William Lee Irwin III [EMAIL PROTECTED] writes: On Fri, Mar 16, 2007 at 07:04:28AM -0600, Eric W. Biederman wrote: Grr. s/patricia tree/fib tree/. We use that in the networking for the forwarding information base and I mis-remembered it. Anyway the interesting thing with the binary

Re: [Devel] Re: [RFC][PATCH 2/7] RSS controller core

2007-03-19 Thread Eric W. Biederman
Paul Menage [EMAIL PROTECTED] writes: On 3/13/07, Dave Hansen [EMAIL PROTECTED] wrote: How do we determine what is shared, and goes into the shared zones? Once we've allocated a page, it's too late because we already picked. Do we just assume all page cache is shared? Base it on filesystem,

[Devel] Re: [PATCH 2/2] Replace pid_t in autofs with struct pid reference

2007-03-19 Thread Eric W. Biederman
Serge E. Hallyn [EMAIL PROTECTED] writes: Index: 2.6.20/fs/autofs4/waitq.c === --- 2.6.20.orig/fs/autofs4/waitq.c +++ 2.6.20/fs/autofs4/waitq.c @@ -292,8 +292,8 @@ int autofs4_wait(struct autofs_sb_info *

[Devel] Re: [PATCH 2/2] Replace pid_t in autofs with struct pid reference

2007-03-19 Thread Eric W. Biederman
Serge E. Hallyn [EMAIL PROTECTED] writes: True, current->pid can probably always be legitimately taken as the pid number in the current task's cloning namespace. But task->pid is wrong. Agreed. So if as you say it's worth caching (not saying I doubt you, just that I haven't verified), then

[Devel] Re: [RFC][PATCH] Do not set /proc inode->pid for non-pid-related inodes

2007-03-19 Thread Eric W. Biederman
Dave Hansen [EMAIL PROTECTED] writes: On Mon, 2007-03-19 at 20:04 -0600, Eric W. Biederman wrote: Dave Hansen [EMAIL PROTECTED] writes: Regardless I would like to see a little farther down on how we test to see if the pid namespace is alive and how we make these functions do nothing

[Devel] Re: [RFC][PATCH] Do not set /proc inode->pid for non-pid-related inodes

2007-03-20 Thread Eric W. Biederman
Serge E. Hallyn [EMAIL PROTECTED] writes: Quoting Eric W. Biederman ([EMAIL PROTECTED]): Dave Hansen [EMAIL PROTECTED] writes: On Mon, 2007-03-19 at 20:04 -0600, Eric W. Biederman wrote: I would also like to see how we perform the appropriate lookups by pid namespace. What do you

[Devel] Re: [RFC][PATCH] Do not set /proc inode->pid for non-pid-related inodes

2007-03-20 Thread Eric W. Biederman
Dave Hansen [EMAIL PROTECTED] writes: On Tue, 2007-03-20 at 09:51 -0600, Eric W. Biederman wrote: Outlive is the wrong concept. Ideally we want something that will live as long as there are processes in the pid_ns. How about they just live as long as there is a mount? Now that we _can_

[Devel] Re: [RFC][PATCH 02/14] Move alloc_pid call to copy_process

2007-03-21 Thread Eric W. Biederman
[EMAIL PROTECTED] writes: From: Sukadev Bhattiprolu [EMAIL PROTECTED] Subject: [RFC][PATCH 02/14] Move alloc_pid call to copy_process Move alloc_pid() into copy_process(). This will help in code to support cloning of pid namespace. I think this makes sense. However if we are doing it let's

[Devel] Re: [RFC][PATCH 03/14] use pid_nr in procfs

2007-03-21 Thread Eric W. Biederman
[EMAIL PROTECTED] writes: From: Sukadev Bhattiprolu [EMAIL PROTECTED] Subject: [RFC][PATCH 03/14] use pid_nr in procfs With containers, a process can have different pid_t values in different pid namespaces. To ensure we get the correct pid_t value in any context, we should use pid_nr()

[Devel] Re: [RFC][PATCH 09/14] Save leaders struct pid before detach_pid()

2007-03-21 Thread Eric W. Biederman
[EMAIL PROTECTED] writes: From: Sukadev Bhattiprolu [EMAIL PROTECTED] Subject: [RFC][PATCH 09/14] Save leaders struct pid before detach_pid() Save the struct pid of the thread group leader before detaching our pid. See comments in the code below for more details. Signed-off-by: Sukadev

[Devel] Re: [RFC][PATCH 12/14] Remove copy_pid_ns function

2007-03-21 Thread Eric W. Biederman
[EMAIL PROTECTED] writes: From: Sukadev Bhattiprolu [EMAIL PROTECTED] Subject: [RFC][PATCH 12/14] Remove copy_pid_ns function Remove the copy_pid_ns() function as we have decoupled pid namespace from nsproxy and also because we currently disallow unsharing of pid namespace. Where do we

[Devel] Re: [RFC][PATCH 06/14] Populate pid_nrs list with entry for init-pid-ns

2007-03-21 Thread Eric W. Biederman
[EMAIL PROTECTED] writes: Signed-off-by: Cedric Le Goater [EMAIL PROTECTED] Signed-off-by: Sukadev Bhattiprolu [EMAIL PROTECTED] --- include/linux/pid.h |9 -- kernel/pid.c | 73 ++-- 2 files changed, 50 insertions(+), 32 deletions(-)

[Devel] Re: [RFC][PATCH] Do not set /proc inode->pid for non-pid-related inodes

2007-03-21 Thread Eric W. Biederman
Serge E. Hallyn [EMAIL PROTECTED] writes: So how do you see us enforcing pid1's existence? Somehow keep it from fully exiting, or just kill all the processes in its namespace if it exits? Killing all other processes in the namespace when pid1 exits is what I implemented last time around.

[Devel] Re: [RFC][PATCH] Do not set /proc inode->pid for non-pid-related inodes

2007-03-21 Thread Eric W. Biederman
Cedric Le Goater [EMAIL PROTECTED] writes: Eric W. Biederman wrote: Serge E. Hallyn [EMAIL PROTECTED] writes: So how do you see us enforcing pid1's existence? Somehow keep it from fully exiting, or just kill all the processes in its namespace if it exits? what about a kthread

[Devel] Re: [RFC][PATCH 13/14] Define CLONE_NEWPID flag

2007-03-21 Thread Eric W. Biederman
Cedric Le Goater [EMAIL PROTECTED] writes: [EMAIL PROTECTED] wrote: This was discussed on containers and we thought it would be useful to reserve this flag. --- From: Sukadev Bhattiprolu [EMAIL PROTECTED] Subject: [RFC][PATCH 13/14] Define CLONE_NEWPID flag Define CLONE_NEWPID flag

[Devel] Re: [RFC][PATCH] Do not set /proc inode->pid for non-pid-related inodes

2007-03-21 Thread Eric W. Biederman
Cedric Le Goater [EMAIL PROTECTED] writes: what about a kthread that would be spawned when a task is cloned in an unshared pid namespace ? This is an extra cost in terms of tasks. If you use kernel_thread this can happen. (Die kernel_thread). If you use the kthread interface keventd will be

[Devel] Re: [PATCHSET] 2.6.20-lxc8

2007-03-21 Thread Eric W. Biederman
Daniel Lezcano [EMAIL PROTECTED] writes: Hi Herbert, I played with the L2 namespace patchset from Eric Biederman, I did some benchmarking with netperf: With 2 hosts, Intel EM64T bipro HT / 2,4 GHz , 4Go ram and GB network. Host A is running the netserver on a RH4 kernel 2.6.9-42 Host B is

[Devel] Re: [PATCH] Define CLONE_NEWPID flag

2007-03-21 Thread Eric W. Biederman
Andrew Morton [EMAIL PROTECTED] writes: Index: lx26-21-rc3-mm2/include/linux/sched.h === --- lx26-21-rc3-mm2.orig/include/linux/sched.h 2007-03-20 20:13:19.0 -0700 +++ lx26-21-rc3-mm2/include/linux/sched.h 2007-03-21

[Devel] Re: [RFC][PATCH] Do not set /proc inode->pid for non-pid-related inodes

2007-03-22 Thread Eric W. Biederman
Cedric Le Goater [EMAIL PROTECTED] writes: [ long long thread ] Eric W. Biederman wrote: Cedric Le Goater [EMAIL PROTECTED] writes: what about a kthread that would be spawned when a task is cloned in an unshared pid namespace ? This is an extra cost in terms of tasks. If you use

Re: [Devel] Re: [PATCHSET] 2.6.20-lxc8

2007-03-22 Thread Eric W. Biederman
Denis V. Lunev [EMAIL PROTECTED] writes: Kirill Korotaev wrote: if network device inside container has MTU higher than eth0 outside the container, then packets will get fragmented. First time to MTU1 inside container and refragmented to MTU2 outside the container. At least this is the

[Devel] Re: [RFC][PATCH] Do not set /proc inode->pid for non-pid-related inodes

2007-03-22 Thread Eric W. Biederman
Cedric Le Goater [EMAIL PROTECTED] writes: So I suggested to have a kthread be pid == 1 for each new pid namespace. the kthread can do the killing of all tasks if needed and will die when the refcount on the pid namespace == 0. Would such a (rough) design be acceptable for mainline ? The

[Devel] Re: [RFC][PATCH] Do not set /proc inode->pid for non-pid-related inodes

2007-03-22 Thread Eric W. Biederman
Cedric Le Goater [EMAIL PROTECTED] writes: Back to the main subject I still don't understand the idea of running a kernel daemon as pid == 1. What would that buy us? mostly a child reaper when there are no /sbin/init but its pid cannot be 1. Yes we should be able to assign just about any

[Devel] Re: [PATCH 2/2] Replace pid_t in autofs with struct pid reference

2007-03-22 Thread Eric W. Biederman
Ian Kent [EMAIL PROTECTED] writes: On Wed, 2007-03-21 at 15:58 -0500, Serge E. Hallyn wrote: PS Note that if I'm right, but some machine starts autofs in a child pid_namespace, the pid_nr() the way I have it is wrong. I'm not sure in that case how we go about fixing that. Somehow we need

[Devel] Re: [RFC][PATCH] Do not set /proc inode->pid for non-pid-related inodes

2007-03-22 Thread Eric W. Biederman
Dave Hansen [EMAIL PROTECTED] writes: On Thu, 2007-03-22 at 09:33 -0500, Serge E. Hallyn wrote: I still prefer that we forego that kthread, and just work toward allowing pid1 to exit. Really I think the crufty /proc/pid handling is the only reason we were going to punt on that for now.

[Devel] Re: [RFC][PATCH] Do not set /proc inode->pid for non-pid-related inodes

2007-03-22 Thread Eric W. Biederman
Dave Hansen [EMAIL PROTECTED] writes: So, doesn't that problem go away (or at least move to be umount's duty) if we completely disconnect those inodes' lifetime from that of any process or pid namespace? If the last process has exited the pid namespace I would like the code to continue to

[Devel] Re: 2.6.20-lxc8 - compilation error

2007-03-22 Thread Eric W. Biederman
Rishikesh [EMAIL PROTECTED] writes: Hi, I am getting this error while compilation on x86 on ABAT. Job ID: 78220 You can find more detailed log here : http://abat.linux.ibm.com/abat-repo/logs/[EMAIL PROTECTED]/host:1/debug/test.log.0 If you are inside ibm's firewall. Otherwise

[Devel] Re: controlling mmap()'d vs read/write() pages

2007-03-23 Thread Eric W. Biederman
Nick Piggin [EMAIL PROTECTED] writes: Eric W. Biederman wrote: Dave Hansen [EMAIL PROTECTED] writes: So, I think we have a difference of opinion. I think it's _all_ about memory pressure, and you think it is _not_ about accounting for memory pressure. :) Perhaps we mean different things

Re: [Devel] Re: [PATCHSET] 2.6.20-lxc8

2007-03-23 Thread Eric W. Biederman
Kirill Korotaev [EMAIL PROTECTED] writes: we have the hack below in ip_forward() to avoid skb_cow(), Benjamin, can you check whether it helps in your case please? (NOTE: you will need to replace check for NETIF_F_VENET with something else or introduce the same flag on etun device). Ugh.

[Devel] Re: controlling mmap()'d vs read/write() pages

2007-03-23 Thread Eric W. Biederman
Nick Piggin [EMAIL PROTECTED] writes: Would any of them work on a system on which every filesystem was on ramfs, and there was no swap? If not then they are not memory attacks but I/O attacks. I completely concede that you can DOS the system with I/O if that is not limited as well. My

[Devel] Re: [RFC][PATCH] Do not set /proc inode->pid for non-pid-related inodes

2007-03-26 Thread Eric W. Biederman
Serge E. Hallyn [EMAIL PROTECTED] writes: Quoting Eric W. Biederman ([EMAIL PROTECTED]): Dave Hansen [EMAIL PROTECTED] writes: So, doesn't that problem go away (or at least move to be umount's duty) if we completely disconnect those inodes' lifetime from that of any process or pid

[Devel] Re: [RFC][PATCH] Do not set /proc inode->pid for non-pid-related inodes

2007-03-26 Thread Eric W. Biederman
Dave Hansen [EMAIL PROTECTED] writes: On Mon, 2007-03-26 at 11:12 -0600, Eric W. Biederman wrote: In (at least one version of) Dave's patches, the /proc your pidns is automatically used when you use /proc. In that case a /proc should just go away when the last task goes away, since

Re: [Devel] Re: [PATCHSET] 2.6.20-lxc8

2007-03-27 Thread Eric W. Biederman
Benjamin Thery [EMAIL PROTECTED] writes: Hi, Yesterday, I applied a patch similar to Kirill's one that skips skb_cow() in ip_forward when the device is an etun, and it does help a lot. With the patch the cpu load increase is reduced by 50%. Part of the problem is solved. Here are the

[Devel] Re: L2 network namespace benchmarking

2007-03-27 Thread Eric W. Biederman
Daniel Lezcano [EMAIL PROTECTED] writes: 3. General observations --- The objective of having no performance degradation when the network namespace is off in the kernel is reached in both solutions. When the network is used outside the container and the network

[Devel] Re: L2 network namespace benchmarking

2007-03-28 Thread Eric W. Biederman
Kirill Korotaev [EMAIL PROTECTED] writes: Ideally we can optimize the bridge code or something equivalent to it so that we can take one look at the destination mac address and know which network namespace we should be in. Potentially moving this work to hardware when the hardware supports

Re: [Devel] Re: [PATCHSET] 2.6.20-lxc8

2007-03-28 Thread Eric W. Biederman
Kirill Korotaev [EMAIL PROTECTED] writes: Benjamin, checksumming can be optimized out as well. We had an experimental patch for OpenVZ venet device, which adds NETIF_F_LLTX | NETIF_F_HW_CSUM | NETIF_F_SG | NETIF_F_HIGHDMA features to venet device and avoids additional checksumming where

[Devel] Screamm.. commit f400e198b2ed26ce55b22a1412ded0896e7516ac

2007-03-28 Thread Eric W. Biederman
This is just to vent. I was clearly not auditing patches well enough earlier and the above patch got modified since the version I wrote initially. Adding a few additional is_init calls where the test we care about is not whether this is the real init process of the system (so we should treat it with care)

[Devel] Re: Screamm.. commit f400e198b2ed26ce55b22a1412ded0896e7516ac

2007-03-29 Thread Eric W. Biederman
Serge E. Hallyn [EMAIL PROTECTED] writes: Yup. Looks like ambiguous naming once again hid some real (future) bugs. This is of course safe so far in mainline, but needs to be split into static inline int is_global_init(struct task_struct *tsk) { return (tsk == init_task); } and

[Devel] Re: L2 network namespace benchmarking

2007-03-29 Thread Eric W. Biederman
Benjamin Thery [EMAIL PROTECTED] writes: Eric W. Biederman wrote: Daniel Lezcano [EMAIL PROTECTED] writes: [...] * When do you expect to have the network namespace in mainline? My current goal is to finish my rebase against 2.6.linus_latest in the next couple of days after having

[Devel] Re: Screamm.. commit f400e198b2ed26ce55b22a1412ded0896e7516ac

2007-03-29 Thread Eric W. Biederman
Serge E. Hallyn [EMAIL PROTECTED] writes: Where the latter is needed in, for instance, kernel/capability.c. Yes. I think more clear cut examples could be made. It isn't clear to me why we skip pid == 1 in kernel/capability.c Because the capset(2) manpage says: For capset(),

[Devel] Re: L2 network namespace benchmarking (resend with Service Demand)

2007-03-30 Thread Eric W. Biederman
Daniel Lezcano [EMAIL PROTECTED] writes: Hi, as Rick suggested, I added the Service Demand results to the matrix. Thanks. The latency number is interesting and it confirms what we were seeing looking at cpu usage. We don't have an inexpensive way to get a packet from the outside world to a

[Devel] Grr sysfs networking changes...

2007-04-03 Thread Eric W. Biederman
I've almost got my netns patchset rebased against Linus's latest tree. The sysfs changes were extensive, and while I finally have something working with them, every time I stop and think about my sysfs code I spot more issues that need to be resolved. With any luck I should have something I can

[Devel] Re: [ckrm-tech] [PATCH 7/7] containers (V7): Container interface to nsproxy subsystem

2007-04-04 Thread Eric W. Biederman
Next time I have a moment I will try and take a closer look. However currently these approaches feel like there is some unholy coupling going on between different things. In addition there appear to be some weird assumptions (an array with one member per task_struct) in the group. The pid

[Devel] Re: L2 network namespace benchmarking (resend with Service Demand)

2007-04-06 Thread Eric W. Biederman
Daniel Lezcano [EMAIL PROTECTED] writes: Hi, as Rick suggested, I added the Service Demand results to the matrix. A couple of random thoughts in trying to understand the numbers you are seeing. - Checksum offloading? You have noted that with the bridge netfilter support disabled you are

[Devel] Re: L2 network namespace benchmarking (resend with Service Demand)

2007-04-06 Thread Eric W. Biederman
Benjamin Thery [EMAIL PROTECTED] writes: Eric W. Biederman wrote: A couple of random thoughts in trying to understand the numbers you are seeing. - Checksum offloading? You have noted that with the bridge netfilter support disabled you are still seeing additional checksum overhead

[Devel] [PATCH 0/5] On to usable sysfs shadow directory support...

2007-04-06 Thread Eric W. Biederman
The following patchset has been tested on 2.6.21-rc6 + Kay's driver-core-fix-namespace-issue-with-devices-assigned-to-classes.patch It has been tested both with CONFIG_SYSFS_DEPRECATED set and unset. Although more testing has been involved with CONFIG_SYSFS_DEPRECATED unset because that was the

[Devel] [PATCH 2/5] sysfs: Remove first pass at shadow directory support

2007-04-06 Thread Eric W. Biederman
b592fcfe7f06c15ec11774b5be7ce0de3aa86e73 is now gone. Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] --- fs/sysfs/dir.c | 196 +-- fs/sysfs/group.c|1 - fs/sysfs/inode.c| 10 --- fs/sysfs/mount.c|2 +- fs/sysfs/sysfs.h

[Devel] [PATCH 3/5] sysfs: Implement sysfs managed shadow directory support.

2007-04-06 Thread Eric W. Biederman
-by: Eric W. Biederman [EMAIL PROTECTED] --- fs/sysfs/bin.c|2 +- fs/sysfs/dir.c| 370 ++--- fs/sysfs/file.c |4 +- fs/sysfs/group.c | 12 +- fs/sysfs/inode.c | 18 ++- fs/sysfs/symlink.c| 11 +- fs/sysfs/sysfs.h

[Devel] [PATCH 4/5] sysfs: Implement sysfs_delete_link and sysfs_rename_link

2007-04-06 Thread Eric W. Biederman
kobject is renamed or deleted. If they are called later I lose track of which tag the target kobject was marked with and can no longer find the old symlink to remove it. Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] --- fs/sysfs/symlink.c| 31 +++ include

[Devel] [PATCH] net: Add etun driver

2007-04-06 Thread Eric W. Biederman
/newif To destroy a pair of devices: echo -n 'veth0' > /sys/module/etun/parameters/delif Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] --- drivers/net/Kconfig | 14 ++ drivers/net/Makefile |1 + drivers/net/etun.c | 486 ++ 3 files
