Re: [PATCH v9 4/8] Reimplement RLIMIT_NPROC on top of ucounts

2021-04-07 Thread Eric W. Biederman
Alexey Gladkov  writes:

> On Mon, Apr 05, 2021 at 11:56:35AM -0500, Eric W. Biederman wrote:
>>
>> Also when setting ns->ucount_max[] in create_user_ns because one value
>> is signed and the other is unsigned.  Care should be taken so that
>> rlimit_infinity is translated into the largest positive value the
>> type can hold.
>
> You mean like that ?
>
> ns->ucount_max[UCOUNT_RLIMIT_NPROC] = rlimit(RLIMIT_NPROC) <= LONG_MAX ?
>   rlimit(RLIMIT_NPROC) : LONG_MAX;
> ns->ucount_max[UCOUNT_RLIMIT_MSGQUEUE] = rlimit(RLIMIT_MSGQUEUE) <= LONG_MAX ?
>   rlimit(RLIMIT_MSGQUEUE) : LONG_MAX;
> ns->ucount_max[UCOUNT_RLIMIT_SIGPENDING] = rlimit(RLIMIT_SIGPENDING) <= 
> LONG_MAX ?
>   rlimit(RLIMIT_SIGPENDING) : LONG_MAX;
> ns->ucount_max[UCOUNT_RLIMIT_MEMLOCK] = rlimit(RLIMIT_MEMLOCK) <= LONG_MAX ?
>   rlimit(RLIMIT_MEMLOCK) : LONG_MAX;

Yes.

I only got as far as:
if (rlimit(RLIMI_NNN) == RLIM_INFINITY) {
ns->ucount_max[UCOUNT_LIMIT_NNN] = LONG_MAX;
} else {
ns->ucount_max[UCOUNT_LMIT_NNN] = rlmit(RLIMIT_NNN);
}
But forcing everything about LONG_MAX to LONG_MAX actually looks better
in practice.  Especially as that is effectively RLIMIT_INFINITY anyway.

Eric


Re: [PATCH v9 4/8] Reimplement RLIMIT_NPROC on top of ucounts

2021-04-06 Thread Alexey Gladkov
On Mon, Apr 05, 2021 at 11:56:35AM -0500, Eric W. Biederman wrote:
>
> Also when setting ns->ucount_max[] in create_user_ns because one value
> is signed and the other is unsigned.  Care should be taken so that
> rlimit_infinity is translated into the largest positive value the
> type can hold.

You mean like that ?

ns->ucount_max[UCOUNT_RLIMIT_NPROC] = rlimit(RLIMIT_NPROC) <= LONG_MAX ?
rlimit(RLIMIT_NPROC) : LONG_MAX;
ns->ucount_max[UCOUNT_RLIMIT_MSGQUEUE] = rlimit(RLIMIT_MSGQUEUE) <= LONG_MAX ?
rlimit(RLIMIT_MSGQUEUE) : LONG_MAX;
ns->ucount_max[UCOUNT_RLIMIT_SIGPENDING] = rlimit(RLIMIT_SIGPENDING) <= 
LONG_MAX ?
rlimit(RLIMIT_SIGPENDING) : LONG_MAX;
ns->ucount_max[UCOUNT_RLIMIT_MEMLOCK] = rlimit(RLIMIT_MEMLOCK) <= LONG_MAX ?
rlimit(RLIMIT_MEMLOCK) : LONG_MAX;

-- 
Rgrds, legion



Re: [PATCH v9 4/8] Reimplement RLIMIT_NPROC on top of ucounts

2021-04-05 Thread Eric W. Biederman
Alexey Gladkov  writes:

> The rlimit counter is tied to uid in the user_namespace. This allows
> rlimit values to be specified in userns even if they are already
> globally exceeded by the user. However, the value of the previous
> user_namespaces cannot be exceeded.
>
> To illustrate the impact of rlimits, let's say there is a program that
> does not fork. Some service-A wants to run this program as user X in
> multiple containers. Since the program never fork the service wants to
> set RLIMIT_NPROC=1.
>
> service-A
>  \- program (uid=1000, container1, rlimit_nproc=1)
>  \- program (uid=1000, container2, rlimit_nproc=1)
>
> The service-A sets RLIMIT_NPROC=1 and runs the program in container1.
> When the service-A tries to run a program with RLIMIT_NPROC=1 in
> container2 it fails since user X already has one running process.
>
> We cannot use existing inc_ucounts / dec_ucounts because they do not
> allow us to exceed the maximum for the counter. Some rlimits can be
> overlimited by root or if the user has the appropriate capability.

In general this looks good.

My comments are going to be on the infrastructure this patch
introduces rather than the change to RLIMIT_NPROC.  Perhaps
these would fare better as separate patches.

To preserve the existing semantics of incrementing and decrementing
rlimits in places separate from where the limits are checked you
correctly introduce inc_rlimit_ucounts and dec_rlimit_ucounts.

This separation means that for a short period of time the values
in the counts can overflow.  Which is something that the current
ucount helpers don't allow.  Taking advantage of the entire
negative range for the counts, just like we take advantage
of the entire negative range in get_ucounts seems a reasonable thing
to do.

On 32bit using "atomic_long_t" instead of "unsigned long" to hold the
counts has the downside that RLIMIT_MSGQUEUE and RLIMIT_MEMLOCK are
limited to 2GiB instead of 4GiB.  I don't think anyone cares but
it should be mentioned in case someone does.

The functions inc_rlimit_ucounts and inc_rlimit_ucounts_test need
to return overflow in the unlikely event the count values becomes
negative.

Also when setting ns->ucount_max[] in create_user_ns because one value
is signed and the other is unsigned.  Care should be taken so that
rlimit_infinity is translated into the largest positive value the
type can hold.

Eric

> Signed-off-by: Alexey Gladkov 
> ---
>  fs/exec.c  |  2 +-
>  include/linux/cred.h   |  2 ++
>  include/linux/sched/user.h |  1 -
>  include/linux/user_namespace.h | 13 
>  kernel/cred.c  | 10 +++---
>  kernel/exit.c  |  2 +-
>  kernel/fork.c  |  9 ++---
>  kernel/sys.c   |  2 +-
>  kernel/ucount.c| 61 ++
>  kernel/user.c  |  1 -
>  kernel/user_namespace.c|  3 +-
>  11 files changed, 91 insertions(+), 15 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index d7c4187ca023..f2bcdbeb3afb 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1878,7 +1878,7 @@ static int do_execveat_common(int fd, struct filename 
> *filename,
>* whether NPROC limit is still exceeded.
>*/
>   if ((current->flags & PF_NPROC_EXCEEDED) &&
> - atomic_read(_user()->processes) > rlimit(RLIMIT_NPROC)) {
> + is_ucounts_overlimit(current_ucounts(), UCOUNT_RLIMIT_NPROC, 
> rlimit(RLIMIT_NPROC))) {
>   retval = -EAGAIN;
>   goto out_ret;
>   }
> diff --git a/include/linux/cred.h b/include/linux/cred.h
> index 66436e655032..5ca1e8a1d035 100644
> --- a/include/linux/cred.h
> +++ b/include/linux/cred.h
> @@ -372,6 +372,7 @@ static inline void put_cred(const struct cred *_cred)
>  
>  #define task_uid(task)   (task_cred_xxx((task), uid))
>  #define task_euid(task)  (task_cred_xxx((task), euid))
> +#define task_ucounts(task)   (task_cred_xxx((task), ucounts))
>  
>  #define current_cred_xxx(xxx)\
>  ({   \
> @@ -388,6 +389,7 @@ static inline void put_cred(const struct cred *_cred)
>  #define current_fsgid()  (current_cred_xxx(fsgid))
>  #define current_cap()(current_cred_xxx(cap_effective))
>  #define current_user()   (current_cred_xxx(user))
> +#define current_ucounts()(current_cred_xxx(ucounts))
>  
>  extern struct user_namespace init_user_ns;
>  #ifdef CONFIG_USER_NS
> diff --git a/include/linux/sched/user.h b/include/linux/sched/user.h
> index a8ec3b6093fc..d33d867ad6c1 100644
> --- a/include/linux/sched/user.h
> +++ b/include/linux/sched/user.h
> @@ -12,7 +12,6 @@
>   */
>  struct user_struct {
>   refcount_t __count; /* reference count */
> - atomic_t processes; /* How many processes does this user have? */
>   atomic_t sigpending;/* How many pending signals does this user 
> have? */
>  #ifdef CONFIG_FANOTIFY