"Fabio M. De Francesco" writes:
> The use of kmap() and kmap_atomic() is being deprecated in favor of
> kmap_local_page().
>
> With kmap_local_page(), the mappings are per thread, CPU local and not
> globally visible. Furthermore, the mappings can be acquired from any
> context (including
Giuseppe, can you take a look at this?
This is a second attempt at tightening up the semantics of writing to
file capabilities from a user namespace.
The first attempt was reverted with 3b0c2d3eaa83 ("Revert 95ebabde382c
("capabilities: Don't allow writing ambiguous v3 file capabilities")"),
Linus Torvalds writes:
> On Thu, Apr 8, 2021 at 1:32 AM kernel test robot
> wrote:
>>
>> FYI, we noticed a -41.9% regression of stress-ng.sigsegv.ops_per_sec due to
>> commit
>> 08ed4efad684 ("[PATCH v10 6/9] Reimplement RLIMIT_SIGPENDING on top of
>> ucounts")
>
> Ouch.
We were cautiously
Alexey Gladkov writes:
> On Mon, Apr 05, 2021 at 11:56:35AM -0500, Eric W. Biederman wrote:
>>
>> Also when setting ns->ucount_max[] in create_user_ns because one value
>> is signed and the other is unsigned. Care should be taken so that
>> rlimit_infinity
er
> but in this case we face a different problem of uid mapping when transferring
> files from one container to another.
>
> Eric W. Biederman mentioned this issue [2][3].
>
> Introduced changes
> --
> To address the problem, we bind rlimit counters to user nam
Alexey Gladkov writes:
> The current implementation of the ucounts reference counter requires the
> use of spin_lock. We're going to use get_ucounts() in more performance
> critical areas such as the handling of RLIMIT_SIGPENDING.
>
> Now we need to use spin_lock only if we want to change the
A small bug below.
Eric
> diff --git a/kernel/signal.c b/kernel/signal.c
> index f2a1b898da29..1b537d9de447 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -413,49 +413,44 @@ void task_join_group_stop(struct task_struct *task)
> static struct sigqueue *
> __sigqueue_alloc(int
Alexey Gladkov writes:
> The rlimit counter is tied to uid in the user_namespace. This allows
> rlimit values to be specified in userns even if they are already
> globally exceeded by the user. However, the value of the previous
> user_namespaces cannot be exceeded.
>
> To illustrate the impact
Kees Cook writes:
> On Wed, Mar 31, 2021 at 11:36:28PM -0500, Eric W. Biederman wrote:
>> Josh Hunt writes:
>>
>> > Currently only root can write files under /proc/pressure. Relax this to
>> > allow tasks running as unprivileged users with CAP_SYS_
Josh Hunt writes:
> Currently only root can write files under /proc/pressure. Relax this to
> allow tasks running as unprivileged users with CAP_SYS_RESOURCE to be
> able to write to these files.
The test for CAP_SYS_RESOURCE really needs to be in open rather
than in write.
Otherwise a suid
effects observed. Kicked off the longer runs now.
>
> Not a huge amount of changes from the posted series, but please peruse
> here if you want to double check:
>
> https://git.kernel.dk/cgit/linux-block/log/?h=io_uring-5.12
>
> And diff against v2 posted is below. Thanks!
Jens Axboe writes:
> On 3/26/21 4:23 PM, Eric W. Biederman wrote:
>> Jens Axboe writes:
>>
>>> On 3/26/21 2:29 PM, Eric W. Biederman wrote:
>>>> Jens Axboe writes:
>>>>
>>>>> We go through various hoops to disallow signals for t
Jens Axboe writes:
> On 3/26/21 2:29 PM, Eric W. Biederman wrote:
>> Jens Axboe writes:
>>
>>> We go through various hoops to disallow signals for the IO threads, but
>>> there's really no reason why we cannot just allow them. The IO threads
>>> neve
Christoph Hellwig writes:
> diff --git a/fs/exec.c b/fs/exec.c
> index 06e07278b456fa..b34c1eb9e7ad8e 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -391,47 +391,34 @@ static int bprm_mm_init(struct linux_binprm *bprm)
> return err;
> }
>
> -struct user_arg_ptr {
> -#ifdef
Jens Axboe writes:
> This is racy - move the blocking into when the task is created and
> we're marking it as PF_IO_WORKER anyway. The IO threads are now
> prepared to handle signals like SIGSTOP as well, so clear that from
> the mask to allow proper stopping of IO threads.
Acked
Jens Axboe writes:
> Right now we're never calling get_signal() from PF_IO_WORKER threads, but
> in preparation for doing so, don't handle a fatal signal for them. The
> workers have state they need to cleanup when exiting, and they don't do
> coredumps, so just return instead of performing
Jens Axboe writes:
> We go through various hoops to disallow signals for the IO threads, but
> there's really no reason why we cannot just allow them. The IO threads
> never return to userspace like a normal thread, and hence don't go through
> normal signal processing. Instead, just check for a
Oleg Nesterov writes:
> On 03/25, Linus Torvalds wrote:
>>
>> The whole "signals are very special for IO threads" thing has caused
>> so many problems, that maybe the solution is simply to _not_ make them
>> special?
>
> Or maybe IO threads should not abuse CLONE_THREAD?
>
> Why does
Oleg Nesterov writes:
> On 03/25, Eric W. Biederman wrote:
>>
>> So looking quickly the flip side of the coin is gdb (and other
>> debuggers) needs a way to know these threads are special, so it can know
>> not to attach.
>
> may be,
>
>> I suspect get
Linus Torvalds writes:
> On Thu, Mar 25, 2021 at 12:42 PM Linus Torvalds
> wrote:
>>
>> On Thu, Mar 25, 2021 at 12:38 PM Linus Torvalds
>> wrote:
>> >
>> > I don't know what the gdb logic is, but maybe there's some other
>> > option that makes gdb not react to them?
>>
>> .. maybe we could
Jens Axboe writes:
> On 3/25/21 1:42 PM, Linus Torvalds wrote:
>> On Thu, Mar 25, 2021 at 12:38 PM Linus Torvalds
>> wrote:
>>>
>>> I don't know what the gdb logic is, but maybe there's some other
>>> option that makes gdb not react to them?
>>
>> .. maybe we could have a different name for
Jens Axboe writes:
> Hi,
>
> Stefan reports that attaching to a task with io_uring will leave gdb
> very confused and just repeatedly attempting to attach to the IO threads,
> even though it receives an -EPERM every time. This patchset proposes to
> skip PF_IO_WORKER threads as
Stefan Metzmacher writes:
> Am 25.03.21 um 12:24 schrieb Sasha Levin:
>> From: "Eric W. Biederman"
>>
>> [ Upstream commit 4db4b1a0d1779dc159f7b87feb97030ec0b12597 ]
>>
>> Just like we don't allow normal signals to IO threads, don't deliver a
Jens Axboe writes:
> On 3/20/21 4:08 PM, Eric W. Biederman wrote:
>>
>> Added criu because I just realized that io_uring (which can open files
>> from an io worker thread) looks to require some special handling for
>> stopping and freezing processes.
Jens Axboe writes:
> On 3/20/21 3:38 PM, Eric W. Biederman wrote:
>> Linus Torvalds writes:
>>
>>> On Sat, Mar 20, 2021 at 9:19 AM Eric W. Biederman
>>> wrote:
>>>>
>>>> The creds should be reasonably in-sync with the rest of
Added criu because I just realized that io_uring (which can open files
from an io worker thread) looks to require some special handling for
stopping and freezing processes. If not in the SIGSTOP case, then in the
related cgroup freezer case.
Linus Torvalds writes:
> On Sat, Mar 20, 2021 at 10:51
Linus Torvalds writes:
> On Sat, Mar 20, 2021 at 9:19 AM Eric W. Biederman
> wrote:
>>
>> The creds should be reasonably in-sync with the rest of the threads.
>
> It's not about credentials (despite the -EPERM).
>
> It's about the fact that kernel threads cannot
Jens Axboe writes:
> Hi,
>
> Been trying to ensure that we do the right thing wrt signals and
> PF_IO_WORKER threads, and I think there are two cases we need to handle
> explicitly:
>
> 1) Just don't allow signals to them in general. We do mask everything
>    as blocked, outside of SIGKILL, so
Jens Axboe writes:
> Just like we don't allow normal signals to IO threads, don't deliver a
> STOP to a task that has PF_IO_WORKER set. The IO threads don't take
> signals in general, and have no means of flushing out a stop either.
At first glance this seems safe. This is before we count all
Jens Axboe writes:
> They don't take signals individually, and even if they share signals with
> the parent task, don't allow them to be delivered through the worker
> thread.
This is silly I know, but why do we care?
The creds should be reasonably in-sync with the rest of the threads.
There
Oleg Nesterov writes:
> On 03/18, qianli zhao wrote:
>>
>> Hi, Oleg
>>
>> Thank you for your reply.
>>
>> >> When init sub-threads running on different CPUs exit at the same time,
>> >> zap_pid_ns_processes()->BUG() may happen.
>>
>> > and why do you think your patch can't prevent this?
>>
>>
Linus Torvalds writes:
> On Fri, Mar 12, 2021 at 1:34 PM Eric W. Biederman
> wrote:
>>
>> Please pull the for-v5.12-rc3 branch from the git tree.
>>
>> Removing the ambiguity broke userspace so please revert the change:
>> It turns out that there a
Jim Newsome writes:
> On 3/12/21 14:29, Eric W. Biederman wrote:
>> When I looked at this a second time it became apparent that using
>> pid_task twice should actually be faster as it removes a dependent load
>> caused by thread_group_leader, and replaces it by accessing two
Jim Newsome writes:
> do_wait is an internal function used to implement waitpid, waitid,
> wait4, etc. To handle the general case, it does an O(n) linear scan of
> the thread group's children and tracees.
>
> This patch adds a special-case when waiting on a pid to skip these scans
> and instead
Qianli Zhao writes:
> From: Qianli Zhao
>
> When init sub-threads running on different CPUs exit at the same time,
> zap_pid_ns_processes()->BUG() may happen.
> And every thread's status is abnormal after exit (PF_EXITING set,
> task->mm=NULL, etc.),
> which makes it difficult to parse coredump
Thomas Gleixner writes:
> This is a follow up to the initial submission which can be found here:
>
> https://lore.kernel.org/r/20210303142025.wbbt2nnr6dtgw...@linutronix.de
>
> Signal sending requires a kmem cache allocation at the sender side and the
> receiver hands it back to the kmem cache
Oleg Nesterov writes:
> On 03/10, Eric W. Biederman wrote:
>>
>> Jim Newsome writes:
>>
>> > +static int do_wait_pid(struct wait_opts *wo)
>> > +{
>> > + struct task_s
Thomas Gleixner writes:
> On Wed, Mar 10 2021 at 15:57, Eric W. Biederman wrote:
>> Thomas Gleixner writes:
>>> IMO, not bothering with an extra counter and rlimit plus the required
>>> atomic operations is just fine and having this for all tasks
>>> un
Jim Newsome writes:
> On 3/10/21 16:40, Eric W. Biederman wrote:
>>> +// Optimization for waiting on PIDTYPE_PID. No need to iterate through child
>>> +// and tracee lists to find the target task.
>>
>> Minor nit: C++ style comments look very out of pla
Jim Newsome writes:
> do_wait is an internal function used to implement waitpid, waitid,
> wait4, etc. To handle the general case, it does an O(n) linear scan of
> the thread group's children and tracees.
>
> This patch adds a special-case when waiting on a pid to skip these scans
> and instead
Oleg Nesterov writes:
> On 03/10, Eric W. Biederman wrote:
>>
>> /* If global init has exited,
>> * panic immediately to get a useable coredump.
>> */
>> if (unlikely(is_global_init(tsk) &&
>> (thread_group_e
Thomas Gleixner writes:
> On Thu, Mar 04 2021 at 21:58, Thomas Gleixner wrote:
>> On Thu, Mar 04 2021 at 13:04, Eric W. Biederman wrote:
>>> Thomas Gleixner writes:
>>>>
>>>> We could of course do the caching unconditionally for all tasks.
>
atch is a follow-up of a previous one sent by Andy Lutomirski, but
> with less limitations:
> https://lore.kernel.org/lkml/0e2f0f54e19bff53a3739ecfddb4ffa9a6dbde4d.1327858005.git.l...@amacapital.net/
>
> Cc: Al Viro
> Cc: Andy Lutomirski
> Cc: Christian Brauner
> Cc: Christo
Oleg Nesterov writes:
> On 03/10, Eric W. Biederman wrote:
>>
>> /* If global init has exited,
>> * panic immediately to get a useable coredump.
>> */
>> if (unlikely(is_global_init(tsk) &&
>> (thread_group_e
Filippo Sironi writes:
> We've seen a number of crashes with the following signature:
>
> BUG: kernel NULL pointer dereference, address:
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x) - not-present page
> ...
> Oops: [#1] SMP PTI
Mickaël Salaün writes:
> From: Mickaël Salaün
>
> Being able to easily change root directories eases some
> development workflows and can be used as a tool to strengthen
> unprivileged security sandboxes. chroot(2) is not an access-control
> mechanism per se, but it can be used to
qianli zhao writes:
> Hi, Oleg
>
> Thanks for your reply.
>
>> To be honest, I don't understand the changelog. It seems that you want
>> to uglify the kernel to simplify the debugging of buggy init? Or what?
>
> My patch is for the following purpose:
> 1. I hope to fix the occurrence of
Al Viro writes:
> On Tue, Mar 09, 2021 at 11:29:14AM +0530, Palash Oswal wrote:
>
>> I observe the following result(notice the segfault in systemd):
>> root@sandbox:~# ./repro
>> [9.457767] got to 221
>> [9.457791] got to 183
>> [9.459144] got to 201
>> [9.459471] got to 208
>> [
Alexey Gladkov writes:
> On Wed, Feb 24, 2021 at 12:50:21PM -0600, Eric W. Biederman wrote:
>> Alexey Gladkov writes:
>>
>> > On Wed, Feb 24, 2021 at 10:54:17AM -0600, Eric W. Biederman wrote:
>> >> kernel test robot writes:
>> >>
>>
Thomas Gleixner writes:
> On Thu, Mar 04 2021 at 09:11, Sebastian Andrzej Siewior wrote:
>> On 2021-03-03 16:09:05 [-0600], Eric W. Biederman wrote:
>>> Sebastian Andrzej Siewior writes:
>>>
>>> > From: Thomas Gleixner
>>> >
>>>
Sebastian Andrzej Siewior writes:
> On 2021-03-03 16:09:05 [-0600], Eric W. Biederman wrote:
>> Sebastian Andrzej Siewior writes:
>>
>> > From: Thomas Gleixner
>> >
>> > Allow realtime tasks to cache one sigqueue in task struct. This avoids an
>
Christian Brauner writes:
> Hi Linus,
> This series comes with an extensive xfstests suite covering both ext4 and xfs
> https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts
> It covers truncation, creation, opening, xattrs, vfscaps, setid execution,
> setgid inheritance and more both
Sebastian Andrzej Siewior writes:
> From: Thomas Gleixner
>
> Allow realtime tasks to cache one sigqueue in task struct. This avoids an
> allocation which can increase the latency or fail.
> Ideally the sigqueue is cached after first successful delivery and will be
> available for next signal
Ilya Lipnitskiy writes:
> On Wed, Mar 3, 2021 at 7:50 AM Eric W. Biederman
> wrote:
>>
>> Ilya Lipnitskiy writes:
>>
>> > On Tue, Mar 2, 2021 at 11:37 AM Eric W. Biederman
>> > wrote:
>> >>
>> >> Ilya Lipnitskiy write
Ilya Lipnitskiy writes:
> On Tue, Mar 2, 2021 at 11:37 AM Eric W. Biederman
> wrote:
>>
>> Ilya Lipnitskiy writes:
>>
>> > On Mon, Mar 1, 2021 at 12:43 PM Eric W. Biederman
>> > wrote:
>> >>
>> >> Ilya Lipnitskiy writes:
>
Ilya Lipnitskiy writes:
> On Mon, Mar 1, 2021 at 12:43 PM Eric W. Biederman
> wrote:
>>
>> Ilya Lipnitskiy writes:
>>
>> > Eric, All,
>> >
>> > The following error appears when running Linux 5.10.18 on an embedded
>> > MIPS mt76
Ilya Lipnitskiy writes:
> Eric, All,
>
> The following error appears when running Linux 5.10.18 on an embedded
> MIPS mt7621 target:
> [0.301219] BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:1
>
> Being a very generic error, I started digging and added a stack dump
> before
chenzhou writes:
> On 2021/2/25 15:25, Baoquan He wrote:
>> On 02/24/21 at 02:19pm, Catalin Marinas wrote:
>>> On Sat, Jan 30, 2021 at 03:10:15PM +0800, Chen Zhou wrote:
Move CRASH_ALIGN to header asm/kexec.h for later use. Besides, the
alignment of crash kernel regions in x86 is
Alexey Gladkov writes:
> On Wed, Feb 24, 2021 at 10:54:17AM -0600, Eric W. Biederman wrote:
>> kernel test robot writes:
>>
>> > Greeting,
>> >
>> > FYI, we noticed a -82.7% regression of stress-ng.sigsegv.ops_per_sec
kernel test robot writes:
> Greeting,
>
> FYI, we noticed a -82.7% regression of stress-ng.sigsegv.ops_per_sec due to
> commit:
>
>
> commit: d28296d2484fa11e94dff65e93eb25802a443d47 ("[PATCH v7 5/7] Reimplement
> RLIMIT_SIGPENDING on top of ucounts")
> url:
>
Linus Torvalds writes:
> On Mon, Feb 15, 2021 at 4:42 AM Alexey Gladkov
> wrote:
>>
>> These patches are for binding the rlimit counters to a user in user
>> namespace.
>
> So this is now version 6, but I think the kernel test robot keeps
> complaining about them causing KASAN issues.
>
> The
Will Deacon writes:
> On Fri, 19 Feb 2021 14:51:42 -0500, Pavel Tatashin wrote:
>> machine_kexec_post_load() is called after kexec load is finished. It must
>> be declared in a public header, not in kexec_internal.h
>>
>> Fixes the following compiler warning:
>>
>>
CLONE_NEWPID | CLONE_NEWUSER) < 0)
> err(1, "unshare");
> test_fork();
> return 0;
> }
> EOF
> $ sh -c ./a.out
> current: 10001, parent: 1, fork returned: 10002
> current: 10002, parent: 10001, fork returned: 10001
> cu
Alexey Gladkov writes:
> If only the dynamic part of procfs is mounted (subset=pid), then there is no
> need to check if procfs is fully visible to the user in the new user
> namespace.
A couple of things.
1) Allowing the mount should come in the last patch. So we don't have a
bisect hazard.
urrent->nsproxy->pid_ns_for_children
instead of task_active_pid_ns(p).
For sparc people. Do we know of anyone who actually uses the parent pid
returned from fork to the child process? If not the code can simply
return 0 and we don't have to do this.
Eric
> Cc: Eric W. Biederman
> Cc: s
Matthew Wilcox writes:
> On Fri, Feb 12, 2021 at 04:01:48PM -0600, Eric W. Biederman wrote:
>> Joe Perches writes:
>>
>> > Convert S_ permissions to the more readable octal.
>> >
>> > Done using:
>> > $ ./scripts/checkpatch.pl -f --fix
Joe Perches writes:
> On Fri, 2021-02-12 at 16:01 -0600, Eric W. Biederman wrote:
>> Joe Perches writes:
>>
>> > Convert S_ permissions to the more readable octal.
>> >
>> > Done using:
>> > $ ./scripts/checkpatch.pl -f --fix
Joe Perches writes:
> Convert S_ permissions to the more readable octal.
>
> Done using:
> $ ./scripts/checkpatch.pl -f --fix-inplace --types=SYMBOLIC_PERMS
> fs/proc/*.[ch]
>
> No difference in generated .o files allyesconfig x86-64
>
> Link:
>
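To make the symbolic-to-octal conversion in the quoted patch concrete, here is a minimal userspace sketch, assuming the POSIX bit values from <sys/stat.h>. EX_S_IRUGO and ex_symbolic_mode() are hypothetical names used only for illustration; EX_S_IRUGO mirrors how the kernel composes S_IRUGO and is not taken from the patch itself.

```c
#include <assert.h>
#include <sys/stat.h>

/* Hypothetical local stand-in for the kernel's composite S_IRUGO macro:
 * read permission for user, group and others. */
#define EX_S_IRUGO (S_IRUSR | S_IRGRP | S_IROTH)

/* Build "readable by everyone, writable by owner" from symbolic bits.
 * With POSIX values this is the same number as the octal literal 0644. */
unsigned int ex_symbolic_mode(void)
{
	unsigned int mode = EX_S_IRUGO | S_IWUSR;

	/* The symbolic and octal spellings must describe the same bits. */
	assert(mode == 0644);
	return mode;
}
```

Because both spellings produce the same constant, a conversion of this kind would not be expected to change the generated object code, which matches the "no difference in generated .o files" remark in the quoted message.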
Pavel Tatashin writes:
>> > I understand that having an extra set of page tables could potentially
>> > waste memory, especially if VAs are sparse, but in this case we use
>> > page tables exclusively for contiguous VA space (copy [src, src +
>> > size]). Therefore, the extra memory usage is
Pavel Tatashin writes:
> Hi James,
>
>> The problem I see with this is rewriting the relocation code. It needs to
>> work whether the machine has enough memory to enable the MMU during kexec,
>> or not.
>>
>> In off-list mail to Pavel I proposed an alternative implementation here:
>>
"Serge E. Hallyn" writes:
> On Fri, Jan 29, 2021 at 04:55:29PM -0600, Eric W. Biederman wrote:
>> "Serge E. Hallyn" writes:
>>
>> > On Thu, Jan 28, 2021 at 02:19:13PM -0600, Eric W. Biederman wrote:
>> >> "Serge E. Hallyn" wri
"Serge E. Hallyn" writes:
> On Thu, Jan 28, 2021 at 08:44:26PM +0100, Miklos Szeredi wrote:
>> On Thu, Jan 28, 2021 at 6:09 PM Serge E. Hallyn wrote:
>> >
>> > On Tue, Jan 19, 2021 at 07:34:49PM -0600, Eric W. Biederman
"Serge E. Hallyn" writes:
> On Thu, Jan 28, 2021 at 02:19:13PM -0600, Eric W. Biederman wrote:
>> "Serge E. Hallyn" writes:
>>
>> > On Tue, Jan 19, 2021 at 07:34:49PM -0600, Eric W. Biederman wrote:
>> >> Miklos Szeredi writes:
>>
Miklos Szeredi writes:
> On Thu, Jan 28, 2021 at 9:24 PM Eric W. Biederman
> wrote:
>
>>
>> From our previous discussions I would also argue it would be good
>> if there was a bypass that skipped all conversions if the reader
>> and the filesyst
"Serge E. Hallyn" writes:
> On Tue, Jan 19, 2021 at 07:34:49PM -0600, Eric W. Biederman wrote:
>> Miklos Szeredi writes:
>>
>> > If a capability is stored on disk in v2 format cap_inode_getsecurity() will
>> > currently return in v2 format uncon
Pavel Tatashin writes:
> kmsg_dump(KMSG_DUMP_SHUTDOWN) is called before
> machine_restart(), machine_halt(), machine_power_off(), the only one that
> is missing is machine_kexec().
>
> The dmesg output that it contains can be used to study the shutdown
> performance of both kernel and systemd
Alexey Gladkov writes:
> On Tue, Jan 19, 2021 at 07:57:36PM -0600, Eric W. Biederman wrote:
>> Alexey Gladkov writes:
>>
>> > On Mon, Jan 18, 2021 at 12:34:29PM -0800, Linus Torvalds wrote:
>> >> On Mon, Jan 18, 2021 at 11:46 AM Alexey Gladkov
>> >
by having no_new_privs enforce
progressively tighter permissions.
Fixes: 9fcf78cca198 ("apparmor: update domain transitions that are subsets of
confinement at nnp")
Signed-off-by: Eric W. Biederman
---
I came across this while examining the places cred_guard_mutex is
used and trying to
TL;DR selinux and apparmor ignore no_new_privs
What?
John Johansen writes:
> On 1/20/21 1:26 PM, Eric W. Biederman wrote:
>>
>> The current understanding of apparmor with respect to no_new_privs is at
>> odds with how no_new_privs is implemented and u
This should now Cc the correct email address for James Morris.
ebied...@xmission.com (Eric W. Biederman) writes:
> The current understanding of apparmor with respect to no_new_privs is at
> odds with how no_new_privs is implemented and understood by the rest of
> t
ebied...@xmission.com (Eric W. Biederman) writes:
> Alexey Gladkov writes:
>
>> On Mon, Jan 18, 2021 at 12:34:29PM -0800, Linus Torvalds wrote:
>>> On Mon, Jan 18, 2021 at 11:46 AM Alexey Gladkov
>>> wrote:
>>> >
>>> > Sorry about that.
Alexey Gladkov writes:
> On Mon, Jan 18, 2021 at 12:34:29PM -0800, Linus Torvalds wrote:
>> On Mon, Jan 18, 2021 at 11:46 AM Alexey Gladkov
>> wrote:
>> >
>> > Sorry about that. I thought that this code is not needed when switching
>> > from int to refcount_t. I was wrong.
>>
>> Well, you
well this works with stacking. In particular
ovl_xattr_set appears to call vfs_getxattr without overriding the creds.
What the purpose of that is I haven't quite figured out. It looks like
it is just a probe to see if an xattr is present so maybe it is ok.
Acked-by: "Eric W. Biederman"
Miklos Szeredi writes:
> It turns out overlayfs is actually okay wrt. multiple conversions, because
> it uses the right context for lower operations. I.e. before calling
> vfs_{set,get}xattr() on underlying fs, it overrides creds with that of the
> mounter, so the current user ns will now match
legated_inode and breaking
leases. Code that is enabled with CONFIG_FILE_LOCKING. So unless
I am missing something this introduces a different regression into
ecryptfs.
>
> Reported-by: Eric W. Biederman
> Cc: Tyler Hicks
> Fixes: 7c03e2cda4a5 ("vfs: move cap_convert_nscap() c
Alexey Gladkov writes:
We might want to use refcount_t instead of atomic_t. Not a big deal
either way.
> Signed-off-by: Alexey Gladkov
> ---
> include/linux/user_namespace.h |  2 +-
> kernel/ucount.c                | 10 +-
> 2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff
The subject is wrong. This should be:
[RFC PATCH v2 2/8] Add a reference to ucounts for each cred.
Further the explanation could use a little work. Something along the
lines of:
For RLIMIT_NPROC and some other rlimits the user_struct that holds the
global limit is kept alive for the lifetime
ebied...@xmission.com (Eric W. Biederman) writes:
> So there is the basic question do we want to read the raw bytes on disk
> or do we want to return something meaningful to the reader. As the
> existing tools use the xattr interface to set/clear fscaps returning
> data to user space
Miklos Szeredi writes:
> On Tue, Jan 12, 2021 at 1:15 AM Eric W. Biederman
> wrote:
>>
>> Miklos Szeredi writes:
>>
>> > On Fri, Jan 01, 2021 at 11:35:16AM -0600, Eric W. Biederman wrote:
>
>> > For one: a v2 fscap is supposed to be equivalent t
Miklos Szeredi writes:
> On Fri, Jan 01, 2021 at 11:35:16AM -0600, Eric W. Biederman wrote:
>> Miklos Szeredi writes:
>>
>> > cap_convert_nscap() does permission checking as well as conversion of the
>> > xattr value conditionally based on fs's user-ns.
>
Linus Torvalds writes:
> On Sun, Jan 10, 2021 at 9:34 AM Alexey Gladkov
> wrote:
>>
>> To address the problem, we bind rlimit counters to each user namespace. The
>> result is a tree of rlimit counters with the biggest value at the root (aka
>> init_user_ns). The rlimit counter
Andy Lutomirski writes:
> The implementation was rather buggy. It unconditionally marked PTEs
> read-only, even for VM_SHARED mappings. I'm not sure whether this is
> actually a problem, but it certainly seems unwise. More importantly, it
> released the mmap lock before flushing the TLB,
Al Viro writes:
> On Mon, Jan 04, 2021 at 06:47:38PM -0600, Eric W. Biederman wrote:
>> >> It is defined in the Ubuntu kernel configs I've got lurking:
>> >> Both 3.8.0-19_generic (Ubuntu 13.04) and 5.4.0-56_generic (probably
>> >> 20.04).
>> >
Andy Lutomirski writes:
>> On Jan 4, 2021, at 2:36 PM, David Laight wrote:
>>
>> From: Eric W. Biederman
>>> Sent: 04 January 2021 20:41
>>>
>>> Al Viro writes:
>>>
>>>> On Mon, Jan 04, 2021 at 12:16:56PM +000
Al Viro writes:
> On Mon, Jan 04, 2021 at 12:16:56PM +0000, David Laight wrote:
>> On x86 in_compat_syscall() is defined as:
>> in_ia32_syscall() || in_x32_syscall()
>>
>> Now in_ia32_syscall() is a simple check of the TS_COMPAT flag.
>> However in_x32_syscall() is a horrid beast that has
Miklos Szeredi writes:
> cap_convert_nscap() does permission checking as well as conversion of the
> xattr value conditionally based on fs's user-ns.
>
> This is needed by overlayfs and probably other layered fs (ecryptfs) and is
> what vfs_foo() is supposed to do anyway.
Well crap.
I just
Tetsuo Handa writes:
> Commit db68ce10c4f0a27c ("new helper: uaccess_kernel()") replaced
> segment_eq(get_fs(), KERNEL_DS)
> with uaccess_kernel(). But uaccess_kernel() became an unconditional "false"
> for some architectures
> due to commit 5e6e9852d6f76e01 ("uaccess: add infrastructure for
Oleg Nesterov writes:
> On 12/17, Eric W. Biederman wrote:
>>
>> Oleg Nesterov writes:
>>
>> > Suppose we have 2 threads, the group-leader L and a sub-thread T,
>> > both parked in ptrace_stop(). Debugger tries to resume both threads
>>
ed-by: "Eric W. Biederman"
>
> Suggested-by: Jens Axboe
> Signed-off-by: Casey Schaufler
> ---
> security/smack/smack_access.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/security/smack/smack_access.c b/security/smack/sma
Leesoo Ahn writes:
> clear_siginfo() is responsible for clearing a struct kernel_siginfo object.
> It's obvious that manually initializing those fields is needless, as
> a commit[1] explains why the function was introduced and its guarantee that
> all bits in the struct are cleared after it.
The
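The point in the quoted patch description, that fields need no manual zeroing once the whole struct has been cleared, can be sketched in userspace C. This is only an illustration under assumptions: struct ex_siginfo, ex_clear_siginfo() and ex_prepare_signal() are hypothetical stand-ins, not the kernel's struct kernel_siginfo or clear_siginfo().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a siginfo-like structure. */
struct ex_siginfo {
	int si_signo;
	int si_errno;
	int si_code;
	void *si_addr;
};

/* Clear every byte of the structure, padding included. */
void ex_clear_siginfo(struct ex_siginfo *info)
{
	memset(info, 0, sizeof(*info));
}

/* Fill in only the fields that carry information. si_errno and si_addr
 * are not assigned because the clear above already zeroed them.
 * Returns si_errno so callers can observe that it is zero. */
int ex_prepare_signal(struct ex_siginfo *info, int signo, int code)
{
	ex_clear_siginfo(info);
	info->si_signo = signo;
	info->si_code = code;
	/* A line such as "info->si_errno = 0;" here would be redundant. */
	return info->si_errno;
}
```

Starting from a buffer filled with garbage and calling ex_prepare_signal() leaves every unassigned field at zero, which is why the extra assignments can be dropped.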