Launchpad has imported 7 comments from the remote bug at https://bugzilla.kernel.org/show_bug.cgi?id=195453.
If you reply to an imported comment from within Launchpad, your comment will be sent to the remote bug automatically. Read more about Launchpad's inter-bugtracker facilities at https://help.launchpad.net/InterBugTracking. ------------------------------------------------------------------------ On 2017-04-18T13:22:33+00:00 colin.king wrote: There is a race condition on the fs->users counter and fs->in_exec flag that impacts exec() on a suid program when there are multiple threads being created and destroyed by a parent process. This in_exec flag was introduced way back in 2009 with commit: commit 498052bba55ecaff58db6a1436b0e25bfd75a7ff Author: Al Viro <v...@zeniv.linux.org.uk> Date: Mon Mar 30 07:20:30 2009 -0400 When performing an exec() on a suid program, check_unsafe_exec() checks to see we have more f->fs->users than the count of child threads that share the same fs with the parent. However, there are two race conditions that this seems to fail on: 1. The parent may be creating a new thread and copy_fs in kernel/fork.c bumps the fs->users count before the thread is attached to the parent hence causing the check p->fs->users > n_fs in check_unsafe_exec() to be true in check_unsafe_exec() and cause the execution a the suid program by the parent to fail to marked as unsafe and so it executes as a non- suid executable. The crux of the matter is that the fs->user count temporarily ahead by 1 with the number of threads linked to the process and we don't have any locking mechanism to protect us from this in the exec phase. 2. A thread my be dying and the associated fs pointer is nullified before it is removed from the parent and this also causes issues in the check_unsafe_exec() fs sanity check. I believe this can be fixed checking for t->fs being NULL, e.g.: diff --git a/fs/exec.c b/fs/exec.c index 72934df68471..ebfd9b76b69f 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1447,7 +1447,7 @@ static void check_unsafe_exec(struct linux_binprm *bprm) spin_lock(&p->fs->lock); rcu_read_lock(); while_each_thread(p, t) { - if (t->fs == p->fs) + if (t->fs == p->fs || !t->fs) n_fs++; } rcu_read_unlock(); The reproducer is quite simple and always easy to reproduce: $ cat Makefile ALL=a b all: $(ALL) a: LDFLAGS=-pthread b: b.c $(CC) b.c -o b sudo chown root:root b sudo chmod u+s b test: for I in $$(seq 1000); do echo $I; ./a ; done clean: rm -vf $(ALL) $ cat a.c #include <unistd.h> #include <stdio.h> #include <pthread.h> #include <time.h> void *nothing(void *p) { return NULL; } void *target(void *p) { for (;;) { pthread_t t; if (pthread_create(&t, NULL, nothing, NULL) == 0) pthread_join(t, NULL); } return NULL; } int main(void) { struct timespec tv; int i; for (i = 0; i < 10; i++) { pthread_t t; pthread_create(&t, NULL, target, NULL); } tv.tv_sec = 0; tv.tv_nsec = 100000; nanosleep(&tv, NULL); if (execl("./b", "./b", NULL) < 0) perror("execl"); return 0; } $ cat b.c #include <unistd.h> #include <stdio.h> #include <sys/types.h> int main(void) { const uid_t euid = geteuid(); if (euid != 0) { printf("Failed, got euid %d (expecting 0)\n", euid); return 1; } return 0; } To reproduce: make make test You will see "Failed, got euid 1000 (expecting 0)" errors whenever the suid program being exec'd fails to exec as a suid process because of the race and failure in check_unsafe_exec() Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1672819/comments/12 ------------------------------------------------------------------------ On 2017-04-21T00:36:08+00:00 dev.jongy wrote: I don't think the second condition you gave is relevant, because once the task_struct->fs pointer is nullified, this thread is not accounted in fs->users anymore. Since this thread doesn't count for neither fs->count nor n_fs, it is okay. The first condition is indeed a problem. I'm not sure of the wanted fix (probably some kind of locking in copy_process, that while_each_thread could use too), but currently this race is bad. Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1672819/comments/14 ------------------------------------------------------------------------ On 2017-04-21T09:47:07+00:00 colin.king wrote: Thanks for looking at this. The locking on copy_process concerns me as I didn't want to introduce a locking fix that caused a serious performance regression on the copy and exec paths. Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1672819/comments/15 ------------------------------------------------------------------------ On 2017-05-03T10:55:47+00:00 colin.king wrote: Any updates on this? Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1672819/comments/16 ------------------------------------------------------------------------ On 2017-05-10T11:09:09+00:00 colin.king wrote: Created attachment 256351 this fix solves the issue without any overhead of extra per thread locking and a simple lightweight retry This has been extensively tested on a multi-proc Xeon server with the reproducers and fixes the issue. I've checked this out with a high contention of pthreads and the retry loop occurs very rarely, so the overhead of the retry is very small indeed. Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1672819/comments/22 ------------------------------------------------------------------------ On 2021-04-16T01:24:27+00:00 benh wrote: Did you ever submit the fix to the mailing list ? Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1672819/comments/40 ------------------------------------------------------------------------ On 2022-07-13T01:39:01+00:00 shaoyi wrote: Hi Colin, May I please ask about the most recent update about your fix? This race condition can still be reproduced on the current linux mainline v5.19.0-rc6. I found ubuntu had picked this patch up as https://lists.ubuntu.com/archives/kernel-team/2017-May/084102.html but also reported the soft lockup issue in kernel 4.4 as https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1876856 so I'm wondering do you have an updated version of the patch and the plan to submit it to the upstream? Would really appreciate if you're still tracking this issue! Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1672819/comments/41 -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1672819 Title: exec'ing a setuid binary from a threaded program sometimes fails to setuid Status in Linux: Confirmed Status in golang-1.6 package in Ubuntu: Invalid Status in linux package in Ubuntu: Fix Released Status in golang-1.6 source package in Xenial: Fix Released Status in linux source package in Xenial: Fix Released Status in golang-1.6 source package in Yakkety: Invalid Status in linux source package in Yakkety: Fix Released Status in golang-1.6 source package in Zesty: Invalid Status in linux source package in Zesty: Fix Released Bug description: == SRU template for golang-1.6 == [Impact] The kernel bug reported below means that occasionally (maybe 1 in 1000 times) the snapd -> snap-confine exec that is part of a snap execution fails to take the setuid bit on the snap-confine binary into account which means that the execution fails. This is extremely confusing for the user of the snap who just sees a permission denied error with no explanation. The kernel bug has been fixed in Xenial+ but not all users of snapd are on xenial+ kernels (they might be on trusty or another distribution entirely). Backporting this fix will mean that the snapd in the core snap will get the workaround next time it is built and because the snapd in trusty or the other distro will re-exec into the snapd in the core snap before execing snap-confine, users should not see the above behaviour. [Test case] This will be a bit tricky as the kernel bug has been fixed. A xenial container on a trusty host/VM should do the trick. The test case from https://gist.github.com/chipaca/806c90d96c437444f27f45a83d00a813 should be sufficient to demonstrate the bug and then, once golang-1.6 has been upgraded from proposed, the fix. [Regression potential] If there is a bug in the patch it could cause deadlocks in currently working programs. But the patch is pretty simple and has passed review upstream so I think it should be OK. == SRU REQUEST XENIAL, YAKKETY, ZESTY == Due to two race conditions in check_unsafe_exec(), exec'ing a setuid binary from a threaded program sometimes fails to setuid. == Fix == Sauce patch for Xenial, Yakkety + Zesty: https://lists.ubuntu.com/archives/kernel-team/2017-May/084102.html This fix re-executes the unsafe check if there is a discrepancy between the expected fs count and the found count during the racy window during thread exec or exit. This re-check occurs very infrequently and saves a lot of addition locking on per thread structures that would make performance of fork/exec/exit prohibitively expensive. == Test case == See the example C code in the patch, https://lists.ubuntu.com/archives/kernel-team/2017-May/084102.html Run the test code as follows: for i in $(seq 1000); do ./a; done With the patch, no messages are emitted, without the patch, one sees a message: "Failed, got euid 1000 (expecting 0)" ..which shows the setuid program failed the check_unsafe_exec() because of the race. == Regression potential == breaking existing safe exec semantics. ==================== This can be reproduced with https://gist.github.com/chipaca/806c90d96c437444f27f45a83d00a813 With that, and go 1.8, if you run “make” and then for i in `seq 99`; do ./a_go; done you'll see a variable number of ”GOT 1000” (or whatever your user id is). If you don't, add one or two more 9s on there. That's a simple go reproducer. You can also use “a_p” instead of “a_go” to see one that only uses pthreads. “a_c” is a C version that does *not* reproduce the issue. But it's not pthreads: if in a_go.go you comment out the “import "C"”, you'll still see the “GOT 1000” messages, in a static binary that uses no pthreads, just clone(2). You'll also see a bunch of warnings because it's not properly handling an EAGAIN from clone, but that's unrelated. If you pin the process to a single thread using taskset, you don't get the issue from a_go; a_p continues to reproduce the issue. In some virtualized environments we haven't been able to reproduce the issue either (e.g. some aws instances), but kvm works (you need -smp to see the issue from a_go). ProblemType: Bug DistroRelease: Ubuntu 16.04 Package: linux-image-4.4.0-64-generic 4.4.0-64.85 ProcVersionSignature: Ubuntu 4.4.0-64.85-generic 4.4.44 Uname: Linux 4.4.0-64-generic x86_64 NonfreeKernelModules: zfs zunicode zcommon znvpair zavl ApportVersion: 2.20.1-0ubuntu2.5 Architecture: amd64 AudioDevicesInUse: USER PID ACCESS COMMAND /dev/snd/pcmC0D0p: john 2354 F...m pulseaudio /dev/snd/controlC0: john 2354 F.... pulseaudio CurrentDesktop: Unity Date: Tue Mar 14 17:17:23 2017 HibernationDevice: RESUME=UUID=b9fd155b-dcbe-4337-ae77-6daa6569beaf InstallationDate: Installed on 2014-04-27 (1051 days ago) InstallationMedia: Ubuntu 14.04 LTS "Trusty Tahr" - Release amd64 (20140417) MachineType: Dell Inc. Latitude E6510 ProcFB: 0 inteldrmfb ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.4.0-64-generic root=/dev/mapper/ubuntu--vg-root ro enable_mtrr_cleanup mtrr_spare_reg_nr=8 mtrr_gran_size=32M mtrr_chunk_size=32M quiet splash RelatedPackageVersions: linux-restricted-modules-4.4.0-64-generic N/A linux-backports-modules-4.4.0-64-generic N/A linux-firmware 1.157.8 SourcePackage: linux SystemImageInfo: Error: command ['system-image-cli', '-i'] failed with exit code 2: UpgradeStatus: Upgraded to xenial on 2015-06-18 (634 days ago) dmi.bios.date: 12/05/2013 dmi.bios.vendor: Dell Inc. dmi.bios.version: A16 dmi.board.vendor: Dell Inc. dmi.chassis.type: 9 dmi.chassis.vendor: Dell Inc. dmi.modalias: dmi:bvnDellInc.:bvrA16:bd12/05/2013:svnDellInc.:pnLatitudeE6510:pvr0001:rvnDellInc.:rn:rvr:cvnDellInc.:ct9:cvr: dmi.product.name: Latitude E6510 dmi.product.version: 0001 dmi.sys.vendor: Dell Inc. To manage notifications about this bug go to: https://bugs.launchpad.net/linux/+bug/1672819/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp