Re: [Qemu-devel] [PATCH v2] Add argument filters to the seccomp sandbox
On Monday, September 28, 2015 11:14:42 PM Namsun Ch'o wrote: > > My understanding of the config file you proposed is that it would allow > > the > > configuration of a whitelist, so changes to the config very could result > > in > > *less* strict of a filter, not always more. > > No. Any time an administrator wants a syscall that is not enabled in the > sandbox, they either don't actually need it, or it's a bug and should be > fixed. So all the config would do is add argument filters to syscalls which > are already whitelisted. If the syscall, without arguments, is already added to the whitelist then adding a new libseccomp rule to allow that same syscall with specific arguments will have no effect since a broader rule already exists in the filter. > The alternative would be that the syscalls are given no further argument > filtering. The config could only make the filteres more restrictive, never > less. I still don't see how this is the case, but it probably isn't worth arguing any further without some patches. > Perhaps there could be a #define somewhere that toggles whether or not a > syscall argument filter can be created for a syscall which is not in the > built-in whitelist, otherwise it would throw an error saying that you cannot > create an argument filter for a syscall that is not permitted. I would argue you should never be able to add a syscall to the whitelist via a config file and/or command line option, but that is my opinion. -- paul moore security @ redhat
Re: [Qemu-devel] [PATCH v2] Add argument filters to the seccomp sandbox
On Monday, September 28, 2015 05:34:53 PM Namsun Ch'o wrote: > > To be clear, I'm not suggesting "--enable-syscalls=foo,bar,...", what I'm > > suggesting is a decomposition of the current filter list into blocks of > > syscalls that are needed to enable specific functionality. For example, > > if you enable audio support at runtime a set of syscalls will be added to > > the filter whitelist, if you enable a network device a different set of > > syscalls will be added to the filter, and so on. > > > > I think having an admin specified filter, either via a command line or > > configuration file, is a step in the wrong direction. > > How come? I think it is safer than forcing an admin to re-compile everything > (which just won't happen in an enterprise environment). For starters, I don't believe that a random administrator understands the QEMU code and the potential impact of any changes to such a config file as well as the QEMU developers. > ... If any configuration file only increases the strictness of a syscall, I > don't see the danger of an admin shooting themselves in the foot. Allowing > an admin to decrease security would be a problem, but they can do -sandbox > off anyway. My understanding of the config file you proposed is that it would allow the configuration of a whitelist, so changes to the config very could result in *less* strict of a filter, not always more. Also, while yes, allowing an admin to disable the sandbox via a command line switch does disable the seccomp filter, it is obvious that such a step is being taken. > But if the dynamic sandbox is strict enough for each feature, it'd be great. -- paul moore security @ redhat
Re: [Qemu-devel] [PATCH v2] Add argument filters to the seccomp sandbox
On Saturday, September 26, 2015 01:06:57 AM Namsun Ch'o wrote: > > I've suggested this in the past but to my knowledge no has done any work > > in this direction, including myself. Despite the lack of progress, I still > > think this is a very worthwhile idea. > > Which is exactly why I think a configuration file would be the best option > instead of --enable-syscalls=foo,bar,baz. It would allow someone to easily > customize their policy without needing to create a patch, or wait on QEMU > developers to do work on it. To be clear, I'm not suggesting "--enable-syscalls=foo,bar,...", what I'm suggesting is a decomposition of the current filter list into blocks of syscalls that are needed to enable specific functionality. For example, if you enable audio support at runtime a set of syscalls will be added to the filter whitelist, if you enable a network device a different set of syscalls will be added to the filter, and so on. I think having an admin specified filter, either via a command line or configuration file, is a step in the wrong direction. -- paul moore security @ redhat
Re: [Qemu-devel] [PATCH v2] Add argument filters to the seccomp sandbox
On Friday, September 25, 2015 12:53:04 AM Namsun Ch'o wrote: > Another idea which would fit in with the security model is to have a dynamic > sandbox which enables syscalls and syscall filters based on what command > line or config parameters are passed to QEMU on its first start. I've suggested this in the past but to my knowledge no has done any work in this direction, including myself. Despite the lack of progress, I still think this is a very worthwhile idea. -- paul moore security @ redhat
Re: [Qemu-devel] [PATCH] Add a few argument filters to the seccomp sandbox
On Thursday, September 10, 2015 03:48:52 PM Daniel P. Berrange wrote: > On Wed, Sep 09, 2015 at 09:55:33PM -0400, namn...@safe-mail.net wrote: > > + > > +/* shmget */ > > +rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(shmget), 2, > > +SCMP_A0(SCMP_CMP_EQ, IPC_PRIVATE), > > +SCMP_A2(SCMP_CMP_EQ, IP_CREAT|0777)); > > I'm not familiar with semantics of these seccomp rules, but is this > saying that the second arg must be exactly equal to IP_CREAT|0777 ? Yes. > If the app passes IP_CREAT|0600, would that be permitted instead ? > The latter is what I see gtk2 source code passing for mode. It wouldn't match the rule as written above, if it doesn't match any other configured rules it would hit the default filter action. -- paul moore security @ redhat
Re: [Qemu-devel] [PATCH for-2.3] Revert seccomp tests that allow it to be used on non-x86 architectures
On Wednesday, July 01, 2015 02:07:49 PM Andrew Jones wrote: On Tue, Jun 30, 2015 at 01:18:49PM -0400, Paul Moore wrote: On Tuesday, June 30, 2015 06:07:40 PM Peter Maydell wrote: On 30 June 2015 at 18:01, Paul Moore pmo...@redhat.com wrote: I'm starting to wonder if the 32-bit ARM build system didn't have __NR_cacheflush defined in the system headers; that might explain some of the behavior. Could you check your system to see if it has __NR_cacheflush defined (try /usr/include/asm/unistd.h)? The constant name is __ARM_NR_cacheflush, not __NR_cacheflush (all the ARM-specific syscalls are __ARM_NR_*). See http://lxr.free-electrons.com/source/arch/arm/include/uapi/asm/unistd.h# L418 /me smacks his forehead Of course it is. We already work around that in arch-syscall-validate. D'oh! Good news though, I think we just found the bug ;) I'm currently trying to put out another fire in a different project; as soon as I've got that done I'll fix this. However, if somebody wants to play, I'm always happy to accept patches :) Sent: https://groups.google.com/forum/#!topic/libseccomp/RD9RTmc2Lxo Applied, thanks. I'll send the patch for qemu to add cacheflush to the whitelist shortly. -- paul moore security @ redhat
Re: [Qemu-devel] [PATCH for-2.3] Revert seccomp tests that allow it to be used on non-x86 architectures
On Tuesday, June 30, 2015 10:39:34 AM Andrew Jones wrote: On Mon, Jun 29, 2015 at 04:24:55PM -0400, Paul Moore wrote: Hmm, so either the kernel is screwing up with the seccomp filter for this particular syscall (unlikely) or libseccomp is screwing up the filter creation (more likely). I don't have an ARM system handy at the moment, but could you use the seccomp_export_pfc() and seccomp_export_bpf() functions to dump the PFC/BPF filter code to a file and send it out? Attached Thanks. I looked at the pfc file and compared all the syscalls in it vs. the list in qemu-seccomp.c. The pfc file is missing cacheflush, and has an 'UNKNOWN' instead. Yeah, the associated syscall number is showing -10104 which is the pseudo syscall number for architectures that don't support cacheflush(). That is obviously wrong Also, I think there may be another problem with the filter (or pfc) generation. Several of the syscalls have weird syscall numbers. For example, I would expect mmap to be 90, but instead it's -10181. That is intentional. According to the Linux kernel sources mmap() isn't defined for 32-bit ARM EABI so you are seeing libseccomp's pseudo syscall number for mmap(). And, since there was something weird, and not related to cacheflush, in the arm32 pfc, I decided to check it on my mustang too. The output there gets cacheflush for the name instead of UNKNOWN, but has the same weird number (-10104) that the midway has. It also has several other weird numbers. The output from the mustang is in the attached tarball as well. I would expect aarch64 to have a pseudo syscall number for cacheflush() as that syscall isn't defined for 64-bit ARM. What I don't understand is why libseccomp doesn't recognize cacheflush() in this particular case. I'm starting to wonder if the 32-bit ARM build system didn't have __NR_cacheflush defined in the system headers; that might explain some of the behavior. Could you check your system to see if it has __NR_cacheflush defined (try /usr/include/asm/unistd.h)? If it does, could you try rebuilding the libseccomp package on your system and seeing if it resolves the problem? -- paul moore security @ redhat
Re: [Qemu-devel] [PATCH for-2.3] Revert seccomp tests that allow it to be used on non-x86 architectures
On Tuesday, June 30, 2015 06:07:40 PM Peter Maydell wrote: On 30 June 2015 at 18:01, Paul Moore pmo...@redhat.com wrote: I'm starting to wonder if the 32-bit ARM build system didn't have __NR_cacheflush defined in the system headers; that might explain some of the behavior. Could you check your system to see if it has __NR_cacheflush defined (try /usr/include/asm/unistd.h)? The constant name is __ARM_NR_cacheflush, not __NR_cacheflush (all the ARM-specific syscalls are __ARM_NR_*). See http://lxr.free-electrons.com/source/arch/arm/include/uapi/asm/unistd.h#L418 /me smacks his forehead Of course it is. We already work around that in arch-syscall-validate. D'oh! Good news though, I think we just found the bug ;) I'm currently trying to put out another fire in a different project; as soon as I've got that done I'll fix this. However, if somebody wants to play, I'm always happy to accept patches :) -- paul moore security @ redhat
Re: [Qemu-devel] [PATCH for-2.3] Revert seccomp tests that allow it to be used on non-x86 architectures
On Monday, June 29, 2015 09:50:17 AM Andrew Jones wrote: On Fri, Jun 26, 2015 at 04:26:22PM -0400, Paul Moore wrote: Perhaps a stupid question, but you did verify that it is cacheflush that is causing the problem? The seccomp filter code will emit a message to syslog or the audit log, depending on your configuration, with the syscall number. #./tools/scmp_sys_resolver -a arm cacheflush 983042 #./tools/scmp_sys_resolver -a arm 983042 I hadn't before (didn't know about the logging). I had determined the problem by running qemu in gdb. I just checked now though and confirmed it type=SECCOMP msg=audit(1435563996.731:2032): auid=1001 uid=1001 gid=1001 ses=157 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=27059 comm=qemu-system-arm exe=/home/drjones/code/qemu/arm-softmmu/qemu-system-arm sig=31 arch=4028 syscall=983042 compat=0 ip=0xb6b43164 code=0x0 This log was generated even with the above patch applied to qemu. The only thing that comes to mind quickly is that the cacheflush() call is being done by a thread that was created before the seccomp filter was loaded into the kernel; although I believe you said you already checked that. If you are using a recent kernel and libseccomp you can try enabling the SCMP_FLTATR_CTL_TSYNC attribute to apply the filter to all running threads in the process. rc = seccomp_attr_set(ctx, SCMP_FLTATR_CTL_TSYNC, 1); if (rc) /* error */ ... although that may have unintended consequences since threads which were never filtered are not getting caught up in the seccomp filter. Although, the current QEMU seccomp filter is so permissive it might not be a real concern. Anyway, (double) check the thread creation and seccomp_load() ordering. -- paul moore security @ redhat
Re: [Qemu-devel] [PATCH for-2.3] Revert seccomp tests that allow it to be used on non-x86 architectures
On Monday, June 29, 2015 07:47:29 PM Andrew Jones wrote: On Mon, Jun 29, 2015 at 10:53:14AM -0400, Paul Moore wrote: On Monday, June 29, 2015 09:50:17 AM Andrew Jones wrote: On Fri, Jun 26, 2015 at 04:26:22PM -0400, Paul Moore wrote: Perhaps a stupid question, but you did verify that it is cacheflush that is causing the problem? The seccomp filter code will emit a message to syslog or the audit log, depending on your configuration, with the syscall number. #./tools/scmp_sys_resolver -a arm cacheflush 983042 #./tools/scmp_sys_resolver -a arm 983042 I hadn't before (didn't know about the logging). I had determined the problem by running qemu in gdb. I just checked now though and confirmed it type=SECCOMP msg=audit(1435563996.731:2032): auid=1001 uid=1001 gid=1001 ses=157 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=27059 comm=qemu-system-arm exe=/home/drjones/code/qemu/arm-softmmu/qemu-system-arm sig=31 arch=4028 syscall=983042 compat=0 ip=0xb6b43164 code=0x0 This log was generated even with the above patch applied to qemu. The only thing that comes to mind quickly is that the cacheflush() call is being done by a thread that was created before the seccomp filter was loaded into the kernel; although I believe you said you already checked that. Nope, I hadn't, but I have now ... Actually, never mind on that, I was being stupid. If it was a different thread it wouldn't be impacted by the seccomp filter at all ... ... So we're calling __clear_cache from the same thread that called seccomp_start, and that thread dies the moment it calls the syscall. No other threads except id(2) at this time, which appears to be something created by __libc_start_main before main() runs. Hmm, so either the kernel is screwing up with the seccomp filter for this particular syscall (unlikely) or libseccomp is screwing up the filter creation (more likely). I don't have an ARM system handy at the moment, but could you use the seccomp_export_pfc() and seccomp_export_bpf() functions to dump the PFC/BPF filter code to a file and send it out? If you are using a recent kernel and libseccomp you can try enabling the SCMP_FLTATR_CTL_TSYNC attribute to apply the filter to all running threads in the process. rc = seccomp_attr_set(ctx, SCMP_FLTATR_CTL_TSYNC, 1); if (rc) /* error */ I tried this, but it error'ed out with rc == -95 (EOPNOTSUPP ?) My kernel version is 4.0.5-200.fc21.armv7hl+lpae That should be a recent enough kernel, but perhaps your version of libseccomp was built against an older version of the kernel that didn't have the necessary support (and it was disabled at compile time)? -- paul moore security @ redhat
Re: [Qemu-devel] [PATCH for-2.3] Revert seccomp tests that allow it to be used on non-x86 architectures
On Friday, June 26, 2015 06:03:18 PM Andrew Jones wrote: On Tue, Jun 16, 2015 at 02:16:03PM +0100, Peter Maydell wrote: On 16 June 2015 at 14:12, Andrew Jones drjo...@redhat.com wrote: Can we now revert this revert, along with bumping the non-x86 arch atleast-version to v2.2.1 Probably. I suggest you submit a patch and test it on the relevant architectures and seccomp versions. I don't see any problems with the light testing (booting a guest) I've done on my mustang, but AArch64 worked with libseccomp 2.2.0 too. So I dusted off my Midway (updated to Fedora 21 that has libseccomp 2.2.1 packaged), and gave it a try, but unfortunately it still doesn't work... I found that we needed to add another syscall to the whitelist; the arm-private 'cacheflush', as it's used by __builtin___clear_cache. And, from libseccomp's git history it appears that syscall is known commit a710a2d246bdc73ba77e3ff5624e790688cc51fd Author: Paul Moore pmo...@redhat.com Date: Wed May 6 12:05:45 2015 -0400 arm: add some missing syscalls Add the following syscalls to the ARM arch/ABI and update the syscall validation script. * breakpoint() * cacheflush() * usr26() * usr32() * set_tls() Reported-by: Purcareata Bogdan b43...@freescale.com Signed-off-by: Paul Moore pmo...@redhat.com And also appears to be in 2.2.1 $ git describe a710a2d246bdc73ba77e3ff5624e790688cc51fd v2.2.0-10-ga710a2d246bdc However the qemu thread that makes that syscall still dies, even with this patch diff --git a/qemu-seccomp.c b/qemu-seccomp.c index f9de0d3390feb..33644a4e3c3d3 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -237,7 +237,8 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(fadvise64), 240 }, { SCMP_SYS(inotify_init1), 240 }, { SCMP_SYS(inotify_add_watch), 240 }, -{ SCMP_SYS(mbind), 240 } +{ SCMP_SYS(mbind), 240 }, +{ SCMP_SYS(cacheflush), 240 }, }; int seccomp_start(void) Paul, can you help me figure out what I'm missing? Perhaps a stupid question, but you did verify that it is cacheflush that is causing the problem? The seccomp filter code will emit a message to syslog or the audit log, depending on your configuration, with the syscall number. #./tools/scmp_sys_resolver -a arm cacheflush 983042 #./tools/scmp_sys_resolver -a arm 983042 -- paul moore security @ redhat
Re: [Qemu-devel] seccomp breakage on arm
On Friday, April 10, 2015 05:40:40 PM Andreas Färber wrote: My main concern - that you apparently misunderstood - was whether this is a QEMU or a libseccomp issue. If it's on libseccomp's side then it's less urgent for QEMU and any new configure checks are just candy IMO. My opinion is that the 32-bit ARM build issue you described is a libseccomp bug (see the bug fix I sent a few hours ago); QEMU's usage of libseccomp is perfectly reasonable. That said, the libseccomp bug clearly affected QEMU in a bad way. Making matters worse is that the problem wasn't caught until late in the QEMU release process, nobody likes surprises like this. We've corrected the issue in libseccomp, and it looks like the QEMU folks are reverting ARM/seccomp support so I think we've resolved this for the immediate future. Long term I think the libseccomp project can look at adding some additional automated tests so these issues will be caught at build/packaging time, further, I think the QEMU project can restore ARM/seccomp support in the next release and look at its own testing procedures to understand why this issue wasn't caught earlier as well. -- paul moore security @ redhat
Re: [Qemu-devel] seccomp breakage on arm
On Thursday, April 09, 2015 10:21:52 AM Eduardo Otubo wrote: On Thu, Apr 09, 2015 at 05=01=31AM +0200, Andreas Färber wrote: Hello, I am seeing the following build failure on openSUSE Tumbleweed armv7l with --enable-seccomp in v2.3.0-rc2: [ 551s] In file included from qemu-seccomp.c:16:0: [ 551s] /usr/include/libseccomp/seccomp.h:177:23: error: '__NR_mmap' undeclared here (not in a function) [ 551s] #define SCMP_SYS(x) (__NR_##x) [ 551s]^ [ 551s] qemu-seccomp.c:36:7: note: in expansion of macro 'SCMP_SYS' [ 551s] { SCMP_SYS(mmap), 247 }, [ 551s]^ [ 551s] /usr/include/libseccomp/seccomp.h:177:23: error: '__NR_getrlimit' undeclared here (not in a function) [ 551s] #define SCMP_SYS(x) (__NR_##x) [ 551s]^ [ 551s] qemu-seccomp.c:57:7: note: in expansion of macro 'SCMP_SYS' [ 551s] { SCMP_SYS(getrlimit), 245 }, [ 551s]^ [ 551s] /home/abuild/rpmbuild/BUILD/qemu-2.3.0-rc2/rules.mak:57: recipe for target 'qemu-seccomp.o' failed [ 551s] make: *** [qemu-seccomp.o] Error 1 Is this a problem with libseccomp 2.2.0 / master and needs to be fixed in the library? Or do we need to #ifdef some syscalls in qemu-seccomp.c? This should be already fixed in the library as mentioned by the maintainer in this[0] thread. Adding Paul Moore in CC. There's also a bug entry on launchpad[1] for that. I provided the patch (before the pull reuqest) requesting for some review and testing but never heard back again. Also CC'ing Karl-Philipp Richter (bug owner) for some opinions on that as well. Regards, [0] http://sourceforge.net/p/libseccomp/mailman/message/32955831/ [1] https://bugs.launchpad.net/qemu/+bug/1363641 This should be fixed with libseccomp v2.2.0; if you are still seeing problems using v2.2.0 let me know. -- paul moore security @ redhat
Re: [Qemu-devel] seccomp breakage on arm
On Thursday, April 09, 2015 02:28:24 PM Andreas Färber wrote: Am 09.04.2015 um 11:10 schrieb Paul Moore: On Thursday, April 09, 2015 10:21:52 AM Eduardo Otubo wrote: On Thu, Apr 09, 2015 at 05=01=31AM +0200, Andreas Färber wrote: Hello, I am seeing the following build failure on openSUSE Tumbleweed armv7l with --enable-seccomp in v2.3.0-rc2: [ 551s] In file included from qemu-seccomp.c:16:0: [ 551s] /usr/include/libseccomp/seccomp.h:177:23: error: '__NR_mmap' undeclared here (not in a function) [ 551s] #define SCMP_SYS(x) (__NR_##x) [ 551s]^ [ 551s] qemu-seccomp.c:36:7: note: in expansion of macro 'SCMP_SYS' [ 551s] { SCMP_SYS(mmap), 247 }, [ 551s]^ [ 551s] /usr/include/libseccomp/seccomp.h:177:23: error: '__NR_getrlimit' undeclared here (not in a function) [ 551s] #define SCMP_SYS(x) (__NR_##x) [ 551s]^ [ 551s] qemu-seccomp.c:57:7: note: in expansion of macro 'SCMP_SYS' [ 551s] { SCMP_SYS(getrlimit), 245 }, [ 551s]^ [ 551s] /home/abuild/rpmbuild/BUILD/qemu-2.3.0-rc2/rules.mak:57: recipe for target 'qemu-seccomp.o' failed [ 551s] make: *** [qemu-seccomp.o] Error 1 Is this a problem with libseccomp 2.2.0 / master and needs to be fixed in the library? Or do we need to #ifdef some syscalls in qemu-seccomp.c? This should be already fixed in the library as mentioned by the maintainer in this[0] thread. Adding Paul Moore in CC. There's also a bug entry on launchpad[1] for that. I provided the patch (before the pull reuqest) requesting for some review and testing but never heard back again. Also CC'ing Karl-Philipp Richter (bug owner) for some opinions on that as well. Regards, [0] http://sourceforge.net/p/libseccomp/mailman/message/32955831/ [1] https://bugs.launchpad.net/qemu/+bug/1363641 This should be fixed with libseccomp v2.2.0; if you are still seeing problems using v2.2.0 let me know. This *is* with libseccomp v2.2.0, as mentioned above, and I had checked that there were no related changes beyond v2.2.0 on your master branch. I saw were you were *asking* if this was a problem with libseccomp v2.2.0, not stating that you were seeing a problem with v2.2.0; I interpreted your comments as running a version of libseccomp v2.2.0 and you were asking if the problem had been fixed before your upgraded your copy of libseccomp. Regardless, I think I see what the problem is, and if I'm correct it affects time, umount, stime, alarm, utime, getrlimit, select, readdir, mmap, socketcall, syscall, and ipc. I'm traveling at the moment so a patch may be a bit delayed, but I'll be sure to CC you on the fix in case you are able to do some testing. Thanks for the report, I'm sorry about the initial confusion. -Paul -- paul moore security @ redhat
[Qemu-devel] [PATCH] seccomp: add mbind() to the syscall whitelist
The memory-backend-ram QOM object utilizes the mbind(2) syscall to set the policy for a memory range. Add the syscall to the seccomp sandbox whitelist. Signed-off-by: Paul Moore pmo...@redhat.com --- qemu-seccomp.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index af6a375..b0c6269 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -235,7 +235,8 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(fallocate), 240 }, { SCMP_SYS(fadvise64), 240 }, { SCMP_SYS(inotify_init1), 240 }, -{ SCMP_SYS(inotify_add_watch), 240 } +{ SCMP_SYS(inotify_add_watch), 240 }, +{ SCMP_SYS(mbind), 240 } }; int seccomp_start(void)
Re: [Qemu-devel] [PATCHv3] seccomp: change configure to avoid arm 32 to break
On Friday, November 07, 2014 10:05:44 AM Eduardo Otubo wrote: Current stable version of libseccomp (2.1.1) only supports i386 and x86_64 archs correctly. This patch limits the usage of the syscall filter for those archs and updates to the correct last version of libseccomp. This patch also fixes the bug: https://bugs.launchpad.net/qemu/+bug/1363641 Signed-off-by: Eduardo Otubo eduardo.ot...@profitbricks.com --- configure | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) Thanks Eduardo, I'll let you know once I've cut a new release of libseccomp. Acked-by: Paul Moore pmo...@redhat.com diff --git a/configure b/configure index 2f17bf3..47048f0 100755 --- a/configure +++ b/configure @@ -1823,7 +1823,8 @@ fi # libseccomp check if test $seccomp != no ; then -if $pkg_config --atleast-version=2.1.0 libseccomp; then +if test $cpu = i386 || test $cpu = x86_64 +$pkg_config --atleast-version=2.1.1 libseccomp; then libs_softmmu=$libs_softmmu `$pkg_config --libs libseccomp` QEMU_CFLAGS=$QEMU_CFLAGS `$pkg_config --cflags libseccomp` seccomp=yes -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH] seccomp: change configure to avoid arm 32 to break
On Thursday, November 06, 2014 03:49:18 PM Eduardo Otubo wrote: Right now seccomp is breaking the compilation of Qemu on armv7l due to libsecomp current lack of support for this arch. This problem is already fixed on libseccomp upstream but no release date for that is scheduled to far. This patch disables support for seccomp on armv7l temporarily until libseccomp does a new release. Then I'll remove the hack and update libseccomp dependency on configure script. Related bug: https://bugs.launchpad.net/qemu/+bug/1363641 Signed-off-by: Eduardo Otubo eduardo.ot...@profitbricks.com --- configure | 20 +++- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/configure b/configure index 2f17bf3..16fd7f5 100755 --- a/configure +++ b/configure @@ -1823,15 +1823,17 @@ fi # libseccomp check if test $seccomp != no ; then -if $pkg_config --atleast-version=2.1.0 libseccomp; then -libs_softmmu=$libs_softmmu `$pkg_config --libs libseccomp` -QEMU_CFLAGS=$QEMU_CFLAGS `$pkg_config --cflags libseccomp` - seccomp=yes -else - if test $seccomp = yes; then -feature_not_found libseccomp Install libseccomp devel = 2.1.0 - fi - seccomp=no +if test $cpu = i386 || test $cpu = x86_64; then +if $pkg_config --atleast-version=2.1.0 libseccomp; then +libs_softmmu=$libs_softmmu `$pkg_config --libs libseccomp` +QEMU_CFLAGS=$QEMU_CFLAGS `$pkg_config --cflags libseccomp` +seccomp=yes +else +if test $seccomp = yes; then +feature_not_found libseccomp Install libseccomp devel = 2.1.0 +fi +seccomp=no +fi fi fi ## Also, note the current release of libseccomp is v2.1.1 which has a number of bug fixes on top of v2.1.0. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH] seccomp: change configure to avoid arm 32 to break
On Thursday, November 06, 2014 10:24:41 AM Eduardo Otubo wrote: Paul, do you have any plans for a new libseccomp release? Yes, I have plans. Do I have a date, no. ;) I was trying to sync up with support for a new ABI, but I believe that was pushed to v3.19. I'll need to check on things, but it may be worth doing a new release in the meantime. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH] seccomp: change configure to avoid arm 32 to break
On Thursday, November 06, 2014 05:36:04 PM Eduardo Otubo wrote: On Thu, Nov 06, 2014 at 11:22:16AM -0500, Paul Moore wrote: On Thursday, November 06, 2014 03:49:18 PM Eduardo Otubo wrote: Right now seccomp is breaking the compilation of Qemu on armv7l due to libsecomp current lack of support for this arch. This problem is already fixed on libseccomp upstream but no release date for that is scheduled to far. This patch disables support for seccomp on armv7l temporarily until libseccomp does a new release. Then I'll remove the hack and update libseccomp dependency on configure script. Related bug: https://bugs.launchpad.net/qemu/+bug/1363641 Signed-off-by: Eduardo Otubo eduardo.ot...@profitbricks.com --- configure | 20 +++- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/configure b/configure index 2f17bf3..16fd7f5 100755 --- a/configure +++ b/configure @@ -1823,15 +1823,17 @@ fi # libseccomp check if test $seccomp != no ; then -if $pkg_config --atleast-version=2.1.0 libseccomp; then -libs_softmmu=$libs_softmmu `$pkg_config --libs libseccomp` -QEMU_CFLAGS=$QEMU_CFLAGS `$pkg_config --cflags libseccomp` - seccomp=yes -else - if test $seccomp = yes; then -feature_not_found libseccomp Install libseccomp devel = 2.1.0 - fi - seccomp=no +if test $cpu = i386 || test $cpu = x86_64; then +if $pkg_config --atleast-version=2.1.0 libseccomp; then +libs_softmmu=$libs_softmmu `$pkg_config --libs libseccomp` +QEMU_CFLAGS=$QEMU_CFLAGS `$pkg_config --cflags libseccomp` +seccomp=yes +else +if test $seccomp = yes; then +feature_not_found libseccomp Install libseccomp devel = 2.1.0 +fi +seccomp=no +fi fi fi ## Also, note the current release of libseccomp is v2.1.1 which has a number of bug fixes on top of v2.1.0. Does that applies to the distros package version? Well, I can't speak for all distros, but I always recommend the latest bug-fix version for obvious reasons. While I do control the libseccomp package for some distributions, I don't control them all. I've got enough to worry about, I'll let others worry about packaging :) I'm running Ubuntu 14.04 and it's still 2.1.0. A regular user would have to download and install from scratch in order to build Qemu, then. I would recommend filing a request for Debian/Ubuntu to package the latest libseccomp; v2.1.1 is over a year old at this point. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH] seccomp: change configure to avoid arm 32 to break
On Wednesday, November 05, 2014 05:08:20 PM Peter Maydell wrote: On 5 November 2014 16:47, Eduardo Otubo wrote: Right now seccomp is breaking the compilation of Qemu on armv7l due to libsecomp current lack of support for this arch. This problem is already fixed on libseccomp upstream but no release date for that is scheduled to far. This patch disables support for seccomp on armv7l temporarily until libseccomp does a new release. Then I'll remove the hack and update libseccomp dependency on configure script. Related bug: https://bugs.launchpad.net/qemu/+bug/1363641 ... (How are upstream proposing to fix this anyway? I couldn't figure that out from the mailing list thread.) The problem was that the released version of libseccomp has some holes in the internal syscall table for 32-bit ARM with respect to all of the other supported architectures. The current libseccomp upstream has some additional tooling and checks to ensure that the different ABI syscall tables are kept in sync to prevent something like this from happening in the future. I'm more than happy to discuss how libseccomp handles the different architectures, but that's probably a bit off-topic for this thread. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH] seccomp: change configure to avoid arm 32 to break
On Wednesday, November 05, 2014 08:08:06 PM Peter Maydell wrote: On 5 November 2014 19:46, Paul Moore pmo...@redhat.com wrote: On Wednesday, November 05, 2014 05:08:20 PM Peter Maydell wrote: On 5 November 2014 16:47, Eduardo Otubo wrote: Right now seccomp is breaking the compilation of Qemu on armv7l due to libsecomp current lack of support for this arch. This problem is already fixed on libseccomp upstream but no release date for that is scheduled to far. This patch disables support for seccomp on armv7l temporarily until libseccomp does a new release. Then I'll remove the hack and update libseccomp dependency on configure script. Related bug: https://bugs.launchpad.net/qemu/+bug/1363641 ... (How are upstream proposing to fix this anyway? I couldn't figure that out from the mailing list thread.) The problem was that the released version of libseccomp has some holes in the internal syscall table for 32-bit ARM with respect to all of the other supported architectures. The current libseccomp upstream has some additional tooling and checks to ensure that the different ABI syscall tables are kept in sync to prevent something like this from happening in the future. OK. So should we make QEMU say if x86_64 or i386, require seccomp 2.1 or better, else require 2.2 or better? I would probably just limit QEMU/seccomp to x86_64 and x86. Once we have the new release that fixes everything we can start worrying about versions and different ABIs. If our current source will build with seccomp 2.2 then that seems like a better check to put in our configure script than a simple disabling of the functionality on ARM hosts; it means that if distros end up with QEMU 2.2 plus seccomp 2.2 the functionality won't be unnecessarily disabled. (Please correct me if I have your next-release numbering wrong!) Well, technically we don't have libseccomp v2.2 yet so I can't say for certain what it will look like and how it will behave. I'm more than happy to discuss how libseccomp handles the different architectures, but that's probably a bit off-topic for this thread. I guess the only thing that matters for us is that there wasn't an API break required for the fix. Nope, the API is solid, just some internal fixes. -- paul moore security and virtualization @ redhat
[Qemu-devel] [PATCH] seccomp: add semctl() to the syscall whitelist
QEMU needs to call semctl() for correct operation. This particular problem was identified on shutdown with the following commandline: # qemu -sandbox on -monitor stdio \ -device intel-hda -device hda-duplex -vnc :0 Signed-off-by: Paul Moore pmo...@redhat.com --- qemu-seccomp.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index ea8094d..0503764 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -230,7 +230,8 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(timerfd_create), 240 }, { SCMP_SYS(shmctl), 240 }, { SCMP_SYS(mlock), 240 }, -{ SCMP_SYS(munlock), 240 } +{ SCMP_SYS(munlock), 240 }, +{ SCMP_SYS(semctl), 240 } }; int seccomp_start(void)
Re: [Qemu-devel] [PULL 00/02] seccomp: adding new syscalls to the whitelist
On Sunday, April 27, 2014 11:10:50 AM Paolo Bonzini wrote: Il 14/04/2014 16:47, Paul Moore ha scritto: Yes. Also the commits don't have your signed-off-by: so I can't apply it. Eduardo? It is absurd that we have had two fixes held up this long for such silly things. It's not silly, see http://lwn.net/Articles/592503/ for more information. The contributor agreement isn't silly. What is silly, and frustrating, are these stupid delays between comments/respin/etc. If you're trying to make some point about the contributor agreements, sign-offs, etc. please make it elsewhere. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PULL 00/02] seccomp: adding new syscalls to the whitelist
On Thursday, April 03, 2014 12:14:21 PM Peter Maydell wrote: On 2 April 2014 16:53, Paolo Bonzini pbonz...@redhat.com wrote: Il 01/04/2014 15:06, Eduardo Otubo ha scritto: Not sure why it didn't get upstream yet. Anthony, Peter, could you take a closer look at this? Peter filters on for you to fetch changes up to and your git didn't include it. :) Yes. Also the commits don't have your signed-off-by: so I can't apply it. Eduardo? It is absurd that we have had two fixes held up this long for such silly things. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PULL 00/02] seccomp: adding new syscalls to the whitelist
On Thursday, March 13, 2014 10:42:42 AM Eduardo Otubo wrote: The following changes since commit 750036a848ea913ba6343718ffa70da98f7eef6b: Merge remote-tracking branch 'remotes/afaerber/tags/prep-for-upstream' into staging (2014-03-12 17:53:37 +) are available in the git repository at: git://github.com/otubo/qemu.git seccomp Felix Geyer (1): seccomp: add timerfd_create and timerfd_settime to the whitelist Paul Moore (1): seccomp: add shmctl(), mlock(), and munlock() to the syscall whitelist qemu-seccomp.c |7 ++- 1 files changed, 6 insertions(+), 1 deletions(-) Notice this still hasn't made it upstream and thought I would check to see where things stood ... -- paul moore security and virtualization @ redhat
[Qemu-devel] [PATCH] seccomp: add getrusage() to the syscall whitelist for Open vSwitch
When QEMU is used with Open vSwitch it is common to create netdev script and downscript scripts that use the ovs-vsctl command to manage the underlying network devices. Unfortunately, ovs-vsctl calls the getrusage() syscall which is not currently present in the QEMU/seccomp whistelist. Signed-off-by: Paul Moore pmo...@redhat.com --- qemu-seccomp.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index caa926e..86210a4 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -225,7 +225,8 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(fchmod), 240 }, { SCMP_SYS(shmget), 240 }, { SCMP_SYS(shmat), 240 }, -{ SCMP_SYS(shmdt), 240 } +{ SCMP_SYS(shmdt), 240 }, +{ SCMP_SYS(getrusage), 240 } }; int seccomp_start(void)
Re: [Qemu-devel] [PATCH] seccomp: add shmctl(), mlock(), and munlock() to the syscall whitelist
On Wednesday, March 05, 2014 11:53:58 AM Eduardo Otubo wrote: On 03/03/2014 05:41 PM, Paul Moore wrote: On Wednesday, February 26, 2014 10:25:01 AM Paul Moore wrote: Additional testing reveals that PulseAudio requires shmctl() and the mlock()/munlock() syscalls on some systems/configurations. As before, on systems that do require these syscalls, the problem can be seen with the following command line: # qemu -monitor stdio -sandbox on \ -device intel-hda -device hda-duplex Signed-off-by: Paul Moore pmo...@redhat.com --- qemu-seccomp.c |5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index caa926e..3db1e9b 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -225,7 +225,10 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(fchmod), 240 }, { SCMP_SYS(shmget), 240 }, { SCMP_SYS(shmat), 240 }, -{ SCMP_SYS(shmdt), 240 } +{ SCMP_SYS(shmdt), 240 }, +{ SCMP_SYS(shmctl), 240 }, +{ SCMP_SYS(mlock), 240 }, +{ SCMP_SYS(munlock), 240 } }; int seccomp_start(void) Bump to bring this back the forefront of everyone's minds. Can we get this merged? Sorry for the late review, I've been hit by a terrible tendinitis and was unable to use a computer for the last week. :( Hopefully nothing that isn't too unbearable and unmanageable. Tendinitis can be a real pain. I ACK this patch and I'll create a pull request by friday EOD if nothing else comes up. Great, thank you. I've run across one more issue related to Open vSwitch and getrusage(); I'll try to get that fixed/tested before Friday, but don't let that hold up this fix. On a somewhat related note, have you done any work on the dynamic, per-feature whitelist? I'm thinking of doing some work to put together a basic framework for this, but I didn't want to waste time on a duplicated effort ... -Paul -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH] seccomp: add shmctl(), mlock(), and munlock() to the syscall whitelist
On Wednesday, February 26, 2014 10:25:01 AM Paul Moore wrote: Additional testing reveals that PulseAudio requires shmctl() and the mlock()/munlock() syscalls on some systems/configurations. As before, on systems that do require these syscalls, the problem can be seen with the following command line: # qemu -monitor stdio -sandbox on \ -device intel-hda -device hda-duplex Signed-off-by: Paul Moore pmo...@redhat.com --- qemu-seccomp.c |5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index caa926e..3db1e9b 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -225,7 +225,10 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(fchmod), 240 }, { SCMP_SYS(shmget), 240 }, { SCMP_SYS(shmat), 240 }, -{ SCMP_SYS(shmdt), 240 } +{ SCMP_SYS(shmdt), 240 }, +{ SCMP_SYS(shmctl), 240 }, +{ SCMP_SYS(mlock), 240 }, +{ SCMP_SYS(munlock), 240 } }; int seccomp_start(void) Bump to bring this back the forefront of everyone's minds. Can we get this merged? -- paul moore security and virtualization @ redhat
[Qemu-devel] [PATCH] seccomp: add shmctl(), mlock(), and munlock() to the syscall whitelist
Additional testing reveals that PulseAudio requires shmctl() and the mlock()/munlock() syscalls on some systems/configurations. As before, on systems that do require these syscalls, the problem can be seen with the following command line: # qemu -monitor stdio -sandbox on \ -device intel-hda -device hda-duplex Signed-off-by: Paul Moore pmo...@redhat.com --- qemu-seccomp.c |5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index caa926e..3db1e9b 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -225,7 +225,10 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(fchmod), 240 }, { SCMP_SYS(shmget), 240 }, { SCMP_SYS(shmat), 240 }, -{ SCMP_SYS(shmdt), 240 } +{ SCMP_SYS(shmdt), 240 }, +{ SCMP_SYS(shmctl), 240 }, +{ SCMP_SYS(mlock), 240 }, +{ SCMP_SYS(munlock), 240 } }; int seccomp_start(void)
Re: [Qemu-devel] [PATCH 0/2] QEMU/seccomp fixes for PulseAudio
On Thursday, January 16, 2014 01:52:30 PM Eduardo Otubo wrote: On 01/15/2014 05:38 PM, Paul Moore wrote: It turns out we need to add some additional syscalls to QEMU to make PulseAudio happy. Two minor patches follow ... --- Paul Moore (2): seccomp: add mkdir() and fchmod() to the whitelist seccomp: add some basic shared memory syscalls to the whitelist qemu-seccomp.c |7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) Paul, sorry for the so late review. I was on vacation until yesterday and I'm still trying to empty my inbox. No worries. Personally, I wouldn't call a one day turnaround as a late review, I'd call that pretty good :) -- paul moore security and virtualization @ redhat
[Qemu-devel] [PATCH 2/2] seccomp: add some basic shared memory syscalls to the whitelist
PulseAudio requires the use of shared memory so add shmget(), shmat(), and shmdt() to the syscall whitelist. Reported-by: xu...@redhat.com Signed-off-by: Paul Moore pmo...@redhat.com --- qemu-seccomp.c |5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index 89f244f..caa926e 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -222,7 +222,10 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(io_destroy), 241 }, { SCMP_SYS(arch_prctl), 240 }, { SCMP_SYS(mkdir), 240 }, -{ SCMP_SYS(fchmod), 240 } +{ SCMP_SYS(fchmod), 240 }, +{ SCMP_SYS(shmget), 240 }, +{ SCMP_SYS(shmat), 240 }, +{ SCMP_SYS(shmdt), 240 } }; int seccomp_start(void)
[Qemu-devel] [PATCH 0/2] QEMU/seccomp fixes for PulseAudio
It turns out we need to add some additional syscalls to QEMU to make PulseAudio happy. Two minor patches follow ... --- Paul Moore (2): seccomp: add mkdir() and fchmod() to the whitelist seccomp: add some basic shared memory syscalls to the whitelist qemu-seccomp.c |7 ++- 1 file changed, 6 insertions(+), 1 deletion(-)
[Qemu-devel] [PATCH 1/2] seccomp: add mkdir() and fchmod() to the whitelist
The PulseAudio library attempts to do a mkdir(2) and fchmod(2) on /run/user/UID/pulse which is currently blocked by the syscall filter; this patch adds the two missing syscalls to the whitelist. You can reproduce this problem with the following command: # qemu -monitor stdio -device intel-hda -device hda-duplex If watched under strace the following syscalls are shown: mkdir(/run/user/0/pulse, 0700) fchmod(11, 0700) [NOTE: 11 is the fd for /run/user/0/pulse] Reported-by: xu...@redhat.com Signed-off-by: Paul Moore pmo...@redhat.com --- qemu-seccomp.c |4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index b7c1253..89f244f 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -220,7 +220,9 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(io_cancel), 241 }, { SCMP_SYS(io_setup), 241 }, { SCMP_SYS(io_destroy), 241 }, -{ SCMP_SYS(arch_prctl), 240 } +{ SCMP_SYS(arch_prctl), 240 }, +{ SCMP_SYS(mkdir), 240 }, +{ SCMP_SYS(fchmod), 240 } }; int seccomp_start(void)
[Qemu-devel] [PATCH] seccomp: add mkdir() and fchmod() to the whitelist
The PulseAudio library attempts to do a mkdir(2) and fchmod(2) on /run/user/UID/pulse which is currently blocked by the syscall filter; this patch adds the two missing syscalls to the whitelist. You can reproduce this problem with the following command: # qemu -monitor stdio -device intel-hda -device hda-duplex If watched under strace the following syscalls are shown: mkdir(/run/user/0/pulse, 0700) fchmod(11, 0700) [NOTE: 11 is the fd for /run/user/0/pulse] Reported-by: xu...@redhat.com Signed-off-by: Paul Moore pmo...@redhat.com --- qemu-seccomp.c |4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index cf07869..bb19306 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -220,7 +220,9 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(io_cancel), 241 }, { SCMP_SYS(io_setup), 241 }, { SCMP_SYS(io_destroy), 241 }, -{ SCMP_SYS(arch_prctl), 240 } +{ SCMP_SYS(arch_prctl), 240 }, +{ SCMP_SYS(mkdir), 240 }, +{ SCMP_SYS(fchmod), 240 } }; int seccomp_start(void)
Re: [Qemu-devel] [PATCH] seccomp: add mkdir() and fchmod() to the whitelist
On Friday, January 03, 2014 09:24:57 PM Paolo Bonzini wrote: Il 03/01/2014 20:58, Paul Moore ha scritto: The PulseAudio library attempts to do a mkdir(2) and fchmod(2) on /run/user/UID/pulse which is currently blocked by the syscall filter; this patch adds the two missing syscalls to the whitelist. You can reproduce this problem with the following command: # qemu -monitor stdio -device intel-hda -device hda-duplex If watched under strace the following syscalls are shown: mkdir(/run/user/0/pulse, 0700) fchmod(11, 0700) [NOTE: 11 is the fd for /run/user/0/pulse] Can fchmod be exploited to violate the sandbox (e.g. to let data escape from a VM that ought not to have any way to communicate with the outside world)? Technically, there is the potential for any syscall to be exploited in such a way that a malicious guest could gain greater access than desired and do something evil with that access. After all, that was the motivation behind seccomp: disable unused syscalls to reduce the chance of an attacker exploiting a syscall bug. The important thing to remember here is that the seccomp code in QEMU is not enabling syscalls, it is disabling them. In other words, a QEMU instance with the seccomp functionality enabled, e.g. '-sandbox on', only reduces the number of syscalls available to the QEMU process, it never increases or adds vulnerable syscalls to the QEMU process. Granted, yes, there are syscalls in the current whitelist that I wish we could disable, but we are still trying to arrive a whitelist that is all encompassing (or close to it) with respect to QEMU functionality. Once we have that list in hand (each fix like the one I posted gets us closer) we can start looking at selectively shrinking the whitelist*. * We've talked about this on-list previously and there are several approaches here, some include conditionally adding/removing syscalls based on the QEMU functionality requested, e.g. command line, different sandbox profiles, e.g. standalone vs libvirt, and staged seccomp filters, e.g. a whitelist followed by progressively tighter blacklists. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH] seccomp: exit if seccomp_init() fails
On Wednesday, December 18, 2013 11:48:11 AM Corey Bryant wrote: This fixes a bug where we weren't exiting if seccomp_init() failed. Signed-off-by: Corey Bryant cor...@linux.vnet.ibm.com --- qemu-seccomp.c | 1 + 1 file changed, 1 insertion(+) Acked-by: Paul Moore pmo...@redhat.com diff --git a/qemu-seccomp.c b/qemu-seccomp.c index cf07869..b7c1253 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -231,6 +231,7 @@ int seccomp_start(void) ctx = seccomp_init(SCMP_ACT_KILL); if (ctx == NULL) { +rc = -1; goto seccomp_return; } -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH] seccomp: -sandbox on won't kill Qemu when option not built in
On Tuesday, December 10, 2013 04:48:54 PM Lucas Meneghel Rodrigues wrote: On 12/10/2013 01:20 AM, Corey Bryant wrote: IMHO the test suite should probe to see if sandbox is working or not, and just not use the -sandbox on arg if the host doesn't support it. But I think this could be done on virt-test as well :) This would make sense. Although it sounds like Lucas was looking for an error message when seccomp kills qemu. Maybe virt-test could grep the audit log for the existence of a type=SECCOMP record within the test's time of execution, and issue a message based on that. It's a valid idea. The problem I see with it is that not every distro out there uses SELinux. Not getting into the merits of whether they should, ideally it'd be nice to have this working on distros that won't use SELinux. Minor point of clarification, but audit and SELinux and independent subsystems in the kernel. Also, and I don't have a non-audit kernel built at the moment to verify this, but on non-audit kernels the audit messages should be sent to syslog so you *should* still be able to search for SECCOMP records regardless. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH] seccomp: -sandbox on won't kill Qemu when option not built in
On Monday, December 09, 2013 03:51:36 PM Eduardo Otubo wrote: On 12/09/2013 03:33 PM, Daniel P. Berrange wrote: On Mon, Dec 09, 2013 at 03:20:52PM -0200, Eduardo Otubo wrote: This option was requested by virt-test team so they can run tests with Qemu and -sandbox on set without breaking whole test if host doesn't have support for seccomp in kernel. It covers two possibilities: 1) Host kernel support does not support seccomp, but user installed Qemu package with sandbox support: Libseccomp will fail - qemu will fail nicely and won't stop execution. 2) Host kernel has support but Qemu package wasn't built with sandbox feature. Qemu will fail nicely and won't stop execution. Signed-off-by: Eduardo Otubo ot...@linux.vnet.ibm.com --- vl.c | 10 +++--- 1 file changed, 3 insertions(+), 7 deletions(-) {snip} This change is really dubious from a security POV. If the admin requested sandboxing and the host or QEMU build cannot support it, then QEMU really *must* exit. I think an admin must know what he's doing. If he requested sandbox but without kernel support he need to step back a little and understand what he's doing. This patch won't decrease the security level, IMHO. NACK For the reasons Daniel already mentioned. Mistakes happen, a lot, and if the user explicitly requests security functionality and we can't provide it we need to fail in a manner that doesn't increase the user's risk. IMHO the test suite should probe to see if sandbox is working or not, and just not use the -sandbox on arg if the host doesn't support it. But I think this could be done on virt-test as well :) This would be ideal, but if you must have a fallback mechanism in QEMU proper, make it separate from '-sandbox on' so that it doesn't break with the current behavior and also makes it is obvious that the functionality is not guaranteed, e.g. '-sandbox try' or similar. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH] seccomp: add kill() to the syscall whitelist
On Thursday, November 21, 2013 02:40:48 PM Eduardo Otubo wrote: On 11/21/2013 01:40 PM, Paul Moore wrote: The kill() syscall is triggered with the following command: # qemu -sandbox on -monitor stdio \ -device intel-hda -device hda-duplex -vnc :0 The resulting syslog/audit message: # ausearch -m SECCOMP time-Wed Nov 20 09:52:08 2013 type=SECCOMP msg=audit(1384912328.482:6656): auid=0 uid=0 gid=0 ses=854 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=12087 comm=qemu-kvm sig=31 syscall=62 compat=0 ip=0x7f7a1d2abc67 code=0x0 # scmp_sys_resolver 62 kill Reported-by: CongLi c...@redhat.com Tested-by: CongLi c...@redhat.com Signed-off-by: Paul Moore pmo...@redhat.com --- qemu-seccomp.c |1 + 1 file changed, 1 insertion(+) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index 69cee44..cf07869 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -114,6 +114,7 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(write), 244 }, { SCMP_SYS(fcntl), 243 }, { SCMP_SYS(tgkill), 242 }, +{ SCMP_SYS(kill), 242 }, { SCMP_SYS(rt_sigaction), 242 }, { SCMP_SYS(pipe2), 242 }, { SCMP_SYS(munmap), 242 }, ACK, Reviewed and tested. (I'll send a pull request tomorrow EOD) Reviewed-by: Eduardo Otubo ot...@linux.vnet.ibm.com Ping? -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH for-1.7] seccomp: setting -sandbox on by default
On Friday, November 22, 2013 11:34:41 AM Stefan Hajnoczi wrote: IMO this seccomp approach is doomed since QEMU does not practice privilege separation. QEMU is monolithic so it's really hard to create a meaningful sets of system calls. I'm a big fan of decomposing QEMU, but based on previous discussions there seems to be a lot of fear from the core QEMU folks around decomposition; enough that I'm not sure it is worth the time and effort at this point to pursue it. While I agree that a decomposed QEMU would be able to make better use of syscall filtering (and LSM/SELinux protection, and ...) I don't believe it means syscall filtering is a complete lost cause with a monolithic QEMU. Any improvement you can make, no matter how small, is still and improvement. To avoid breaking stuff you need to be too liberal, defeating the purpose of seccomp. Even if you can only disable a few syscalls you are still better off than you were before. Could it be done better, of course it could, but it doesn't mean you shouldn't try for some benefit. For each QEMU command-line there may be a different set of syscalls that should be allowed/forbidden. I'm not sure if you missed it or not, but I had an email exchange with Eduardo on this list about making the syscall whitelist a bit more intelligent and dependent on what functionality was enabled for a given QEMU instance. This should help a bit with the problems you are describing. The existing approach clearly doesn't support the full range of options that users specify on the command-line. Bugs. It will get fixed in time with more testing/debugging. Eduardo is working on improving the testing and RH's QA folks are working hard to shake out the bugs too. I just posted another bug fix patch to the whitelist a few days ago. So I guess the options are: 1. Don't make it the default since it breaks stuff but use it for very specific scenarios (e.g. libvirt use cases that have been well tested). In my opinion, I think it was probably a bit premature to make enable it by default, but at some point in the future I think we do need to do this. 2. Provide a kind of syscall set for various QEMU options and apply the union of them at launch. This still seems fragile but in theory it could work. This is what I was discussing above. I think this is likely the next big improvement. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH for-1.7] seccomp: setting -sandbox on by default
On Friday, November 22, 2013 11:39:31 AM Stefan Hajnoczi wrote: On Thu, Nov 21, 2013 at 10:48:58AM -0500, Paul Moore wrote: I'm always open to suggestions on how to improve the development/debugging process, so if you have any ideas please let me know. The failure mode is terrible: Glad to see you don't feel strongly about things. The following fails to start here (the shell hangs and ps shows QEMU is a defunct process) When a process dies the shell prints a warning. That alerts the user and prompts them to take further steps. Hanging the shell is a really bad way to fail. We can't expect users to begin searching logs for audit failures. They probably don't even know about audit or seccomp. First things first, if a normal user hits a seccomp failure I consider it to be a bug, and to be honest, I've seen much nastier, much more subtle bugs than an inadvertent seccomp death. In a perfect world only developers would run into this problem and I would expect developers to be smart enough to figure out what is going on. Getting past that, if seccomp is configured to kill the process when it violates the filter rules there isn't much else we can do; the kernel kills the process and then the rest of userspace, e.g. the shell/libvirt/etc., does whatever it does. We have very little control over things. Is it possible to produce output when a seccomp violation takes place? See the earlier comments from Eduardo about his attempts in this area. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH for-1.7] seccomp: setting -sandbox on by default
On Friday, November 22, 2013 04:48:41 PM Stefan Hajnoczi wrote: On Fri, Nov 22, 2013 at 09:44:42AM -0500, Paul Moore wrote: On Friday, November 22, 2013 11:39:31 AM Stefan Hajnoczi wrote: On Thu, Nov 21, 2013 at 10:48:58AM -0500, Paul Moore wrote: I'm always open to suggestions on how to improve the development/debugging process, so if you have any ideas please let me know. The failure mode is terrible: Glad to see you don't feel strongly about things. Sorry for the rant :). I know you and Eduardo understand the issues and have already been working on them. I can't speak for Eduardo, but no worries on my end; it just wouldn't be an Open Source project without a bit of hyperbole now and then would it? ;) I hope hearing it from a developer who isn't following seccomp is useful though. Definitely. I should have said it earlier, but I do appreciate you taking the time to comment. It shows which issues stick out and hinder usability. Users will only be happy with seccomp when it works silently behind the scenes. Exactly. Users don't tolerate bugs and I don't blame them. After all, at some point we are all users too. Developers will only be happy with seccomp if it's easy and rewarding to support/debug. Agreed. As a developer, how do you feel about the audit/syslog based approach I mentioned earlier? -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH for-1.7] seccomp: setting -sandbox on by default
On Thursday, November 21, 2013 04:14:11 PM Paolo Bonzini wrote: Il 30/10/2013 11:04, Stefan Hajnoczi ha scritto: On Wed, Oct 23, 2013 at 12:42:34PM -0200, Eduardo Otubo wrote: On 10/22/2013 11:00 AM, Anthony Liguori wrote: On Tue, Oct 22, 2013 at 12:21 PM, Eduardo Otubo ot...@linux.vnet.ibm.com wrote: Inverting the way sandbox handles arguments, making possible to have no argument and still have '-sandbox on' enabled. Signed-off-by: Eduardo Otubo ot...@linux.vnet.ibm.com --- The option '-sandbox on' is now used by default by virt-test[0] -- it has been merged into the 'next' branch and will be available in the next release, meaning we have a back support for regression tests if anything breaks because of some missing system call not listed in the whitelist. This being said, I think it makes sense to have this option set to 'on' by default in the next Qemu version. It's been a while since no missing syscall is reported and at this point the whitelist seems to be pretty mature. [0] - https://github.com/autotest/virt-test/commit/50e1f7d47a94f4c770880cd8e c0f18365dcba714 This breaks hot_add of a network device that uses a script= argument, correct? If so, this cannot be made default. Anthony, I believe you're talking about the blacklist feature. This is the old whitelist that is already upstream and it does not block any network device to be hot plugged. The following fails to start here (the shell hangs and ps shows QEMU is a defunct process): qemu-system-x86_64 -sandbox on -enable-kvm -m 1024 -cpu host \ -drive if=virtio,cache=none,file=test.img Easier-to-debug failures are another prerequisite for enabling the sandbox by default, I think. I believe I've posted this information before, but just in case ... IMHO, it is really not that hard to debug a seccomp failure; the first step is to look for the failure in the audit log or syslog. If you are on a Fedora/RHEL based system you are most likely running audit, so finding the seccomp failures are quite simple with the 'ausearch' command: # ausearch -m SECCOMP time-Wed Nov 20 09:52:08 2013 type=SECCOMP msg=audit(1384912328.482:6656): auid=0 uid=0 gid=0 ses=854 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=12087 comm=qemu-kvm sig=31 syscall=62 compat=0 ip=0x7f7a1d2abc67 code=0x0 ... if you are using syslog, feel free to use whatever tool you prefer, e.g. grep, less, etc. Once you have the syscall number, syscall=62, in the audit message above, you can use the 'scmp_sys_resolver' to resolve the number into a name: # scmp_sys_resolver 62 kill The 'scmp_sys_resolver' tool is part of the libseccomp-devel package on Fedora/RHEL based systems, it may be packaged differently on other distributions. I'm always open to suggestions on how to improve the development/debugging process, so if you have any ideas please let me know. -Paul -- paul moore security and virtualization @ redhat
[Qemu-devel] [PATCH] seccomp: add kill() to the syscall whitelist
The kill() syscall is triggered with the following command: # qemu -sandbox on -monitor stdio \ -device intel-hda -device hda-duplex -vnc :0 The resulting syslog/audit message: # ausearch -m SECCOMP time-Wed Nov 20 09:52:08 2013 type=SECCOMP msg=audit(1384912328.482:6656): auid=0 uid=0 gid=0 ses=854 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=12087 comm=qemu-kvm sig=31 syscall=62 compat=0 ip=0x7f7a1d2abc67 code=0x0 # scmp_sys_resolver 62 kill Reported-by: CongLi c...@redhat.com Tested-by: CongLi c...@redhat.com Signed-off-by: Paul Moore pmo...@redhat.com --- qemu-seccomp.c |1 + 1 file changed, 1 insertion(+) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index 69cee44..cf07869 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -114,6 +114,7 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(write), 244 }, { SCMP_SYS(fcntl), 243 }, { SCMP_SYS(tgkill), 242 }, +{ SCMP_SYS(kill), 242 }, { SCMP_SYS(rt_sigaction), 242 }, { SCMP_SYS(pipe2), 242 }, { SCMP_SYS(munmap), 242 },
Re: [Qemu-devel] [PATCHv3 1/3] seccomp: adding blacklist support
On Tuesday, October 08, 2013 09:42:24 PM Eduardo Otubo wrote: v3: The -netdev tap option is checked in the vl.c file during the process of the command line argument list. It sets tap_enabled to true or false according to the configuration found. Later at the seccomp filter installation, this value is checked wheter to install or not this feature. I like the idea of slowly making the QEMU syscall filter dependent on the runtime configuration. With that in mind, I wonder if we should have a more general purpose API in include/sysemu/seccomp.h that allows QEMU to indicate to the the QEMU/seccomp code that a particular feature is enabled. Maybe something like this: #define SCMP_FEAT_TAP ... int seccomp_feature_enable(int feature); One more comment below. Adding a system call blacklist right before the vcpus starts. This filter is composed by the system calls that can't be executed after the guests are up. This list should be refined as whitelist is, with as much testing as we can do using virt-test. Signed-off-by: Eduardo Otubo ot...@linux.vnet.ibm.com --- include/sysemu/seccomp.h | 6 - qemu-seccomp.c | 64 +++- vl.c | 21 +++- 3 files changed, 77 insertions(+), 14 deletions(-) diff --git a/include/sysemu/seccomp.h b/include/sysemu/seccomp.h index 1189fa2..9dc7e52 100644 --- a/include/sysemu/seccomp.h +++ b/include/sysemu/seccomp.h @@ -15,8 +15,12 @@ #ifndef QEMU_SECCOMP_H #define QEMU_SECCOMP_H +#define WHITELIST 0 +#define BLACKLIST 1 Should these #defines be namespaced in some way, e.g. SCMP_LIST_BLACKLIST? #include seccomp.h #include qemu/osdep.h -int seccomp_start(void); +int seccomp_start(int list_type); + #endif -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCHv3 3/3] seccomp: general fixes
On Tuesday, October 08, 2013 09:42:26 PM Eduardo Otubo wrote: 1) On qemu-seccomp.c:255, the variable ctx was being used uninitialized; now it's initialized with NULL and it's being checked at the end of the function. 2) Changed the name of the command line option from enable to sandbox for a better understanding from user side. Signed-off-by: Eduardo Otubo ot...@linux.vnet.ibm.com --- qemu-seccomp.c | 4 ++-- vl.c | 6 +++--- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index 84a42bc..fdd0de3 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -258,7 +258,7 @@ seccomp_return: int seccomp_start(int list_type) { int rc = 0; -scmp_filter_ctx ctx; +scmp_filter_ctx ctx = NULL; switch (list_type) { case WHITELIST: @@ -285,7 +285,7 @@ int seccomp_start(int list_type) rc = seccomp_load(ctx); - seccomp_return: +seccomp_return: if (ctx) seccomp_release(ctx); return rc; Any particular reason these changes weren't folded into patch 1/3? -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PULL 00/01] seccomp: trivial changes
On Tuesday, September 24, 2013 03:30:57 PM Eduardo Otubo wrote: Anthony, The following changes since commit f828a4c8faa118e0ebab3e353ac6840f3b2a0318: Merge remote-tracking branch 'stefanha/tracing' into staging (2013-09-23 11:53:22 -0500) are available in the git repository at: git://github.com/otubo/qemu.git seccomp Eduardo Otubo (1): seccomp: fine tuning whitelist by adding times() qemu-seccomp.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) Just a follow-up ping to see where things stand on this bugfix ... -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH] seccomp: adding times() to the whitelist
On Wednesday, September 04, 2013 10:11:10 AM Paul Moore wrote: On Wednesday, September 04, 2013 09:25:08 AM Eduardo Otubo wrote: This was causing Qemu process to hang when using -sandbox on. Related RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1004175 Signed-off-by: Eduardo Otubo ot...@linux.vnet.ibm.com Works for me. Tested-by: Paul Moore pmo...@redhat.com Eduardo, perhaps you should just merge this into your tree and send a pull request? This fix should also go into -stable. Acked-by: Paul Moore pmo...@redhat.com --- qemu-seccomp.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index 37d38f8..69cee44 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -90,6 +90,7 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(getuid), 245 }, { SCMP_SYS(geteuid), 245 }, { SCMP_SYS(timer_create), 245 }, +{ SCMP_SYS(times), 245 }, { SCMP_SYS(exit), 245 }, { SCMP_SYS(clock_gettime), 245 }, { SCMP_SYS(time), 245 }, -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist
On Wednesday, September 18, 2013 04:59:10 PM Daniel P. Berrange wrote: On Wed, Sep 18, 2013 at 11:53:09AM -0400, Paul Moore wrote: On Wednesday, September 18, 2013 08:38:17 AM Daniel P. Berrange wrote: Libvirt does not want to be in the business of creating seccomp syscall filters for QEMU. As mentioned before, IMHO that places an unacceptable burden on libvirt to know about the syscalls each a particular version of QEMU requires for its operation. At a high level, I don't see how libvirt configuring and installing a syscall filter is substantially different from libvirt configuring and installing a network filter. The rules created for a network filter have no bearing or relation to internal QEMU implementation details, as you have with syscalls, so this isn't really a relevant comparison. The rules created for a network filter are directly related to the details of the guest running inside of QEMU. From a practical point of view I see both network and syscall filtering as being dependent on the guest; the network filtering configuration can change as the guest's services change, the syscall filtering configuration can change as the QEMU functionality can change. Also, and I recognize this is diverting away from a topic most of qemu-devel is not interested in, what about libvirt-lxc? What about all of the other virtualization drivers supported by libvirt (granted, not all would be candidates for syscall filtering, but you get the idea). It isn't clear to me that syscall filtering is something that's relevant for inclusion in libvirt-lxc. It seems like something that would be used by apps running inside LXC containers directly. For all the same reasons that it makes sense to filter syscalls in QEMU, I think it makes sense to filter syscalls in libvirt-lxc. The fundamental concern is that the kernel presents are large attack surface in the way of syscalls, and it is extremely likely that any given container does not have a legitimate need to call into all of the syscalls the kernel presents to userspace; especially if you consider the recent approaches of using containers to ship/deploy single applications. Also, just in case there are some misconceptions floating around, loading a syscall filter in libvirt doesn't mean the individual container applications can't also load their own filter. When multiple syscall filters are present for a given process, all of the filters are evaluated and the most restrictive decision for a given syscall request wins. Libvirt has no knowledge of such apps or what rules they might require, so can't make any kind of intelligent decision about syscall filtering for LXC. A perfectly valid point, but I also think of syscall filtering as allowing the host administrator the ability to reduce the attack surface of the host system/kernel from potentially malicious containers/applications without having to rely on these containers/applications to police themselves. I really view seccomp as something that apps use directly themselves, not something that a 3rd party process applies prior to launching the apps, since the latter has far too much administrative burden IMHO. The seccomp filter functionality is definitely something that apps can use themselves, but to limit syscall filtering to just that use case is to miss out on other valid uses as well. As far as the burden is concerned, is users/administrators find it too difficult, there is nothing requiring them to use it, however, for those who are facing serious security risks in their deployments providing syscall filtering in libvirt might be a very welcome addition. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist
On Wednesday, September 18, 2013 08:38:17 AM Daniel P. Berrange wrote: Libvirt does not want to be in the business of creating seccomp syscall filters for QEMU. As mentioned before, IMHO that places an unacceptable burden on libvirt to know about the syscalls each a particular version of QEMU requires for its operation. At a high level, I don't see how libvirt configuring and installing a syscall filter is substantially different from libvirt configuring and installing a network filter. Also, and I recognize this is diverting away from a topic most of qemu-devel is not interested in, what about libvirt-lxc? What about all of the other virtualization drivers supported by libvirt (granted, not all would be candidates for syscall filtering, but you get the idea). -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist
On Wednesday, September 18, 2013 05:32:17 PM Daniel P. Berrange wrote: On Wed, Sep 18, 2013 at 12:19:44PM -0400, Paul Moore wrote: On Wednesday, September 18, 2013 04:59:10 PM Daniel P. Berrange wrote: On Wed, Sep 18, 2013 at 11:53:09AM -0400, Paul Moore wrote: On Wednesday, September 18, 2013 08:38:17 AM Daniel P. Berrange wrote: Libvirt does not want to be in the business of creating seccomp syscall filters for QEMU. As mentioned before, IMHO that places an unacceptable burden on libvirt to know about the syscalls each a particular version of QEMU requires for its operation. At a high level, I don't see how libvirt configuring and installing a syscall filter is substantially different from libvirt configuring and installing a network filter. The rules created for a network filter have no bearing or relation to internal QEMU implementation details, as you have with syscalls, so this isn't really a relevant comparison. The rules created for a network filter are directly related to the details of the guest running inside of QEMU. From a practical point of view I see both network and syscall filtering as being dependent on the guest; the network filtering configuration can change as the guest's services change, the syscall filtering configuration can change as the QEMU functionality can change. You're talking about two very different things here. Seccomp syscall filtering affects QEMU itself while network filter affects the guest OS apps inside QEMU. From a security standpoint I'm not entirely convinced the distinction is important. Network filtering still does not depend on the implementation details of the guest OS apps - it depends on the services that those apps are using. Once again, I'm not entirely sure that worrying about the distinction between guest apps/services is important - it is just the guest. Thus configuring network filters does not require the admin to have knowledge of the apps internal impl details in the way that seccomp does. Network filters require the admin to have knowledge of what apps/services the guest is providing. Syscall filters require the admin have knowledge of what version of QEMU is deployed on the host. I think it is reasonable to expect that the admin has more knowledge, and more control, over the QEMU version they are using than they do over what is being run in the hosted guests. I don't argue that arriving at the correct syscall filter configuration is more difficult than a network filter, but I don't see what that means we can't offer it as an option for the more savy admins. Also, the libvirt patches I'm currently working on allow the syscall filter to be defined either as a whitelist or a blacklist; the blacklist approach should provide a much more gradual learning curve ... and in the case of containers, I suspect it might also be the better solution. Also, and I recognize this is diverting away from a topic most of qemu-devel is not interested in, what about libvirt-lxc? What about all of the other virtualization drivers supported by libvirt (granted, not all would be candidates for syscall filtering, but you get the idea). It isn't clear to me that syscall filtering is something that's relevant for inclusion in libvirt-lxc. It seems like something that would be used by apps running inside LXC containers directly. For all the same reasons that it makes sense to filter syscalls in QEMU, I think it makes sense to filter syscalls in libvirt-lxc. The fundamental concern is that the kernel presents are large attack surface in the way of syscalls, and it is extremely likely that any given container does not have a legitimate need to call into all of the syscalls the kernel presents to userspace; especially if you consider the recent approaches of using containers to ship/deploy single applications. Also, just in case there are some misconceptions floating around, loading a syscall filter in libvirt doesn't mean the individual container applications can't also load their own filter. When multiple syscall filters are present for a given process, all of the filters are evaluated and the most restrictive decision for a given syscall request wins. Libvirt has no knowledge of such apps or what rules they might require, so can't make any kind of intelligent decision about syscall filtering for LXC. A perfectly valid point, but I also think of syscall filtering as allowing the host administrator the ability to reduce the attack surface of the host system/kernel from potentially malicious containers/applications without having to rely on these containers/applications to police themselves. I really view seccomp as something that apps use directly themselves, not something that a 3rd party process applies prior to launching the apps, since the latter has far too much
Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist
On Tuesday, September 17, 2013 02:06:06 PM Daniel P. Berrange wrote: On Tue, Sep 17, 2013 at 10:01:23AM -0300, Eduardo Otubo wrote: Paul, what exactly are you planning to add to libvirt? I'm not a big fan of using qemu command line to pass syscalls for blacklist as arguments, but I can't see other way to avoid problems (like -net bridge / -net tap) from happening. At present, and as far as I'm concerned pretty much everything is open for discussion, the code works similar to the libvirt network filters. You create a separate XML configuration file which defines the filter and you reference that filter from the domain's XML configuration. When a QEMU/KVM or LXC based domain starts it uses libseccomp to create the seccomp filter and then loads it into the kernel after the fork but before the domain is exec'd. There are no command line arguments passed to QEMU. This work can co-exist with the QEMU seccomp filters without problem. The original goal of this effort wasn't to add libvirt syscall filtering for QEMU, but rather for LXC; adding QEMU support just happened to be a trivial patch once the LXC support was added. (I also apologize for the delays, I hit a snag with an existing problem on libvirt which stopped work and then some other BZs grabbed my attention...) IMHO, if libvirt is enabling seccomp, then making all possible cli args work is a non-goal. If there are things which require privileges seccomp is blocking, then libvirt should avoid using them. eg by making use of FD passing where appropriate to reduce privileges qemu needs. I agree. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH] seccomp: adding times() to the whitelist
On Wednesday, September 04, 2013 10:11:10 AM Paul Moore wrote: On Wednesday, September 04, 2013 09:25:08 AM Eduardo Otubo wrote: This was causing Qemu process to hang when using -sandbox on. Related RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1004175 Signed-off-by: Eduardo Otubo ot...@linux.vnet.ibm.com Works for me. Tested-by: Paul Moore pmo...@redhat.com I fear this patch may have been lost in the maintainer discussion - can we merge this fix please? --- qemu-seccomp.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index 37d38f8..69cee44 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -90,6 +90,7 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(getuid), 245 }, { SCMP_SYS(geteuid), 245 }, { SCMP_SYS(timer_create), 245 }, +{ SCMP_SYS(times), 245 }, { SCMP_SYS(exit), 245 }, { SCMP_SYS(clock_gettime), 245 }, { SCMP_SYS(time), 245 }, -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH] seccomp: adding times() to the whitelist
On Monday, September 09, 2013 12:38:12 PM Paolo Bonzini wrote: Il 06/09/2013 20:41, Eduardo Otubo ha scritto: Hello, Any chance to get this patch applied? Thanks! Paul, perhaps you can add yourself to MAINTAINERS and send a pull request? Paolo Out of respect for the work that Eduardo has done, and is continuing to do, with the QEMU seccomp filtering, I think Eduardo should be the one to take on this role. If Eduardo declines I'll do ahead and submit a patch adding myself to the MAINTAINERS file. On 09/04/2013 11:11 AM, Paul Moore wrote: On Wednesday, September 04, 2013 09:25:08 AM Eduardo Otubo wrote: This was causing Qemu process to hang when using -sandbox on. Related RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1004175 Signed-off-by: Eduardo Otubo ot...@linux.vnet.ibm.com Works for me. Tested-by: Paul Moore pmo...@redhat.com --- qemu-seccomp.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index 37d38f8..69cee44 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -90,6 +90,7 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(getuid), 245 }, { SCMP_SYS(geteuid), 245 }, { SCMP_SYS(timer_create), 245 }, +{ SCMP_SYS(times), 245 }, { SCMP_SYS(exit), 245 }, { SCMP_SYS(clock_gettime), 245 }, { SCMP_SYS(time), 245 }, -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] seccomp submaintainer? (was Re: [PATCH] seccomp: adding times() to the whitelist)
On Monday, September 09, 2013 03:48:09 PM Paolo Bonzini wrote: Il 09/09/2013 15:20, Eduardo Otubo ha scritto: Out of respect for the work that Eduardo has done, and is continuing to do, with the QEMU seccomp filtering, I think Eduardo should be the one to take on this role. If Eduardo declines I'll do ahead and submit a patch adding myself to the MAINTAINERS file. If this is ok for everyone, I would be really glad to take this role to myself. Paul, thanks for this vote of confidence. Paolo, should I send a patch for MAINTAINERS right away? Ok, I was suggesting Paul because he was the one doing reviews. Eduardo, that is also okay for me. However, even as a maintainer please do wait for Paul's reviews. Many areas of QEMU have maintainers that do not send their own patches without a review, so this wouldn't be a new rule. :) Okay, with respect to maintainership, I was thinking more along the lines of the Linux Kernel where those that do the work get the job; it looks like QEMU has a slightly different twist on the idea. If it makes more sense to the QEMU devs you can always add me as a co-maintainer. Regardless, I do plan on continuing to review/test patches and I don't expect that to change in the near future. Please wait for Anthony's ack. I changed the subject and CCed him to grab his attention. Paolo -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH] MAINTAINERS: Add myself to MAINTAINERS file
On Monday, September 09, 2013 02:04:15 PM Eduardo Otubo wrote: Add myself to the MAINTAINERS file. I'll be looking at qemu-seccomp.c and include/sysemu/seccomp.h. Signed-off-by: Eduardo Otubo ot...@linux.vnet.ibm.com --- As discussed in previous threads, I'm including myself to the MAINTAINERS file so I can take care of the sandbox feature in Qemu. MAINTAINERS |6 ++ 1 files changed, 6 insertions(+), 0 deletions(-) Acked-by: Paul Moore pmo...@redhat.com diff --git a/MAINTAINERS b/MAINTAINERS index d128ed0..09c5148 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -766,6 +766,12 @@ M: Blue Swirl blauwir...@gmail.com S: Odd Fixes F: scripts/checkpatch.pl +Seccomp +M: Eduardo Otubo ot...@linux.vnet.ibm.com +S: Supported +F: qemu-seccomp.c +F: include/sysemu/seccomp.h + Usermode Emulation -- BSD user -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH] seccomp: adding times() to the whitelist
On Wednesday, September 04, 2013 09:25:08 AM Eduardo Otubo wrote: This was causing Qemu process to hang when using -sandbox on. Related RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1004175 Signed-off-by: Eduardo Otubo ot...@linux.vnet.ibm.com Works for me. Tested-by: Paul Moore pmo...@redhat.com --- qemu-seccomp.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index 37d38f8..69cee44 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -90,6 +90,7 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(getuid), 245 }, { SCMP_SYS(geteuid), 245 }, { SCMP_SYS(timer_create), 245 }, +{ SCMP_SYS(times), 245 }, { SCMP_SYS(exit), 245 }, { SCMP_SYS(clock_gettime), 245 }, { SCMP_SYS(time), 245 }, -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH] seccomp: adding a second whitelist
On Tuesday, September 03, 2013 02:08:28 PM Corey Bryant wrote: On 09/03/2013 02:02 PM, Corey Bryant wrote: On 08/30/2013 10:21 AM, Eduardo Otubo wrote: On 08/29/2013 05:34 AM, Stefan Hajnoczi wrote: On Wed, Aug 28, 2013 at 10:04:32PM -0300, Eduardo Otubo wrote: Now there's a second whitelist, right before the vcpu starts. The second whitelist is the same as the first one, except for exec() and select(). -netdev tap,downscript=/path/to/script requires exec() in the QEMU shutdown code path. Will this work with seccomp? I actually don't know, but I'll test that as well. Can you run a test with this patch and -netdev? I mean, if you're pointing that out you might have a scenario already setup, right? Thanks! This uses exec() in net/tap.c. I think if we're going to introduce a sandbox environment that restricts existing QEMU behavior, then we have to introduce a new argument to the -sandbox option. So for example, -sandbox on would continue to use the whitelist that allows everything in QEMU to work (or at least it should :). And something like -sandbox on,strict=on would use the whitelist + blacklist. If this is acceptable though, then I wonder how we could go about adding new syscalls to the blacklist in future QEMU releases without regressing -sandbox on,strict=on. Maybe a better approach is to provide support that allows libvirt to define the blacklist and pass it to QEMU? FYI: I didn't want to mention this until I had some patches ready to post, but I'm currently working on adding syscall filtering, via libseccomp, to libvirt. I hope to get an initial RFC-quality patch out soon. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH] seccomp: adding a second whitelist
On Tuesday, September 03, 2013 05:07:53 PM Eduardo Otubo wrote: On 09/03/2013 03:21 PM, Paul Moore wrote: On Tuesday, September 03, 2013 02:08:28 PM Corey Bryant wrote: On 09/03/2013 02:02 PM, Corey Bryant wrote: On 08/30/2013 10:21 AM, Eduardo Otubo wrote: On 08/29/2013 05:34 AM, Stefan Hajnoczi wrote: On Wed, Aug 28, 2013 at 10:04:32PM -0300, Eduardo Otubo wrote: Now there's a second whitelist, right before the vcpu starts. The second whitelist is the same as the first one, except for exec() and select(). -netdev tap,downscript=/path/to/script requires exec() in the QEMU shutdown code path. Will this work with seccomp? I actually don't know, but I'll test that as well. Can you run a test with this patch and -netdev? I mean, if you're pointing that out you might have a scenario already setup, right? Thanks! This uses exec() in net/tap.c. I think if we're going to introduce a sandbox environment that restricts existing QEMU behavior, then we have to introduce a new argument to the -sandbox option. So for example, -sandbox on would continue to use the whitelist that allows everything in QEMU to work (or at least it should :). And something like -sandbox on,strict=on would use the whitelist + blacklist. If this is acceptable though, then I wonder how we could go about adding new syscalls to the blacklist in future QEMU releases without regressing -sandbox on,strict=on. Maybe a better approach is to provide support that allows libvirt to define the blacklist and pass it to QEMU? FYI: I didn't want to mention this until I had some patches ready to post, but I'm currently working on adding syscall filtering, via libseccomp, to libvirt. I hope to get an initial RFC-quality patch out soon. Paul, if you need any help with Qemu and/or testing, please let me know. I would be glad to help :) When you post your RFC to libvirt mailing list please add me as CC. Of course, I appreciate all the help I can get. We can chat a bit more once the patches are posted. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH] seccomp: adding a second whitelist
On Friday, August 30, 2013 05:23:45 PM Stefan Hajnoczi wrote: On Fri, Aug 30, 2013 at 4:21 PM, Eduardo Otubo ot...@linux.vnet.ibm.com wrote: On 08/29/2013 05:34 AM, Stefan Hajnoczi wrote: On Wed, Aug 28, 2013 at 10:04:32PM -0300, Eduardo Otubo wrote: Now there's a second whitelist, right before the vcpu starts. The second whitelist is the same as the first one, except for exec() and select(). -netdev tap,downscript=/path/to/script requires exec() in the QEMU shutdown code path. Will this work with seccomp? I actually don't know, but I'll test that as well. Can you run a test with this patch and -netdev? I mean, if you're pointing that out you might have a scenario already setup, right? I'm not having much luck running qemu.git/master with CONFIG_SECCOMP on Fedora 19. The GTK UI opens but I don't see the guest's display. $ x86_64-softmmu/qemu-system-x86_64 [...GTK UI opens but QEMU is hung...] strace shows the process is hung somehow and ps says it's defunct although it never exited. $ sudo cat /proc/5912/stack [81061fda] do_exit+0x6ca/0xa20 [810ef090] __secure_computing+0xe0/0x240 [8101d722] syscall_trace_enter+0x172/0x230 [816478c8] tracesys+0x7e/0xe2 [] 0x Okay, so seccomp killed the process. $ sudo cat /proc/5912/syscall 29 0x0 0x1000 0x380 0x7fffbeb49380 0x0 0x0 0x7fffbeb495b8 0x7f6b72402657 $ git grep '\29\' arch/x86/include/generated/uapi/asm/unistd_64.h #define __NR_shmget 29 Now it needs syscall 30. I guess the whitelist is only designed for a specific invocation that you are testing? For future reference, it doesn't need to be that hard to identify when seccomp has killed a process. If you're running audit go ahead and check the audit log: # ausearch -m SECCOMP time-Fri Aug 30 11:37:46 2013 type=SECCOMP msg=audit(1377877066.414:64): auid=0 uid=0 gid=0 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=3787 comm=20-live-basic_d sig=31 syscall=2 compat=0 ip=0x3a27ae6570 code=0x0 ... and notice the 'syscall' field which in this case happens to be '2'. If you have the 'scmp_sys_resolver' tool installed on your system (libseccomp- devel = 2.1.0 on Fedora) you can then resolve the syscall number: # scmp_sys_resolver 2 open It is also worth mentioning that while scmp_sys_resolver resolves syscalls for the native architecture by default, it can resolve for any of the architectures that libseccomp supports, see the manpage for details (currently: x86, x86_64, x32, and arm). -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH] seccomp: adding a second whitelist
On Friday, August 30, 2013 11:27:28 AM Eduardo Otubo wrote: On 08/29/2013 09:56 AM, Paul Moore wrote: On Wednesday, August 28, 2013 10:04:32 PM Eduardo Otubo wrote: Now there's a second whitelist, right before the vcpu starts. The second whitelist is the same as the first one, except for exec() and select(). Signed-off-by: Eduardo Otubo ot...@linux.vnet.ibm.com We talked about this in a previous thread, but as a reminder, the kernel's seccomp BPF filter works by executing all of the loaded filters for each syscall and taking the least permissive action for all of the results. In other words, if one filter returns ALLOW for a given syscall and another filter returns KILL, the kernel will select the KILL action for the syscall. With that in mind, I think the best option is to keep the existing whitelist and instead of creating a second whitelist, create a second *blacklist* that removes the syscalls you don't want to allow anymore, e.g. exec() and select(). This approach should be easier to maintain and would result in less overhead in the kernel's seccomp evaluator (the blacklist filter would be much smaller than a second whitelist filter). You're correct. I was thinking in a whole other approach, but your point makes a lot more sense. As I mentioned on the IRC, I should call seccomp_init(SCMP_ACT_ALLOW) and seccomp_rule_add(ctx, SCMP_ACT_KILL, list[i].num, 0); is that correct? Yes, just basically swap the actions. Also, as an FYI, while I may be in the IRC room, I typically don't actually monitor the room unless you direct a comment at me (it starts blinking and grabs my attention). -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH] seccomp: adding a second whitelist
On Wednesday, August 28, 2013 10:04:32 PM Eduardo Otubo wrote: Now there's a second whitelist, right before the vcpu starts. The second whitelist is the same as the first one, except for exec() and select(). Signed-off-by: Eduardo Otubo ot...@linux.vnet.ibm.com We talked about this in a previous thread, but as a reminder, the kernel's seccomp BPF filter works by executing all of the loaded filters for each syscall and taking the least permissive action for all of the results. In other words, if one filter returns ALLOW for a given syscall and another filter returns KILL, the kernel will select the KILL action for the syscall. With that in mind, I think the best option is to keep the existing whitelist and instead of creating a second whitelist, create a second *blacklist* that removes the syscalls you don't want to allow anymore, e.g. exec() and select(). This approach should be easier to maintain and would result in less overhead in the kernel's seccomp evaluator (the blacklist filter would be much smaller than a second whitelist filter). -Paul -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH] seccomp: add additional asynchronous I/O syscalls
On Wednesday, July 24, 2013 03:02:23 PM Eduardo Otubo wrote: On 07/23/2013 10:57 AM, Paul Moore wrote: On Monday, July 15, 2013 03:32:01 PM Paul Moore wrote: A previous commit, seccomp: add the asynchronous I/O syscalls to the whitelist, added several asynchronous I/O syscalls but left out the io_submit() and io_cancel() syscalls. This patch corrects this by adding the two missing asynchronous I/O syscalls. Signed-off-by: Paul Moore pmo...@redhat.com A gentle nudge so this fix doesn't get forgotten. Reviewed and tested. Reviewed-by: Eduardo Otubo ot...@linux.vnet.ibm.com Any chance of merging this patch? --- qemu-seccomp.c |2 ++ 1 file changed, 2 insertions(+) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index ca123bf..173d185 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -33,6 +33,7 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(socketcall), 250 }, #endif { SCMP_SYS(read), 249 }, +{ SCMP_SYS(io_submit), 249 }, { SCMP_SYS(brk), 248 }, { SCMP_SYS(clone), 247 }, { SCMP_SYS(mmap), 247 }, @@ -231,6 +232,7 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(recvmmsg), 241 }, { SCMP_SYS(prlimit64), 241 }, { SCMP_SYS(waitid), 241 }, +{ SCMP_SYS(io_cancel), 241 }, { SCMP_SYS(io_setup), 241 }, { SCMP_SYS(io_destroy), 241 } }; -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH] seccomp: add arch_prctl() to the syscall whitelist
On Wednesday, July 24, 2013 03:01:57 PM Eduardo Otubo wrote: On 07/23/2013 10:57 AM, Paul Moore wrote: On Thursday, July 18, 2013 09:57:03 AM Paul Moore wrote: It appears that even a very simple /etc/qemu-ifup configuration can require the arch_prctl() syscall, see the example below: #!/bin/sh /sbin/ifconfig $1 0.0.0.0 up /usr/sbin/brctl addif switch $1 Signed-off-by: Paul Moore pmo...@redhat.com As with the other fix, a gentle nudge so this isn't forgotten. Reviewed and tested. Reviewed-by: Eduardo Otubo ot...@linux.vnet.ibm.com Any chance of merging this patch? --- qemu-seccomp.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index 173d185..9e91c73 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -234,7 +234,8 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(waitid), 241 }, { SCMP_SYS(io_cancel), 241 }, { SCMP_SYS(io_setup), 241 }, -{ SCMP_SYS(io_destroy), 241 } +{ SCMP_SYS(io_destroy), 241 }, +{ SCMP_SYS(arch_prctl), 240 } }; int seccomp_start(void) -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH] seccomp: add additional asynchronous I/O syscalls
On Monday, July 15, 2013 03:32:01 PM Paul Moore wrote: A previous commit, seccomp: add the asynchronous I/O syscalls to the whitelist, added several asynchronous I/O syscalls but left out the io_submit() and io_cancel() syscalls. This patch corrects this by adding the two missing asynchronous I/O syscalls. Signed-off-by: Paul Moore pmo...@redhat.com A gentle nudge so this fix doesn't get forgotten. --- qemu-seccomp.c |2 ++ 1 file changed, 2 insertions(+) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index ca123bf..173d185 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -33,6 +33,7 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(socketcall), 250 }, #endif { SCMP_SYS(read), 249 }, +{ SCMP_SYS(io_submit), 249 }, { SCMP_SYS(brk), 248 }, { SCMP_SYS(clone), 247 }, { SCMP_SYS(mmap), 247 }, @@ -231,6 +232,7 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(recvmmsg), 241 }, { SCMP_SYS(prlimit64), 241 }, { SCMP_SYS(waitid), 241 }, +{ SCMP_SYS(io_cancel), 241 }, { SCMP_SYS(io_setup), 241 }, { SCMP_SYS(io_destroy), 241 } }; -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH] seccomp: add arch_prctl() to the syscall whitelist
On Thursday, July 18, 2013 09:57:03 AM Paul Moore wrote: It appears that even a very simple /etc/qemu-ifup configuration can require the arch_prctl() syscall, see the example below: #!/bin/sh /sbin/ifconfig $1 0.0.0.0 up /usr/sbin/brctl addif switch $1 Signed-off-by: Paul Moore pmo...@redhat.com As with the other fix, a gentle nudge so this isn't forgotten. --- qemu-seccomp.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index 173d185..9e91c73 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -234,7 +234,8 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(waitid), 241 }, { SCMP_SYS(io_cancel), 241 }, { SCMP_SYS(io_setup), 241 }, -{ SCMP_SYS(io_destroy), 241 } +{ SCMP_SYS(io_destroy), 241 }, +{ SCMP_SYS(arch_prctl), 240 } }; int seccomp_start(void) -- paul moore security and virtualization @ redhat
[Qemu-devel] [PATCH] seccomp: add arch_prctl() to the syscall whitelist
It appears that even a very simple /etc/qemu-ifup configuration can require the arch_prctl() syscall, see the example below: #!/bin/sh /sbin/ifconfig $1 0.0.0.0 up /usr/sbin/brctl addif switch $1 Signed-off-by: Paul Moore pmo...@redhat.com --- qemu-seccomp.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index 173d185..9e91c73 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -234,7 +234,8 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(waitid), 241 }, { SCMP_SYS(io_cancel), 241 }, { SCMP_SYS(io_setup), 241 }, -{ SCMP_SYS(io_destroy), 241 } +{ SCMP_SYS(io_destroy), 241 }, +{ SCMP_SYS(arch_prctl), 240 } }; int seccomp_start(void)
Re: [Qemu-devel] seccomp: remove unused syscalls - for 1.6
On Thursday, July 18, 2013 06:37:15 PM Paolo Bonzini wrote: Il 18/07/2013 18:35, Eduardo Otubo ha scritto: On 07/18/2013 01:28 PM, Anthony Liguori wrote: Eduardo Otubo ot...@linux.vnet.ibm.com writes: Hello all, In this small patch series I basically: Cover letter should be marked [PATCH 0/2]. Otherwise it defeats filtering. Would like to see a Reviewed-by from someone before applying this. I'm running some tests with qemu xen, I'll post a v3 by the end of the day. I'll format the cover letter in the correct way next time. I feel that, at some point, grep and code review must trump experiments... Paul, how did you guys handle this in other projects? To the best of my knowledge QEMU currently stands alone with its complexity and use of seccomp filtering. There are other applications, but they are either of the syscall sandboxing type where the users define the filters, or the rigid, smaller, well defined filter type. QEMU is both large and has a huge number of options which affect the syscalls used. At some point it would be nice to develop a mechanism to do some static analysis on a binary and its associated libraries to come up with a worst case filter (worst case because you might not want all the syscalls that a library uses, e.g. glibc). Unfortunately, we don't have such a tool the moment - it's hard enough generating correct filters with a nice architecture agnostic manner :) On the plus side, I think libseccomp is very close to being pretty much feature complete (excluding new architectures that may pop up, at present we are only x86, x86_64, x32, and ARM) so I'll be able to start turning some effort towards better tools and patches for existing applications. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] seccomp: remove unused syscalls - for 1.6
On Thursday, July 18, 2013 08:48:10 PM Peter Maydell wrote: On 18 July 2013 20:39, Paul Moore pmo...@redhat.com wrote: On the plus side, I think libseccomp is very close to being pretty much feature complete (excluding new architectures that may pop up, at present we are only x86, x86_64, x32, and ARM) ...AArch64 ? :-) Not yet, just 32-bit ARM EABI. If you've got a working system and are willing to so some hacking or run some tests we could work on it for a future libseccomp release. An emulated AArch64 VM would also work, but that route can be slow/annoying. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] seccomp: remove unused syscalls - for 1.6
On Thursday, July 18, 2013 10:31:46 PM Peter Maydell wrote: On 18 July 2013 21:05, Paul Moore pmo...@redhat.com wrote: On Thursday, July 18, 2013 08:48:10 PM Peter Maydell wrote: On 18 July 2013 20:39, Paul Moore pmo...@redhat.com wrote: On the plus side, I think libseccomp is very close to being pretty much feature complete (excluding new architectures that may pop up, at present we are only x86, x86_64, x32, and ARM) ...AArch64 ? :-) Not yet, just 32-bit ARM EABI. If you've got a working system and are willing to so some hacking or run some tests we could work on it for a future libseccomp release. An emulated AArch64 VM would also work, but that route can be slow/annoying. Simulators are all we have right now (we're juuust getting to the point where hardware is starting to become available). I wasn't being serious really, though I'm sure somebody (possibly even somebody at Red Hat :-)) will work around to it at some point. Regardless, consider it a standing offer. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH 1/2] seccomp: no need to check arch in syscall whitelist
On Monday, July 15, 2013 02:29:37 PM Eduardo Otubo wrote: Since libseccomp 2.0 there's no need to check the architecture type anymore. Signed-off-by: Eduardo Otubo ot...@linux.vnet.ibm.com --- qemu-seccomp.c | 13 - 1 file changed, 13 deletions(-) Good, this should make long term maintenance easier. However, you should probably update the configure script to require libseccomp-2.0.0 or greater. Actually, since this is 1.6 material, I would make it dependent on libseccomp-2.1.0 as there are a number of improvements in that release and it has been out for a while now. If you're feeling particularly adventurous, you could even enable the QEMU seccomp code for x32/ARM hosts with libseccomp-2.1.0 hosts :) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index ca123bf..1d5fd71 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -26,12 +26,9 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(timer_gettime), 254 }, { SCMP_SYS(futex), 253 }, { SCMP_SYS(select), 252 }, -#if defined(__x86_64__) { SCMP_SYS(recvfrom), 251 }, { SCMP_SYS(sendto), 250 }, -#elif defined(__i386__) { SCMP_SYS(socketcall), 250 }, -#endif { SCMP_SYS(read), 249 }, { SCMP_SYS(brk), 248 }, { SCMP_SYS(clone), 247 }, @@ -40,7 +37,6 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(execve), 245 }, { SCMP_SYS(open), 245 }, { SCMP_SYS(ioctl), 245 }, -#if defined(__x86_64__) { SCMP_SYS(socket), 245 }, { SCMP_SYS(setsockopt), 245 }, { SCMP_SYS(recvmsg), 245 }, @@ -51,9 +47,7 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(bind), 245 }, { SCMP_SYS(listen), 245 }, { SCMP_SYS(semget), 245 }, -#elif defined(__i386__) { SCMP_SYS(ipc), 245 }, -#endif { SCMP_SYS(gettimeofday), 245 }, { SCMP_SYS(readlink), 245 }, { SCMP_SYS(access), 245 }, @@ -64,7 +58,6 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(statfs), 245 }, { SCMP_SYS(unlink), 245 }, { SCMP_SYS(wait4), 245 }, -#if defined(__i386__) { SCMP_SYS(fcntl64), 245 }, { SCMP_SYS(fstat64), 245 }, { SCMP_SYS(stat64), 245 }, @@ -77,7 +70,6 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(_llseek), 245 }, { SCMP_SYS(mmap2), 245 }, { SCMP_SYS(sigprocmask), 245 }, -#endif { SCMP_SYS(sched_getparam), 245 }, { SCMP_SYS(sched_getscheduler), 245 }, { SCMP_SYS(fstat), 245 }, @@ -145,9 +137,7 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(epoll_create), 242 }, { SCMP_SYS(epoll_ctl), 242 }, { SCMP_SYS(epoll_wait), 242 }, -#if defined(__i386__) { SCMP_SYS(waitpid), 242 }, -#elif defined(__x86_64__) { SCMP_SYS(getsockname), 242 }, { SCMP_SYS(getpeername), 242 }, { SCMP_SYS(accept4), 242 }, @@ -159,7 +149,6 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(semtimedop), 241 }, { SCMP_SYS(epoll_ctl_old), 241 }, { SCMP_SYS(epoll_wait_old), 241 }, -#endif { SCMP_SYS(epoll_pwait), 241 }, { SCMP_SYS(epoll_create1), 241 }, { SCMP_SYS(ppoll), 241 }, @@ -174,7 +163,6 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(getresuid), 241 }, { SCMP_SYS(getresgid), 241 }, { SCMP_SYS(getgroups), 241 }, -#if defined(__i386__) { SCMP_SYS(getresuid32), 241 }, { SCMP_SYS(getresgid32), 241 }, { SCMP_SYS(getgroups32), 241 }, @@ -193,7 +181,6 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(lstat64), 241 }, { SCMP_SYS(sendfile64), 241 }, { SCMP_SYS(ugetrlimit), 241 }, -#endif { SCMP_SYS(alarm), 241 }, { SCMP_SYS(rt_sigsuspend), 241 }, { SCMP_SYS(rt_sigqueueinfo), 241 }, -- paul moore security and virtualization @ redhat
[Qemu-devel] [PATCH] seccomp: add additional asynchronous I/O syscalls
A previous commit, seccomp: add the asynchronous I/O syscalls to the whitelist, added several asynchronous I/O syscalls but left out the io_submit() and io_cancel() syscalls. This patch corrects this by adding the two missing asynchronous I/O syscalls. Signed-off-by: Paul Moore pmo...@redhat.com --- qemu-seccomp.c |2 ++ 1 file changed, 2 insertions(+) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index ca123bf..173d185 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -33,6 +33,7 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(socketcall), 250 }, #endif { SCMP_SYS(read), 249 }, +{ SCMP_SYS(io_submit), 249 }, { SCMP_SYS(brk), 248 }, { SCMP_SYS(clone), 247 }, { SCMP_SYS(mmap), 247 }, @@ -231,6 +232,7 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(recvmmsg), 241 }, { SCMP_SYS(prlimit64), 241 }, { SCMP_SYS(waitid), 241 }, +{ SCMP_SYS(io_cancel), 241 }, { SCMP_SYS(io_setup), 241 }, { SCMP_SYS(io_destroy), 241 } };
Re: [Qemu-devel] [PATCH] seccomp: add the asynchronous I/O syscalls to the whitelist
On Wednesday, May 29, 2013 04:30:01 PM Paul Moore wrote: In order to enable the asynchronous I/O functionality when using the seccomp sandbox we need to add the associated syscalls to the whitelist. Signed-off-by: Paul Moore pmo...@redhat.com --- qemu-seccomp.c |5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index 031da1d..ca123bf 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -87,6 +87,7 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(stat), 245 }, { SCMP_SYS(uname), 245 }, { SCMP_SYS(eventfd2), 245 }, +{ SCMP_SYS(io_getevents), 245 }, { SCMP_SYS(dup), 245 }, { SCMP_SYS(dup2), 245 }, { SCMP_SYS(dup3), 245 }, @@ -229,7 +230,9 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(sendmmsg), 241 }, { SCMP_SYS(recvmmsg), 241 }, { SCMP_SYS(prlimit64), 241 }, -{ SCMP_SYS(waitid), 241 } +{ SCMP_SYS(waitid), 241 }, +{ SCMP_SYS(io_setup), 241 }, +{ SCMP_SYS(io_destroy), 241 } }; int seccomp_start(void) Any reason this patch wasn't pulled in for 1.5.1? -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH] seccomp: add the asynchronous I/O syscalls to the whitelist
On Wednesday, July 10, 2013 10:02:55 PM Andreas Färber wrote: Am 10.07.2013 16:31, schrieb Paul Moore: On Wednesday, May 29, 2013 04:30:01 PM Paul Moore wrote: In order to enable the asynchronous I/O functionality when using the seccomp sandbox we need to add the associated syscalls to the whitelist. Signed-off-by: Paul Moore pmo...@redhat.com --- qemu-seccomp.c |5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index 031da1d..ca123bf 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -87,6 +87,7 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(stat), 245 }, { SCMP_SYS(uname), 245 }, { SCMP_SYS(eventfd2), 245 }, +{ SCMP_SYS(io_getevents), 245 }, { SCMP_SYS(dup), 245 }, { SCMP_SYS(dup2), 245 }, { SCMP_SYS(dup3), 245 }, @@ -229,7 +230,9 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(sendmmsg), 241 }, { SCMP_SYS(recvmmsg), 241 }, { SCMP_SYS(prlimit64), 241 }, -{ SCMP_SYS(waitid), 241 } +{ SCMP_SYS(waitid), 241 }, +{ SCMP_SYS(io_setup), 241 }, +{ SCMP_SYS(io_destroy), 241 } }; int seccomp_start(void) Any reason this patch wasn't pulled in for 1.5.1? Yes: You forget to put a line Cc: qemu-sta...@nongnu.org into the commit message nor was it ever CC'ed while on the list. ;) You learn something new everyday, thanks. Can I assume this will make it into 1.5.2 now? -- paul moore security and virtualization @ redhat
[Qemu-devel] [PATCH] seccomp: add the asynchronous I/O syscalls to the whitelist
In order to enable the asynchronous I/O functionality when using the seccomp sandbox we need to add the associated syscalls to the whitelist. Signed-off-by: Paul Moore pmo...@redhat.com --- qemu-seccomp.c |5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index 031da1d..ca123bf 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -87,6 +87,7 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(stat), 245 }, { SCMP_SYS(uname), 245 }, { SCMP_SYS(eventfd2), 245 }, +{ SCMP_SYS(io_getevents), 245 }, { SCMP_SYS(dup), 245 }, { SCMP_SYS(dup2), 245 }, { SCMP_SYS(dup3), 245 }, @@ -229,7 +230,9 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(sendmmsg), 241 }, { SCMP_SYS(recvmmsg), 241 }, { SCMP_SYS(prlimit64), 241 }, -{ SCMP_SYS(waitid), 241 } +{ SCMP_SYS(waitid), 241 }, +{ SCMP_SYS(io_setup), 241 }, +{ SCMP_SYS(io_destroy), 241 } }; int seccomp_start(void)
Re: [Qemu-devel] [RFC] Continuous work on sandboxing
On Tuesday, April 30, 2013 04:28:54 PM Corey Bryant wrote: Just to be clear, I'm thinking you could launch guests in one of two different seccomp sandboxed environments: 1) Using the existing and more permissive whitelist where every QEMU feature works: qemu-kvm -sandbox on,default In general, I like the comma delimited list of sandbox filters/methods/etc. but I'm not sure we need to explicitly specify default, it seems like on would be sufficient. It also preserved compatibility with what we have now. 2) A more restricted whitelist environment that doesn't allow all QEMU features to work. It would be limited to the whitelist in 1 and it would also deny things like execve(), open(), socket(), certain ioctl() parameters, and may only allow reads/writes to specifc fds, and/or block anything else that could be dangerous: qemu-kvm -sandbox on,restricted I'm just throwing these command line options and syscalls out there. And maybe it makes more sense for libvirt to pass the syscalls and parameters to QEMU so that libvirt can determine the parameters to restrict, like fd's the guest is allowed to read/write. Here's another thread where this was discussed: http://www.redhat.com/archives/libvir-list/2013-April/msg01501.html Seems reasonable to me. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [RFC] Continuous work on sandboxing
On Monday, April 29, 2013 05:52:10 PM Corey Bryant wrote: On 04/26/2013 05:07 PM, Paul Moore wrote: [snip] 3. Debugging and/or learning mode - third party libraries still have the problem of interfering in the Qemu's signal mask. According to some previous discussions, perhaps patch all external libraries that mass up with this mask (spice, for example) is a way to solve it. But not sure if it worth the time spent. Would like to hear you guys. I think patching all the libraries is a losing battle, I think we need to pursue alternate debugging techniques. I agree. It would be nice to have some sort of learning mode that reported all denied syscalls on a single run, but signal handlers doesn't seem like the right way. Maybe we could improve on this approach, since it never gained traction: https://lkml.org/lkml/2013/1/7/313 At least we can get a single denied syscall at a time today via the audit log that the kernel issues. Eduardo, you may want to see if there's a good place to document that for QEMU so that people know where to look. Lately I've been using the fact that the seccomp BPF filter result generates an audit log; it either dumps to syslog or the audit log (depending on your configuration) and seems to accomplish most of what we wanted with SECCOMP_RET_INFO. I'm always open to new/better ideas, but this has been working reasonably well for me for the past few months. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [RFC] Continuous work on sandboxing
On Monday, April 29, 2013 03:39:57 PM Eduardo Otubo wrote: On 04/26/2013 06:07 PM, Paul Moore wrote: On Friday, April 26, 2013 03:39:33 PM Eduardo Otubo wrote: Also, looking a bit further ahead, it might be interesting to look at removing some of the arch dependent stuff in qemu-seccomp.c. The latest version of libseccomp should remove the need for many, if not all, of the arch specific #ifdefs and the next version of libseccomp will add support for x32 and ARM. Tell me more about this. You're saying I can remove the #ifdefs and keep the lines like { SCMP_SYS(getresuid32), 241 }, or address these syscalls in another way? Yes. If you are using libseccomp 2.x you shouldn't need any of the #ifdefs in the seccomp_whitelist[] variable like there are at present. As long as you aren't using the *_exact() versions of the libseccomp APIs, which QEMU is not, the library will do the right thing for the current architecture: syscalls that don't exist will be ignored, those that need translation, e.g. socket() on x86, will be translated, and those that do exist normally will be handled normally. Anything else I would consider a bug in libseccomp. There is more to it if you are generating a seccomp filter to support multiple simultaneous architectures, e.g. x86 and x86_64, but that doesn't really apply with QEMU. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [RFC] Continuous work on sandboxing
On Friday, April 26, 2013 03:39:33 PM Eduardo Otubo wrote: Hello folks, Resuming the sandboxing work, I'd like to ask for comments on the ideias I have: 1. Reduce whitelist to the optimal subset: Run various tests on Qemu with different configurations to reduce to the smallest syscall set possible; test and send a patch weekly (this is already being performed and a patch is on the way) Is this hooked into a testing framework? While it is always nice to have someone verify the correctness, having a simple tool/testsuite what can run through things on a regular basis is even better. Also, looking a bit further ahead, it might be interesting to look at removing some of the arch dependent stuff in qemu-seccomp.c. The latest version of libseccomp should remove the need for many, if not all, of the arch specific #ifdefs and the next version of libseccomp will add support for x32 and ARM. 2. Introduce a second whitelist - the whitelist should be defined in libvirt and passed on to qemu or just pre defined in Qemu? Also remove execve() and avoid open() and socket() and its parameters ... If I'm understanding you correctly, I think what you'll want is a second *blacklist*. We talked about this previously; we currently have a single whitelist, and considering how seccomp works, you can really only further restrict things after you install a whitelist into the kernel (hence the blacklist). 3. Debugging and/or learning mode - third party libraries still have the problem of interfering in the Qemu's signal mask. According to some previous discussions, perhaps patch all external libraries that mass up with this mask (spice, for example) is a way to solve it. But not sure if it worth the time spent. Would like to hear you guys. I think patching all the libraries is a losing battle, I think we need to pursue alternate debugging techniques. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCHv5 1.3] seccomp: adding new syscalls (bugzilla 855162)
On Thursday, November 29, 2012 01:56:41 PM Eduardo Otubo wrote: According to the bug 855162[0] - there's the need of adding new syscalls to the whitelist when using Qemu with Libvirt. [0] - https://bugzilla.redhat.com/show_bug.cgi?id=855162 Reported-by: Paul Moore pmo...@redhat.com Signed-off-by: Eduardo Otubo ot...@linux.vnet.ibm.com Signed-off-by: Corey Bryant cor...@linux.vnet.ibm.com Works for me on 64-bit x86 F17. Tested-by: Paul Moore pmo...@redhat.com diff --git a/qemu-seccomp.c b/qemu-seccomp.c index 64329a3..2a71d6f 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -26,8 +26,12 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(timer_gettime), 254 }, { SCMP_SYS(futex), 253 }, { SCMP_SYS(select), 252 }, +#if defined(__x86_64__) { SCMP_SYS(recvfrom), 251 }, { SCMP_SYS(sendto), 250 }, +#elif defined(__i386__) +{ SCMP_SYS(socketcall), 250 }, +#endif { SCMP_SYS(read), 249 }, { SCMP_SYS(brk), 248 }, { SCMP_SYS(clone), 247 }, @@ -36,15 +40,30 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(execve), 245 }, { SCMP_SYS(open), 245 }, { SCMP_SYS(ioctl), 245 }, +#if defined(__x86_64__) +{ SCMP_SYS(socket), 245 }, +{ SCMP_SYS(setsockopt), 245 }, { SCMP_SYS(recvmsg), 245 }, { SCMP_SYS(sendmsg), 245 }, { SCMP_SYS(accept), 245 }, { SCMP_SYS(connect), 245 }, +{ SCMP_SYS(socketpair), 245 }, +{ SCMP_SYS(bind), 245 }, +{ SCMP_SYS(listen), 245 }, +{ SCMP_SYS(semget), 245 }, +#elif defined(__i386__) +{ SCMP_SYS(ipc), 245 }, +#endif { SCMP_SYS(gettimeofday), 245 }, { SCMP_SYS(readlink), 245 }, { SCMP_SYS(access), 245 }, { SCMP_SYS(prctl), 245 }, { SCMP_SYS(signalfd), 245 }, +{ SCMP_SYS(getrlimit), 245 }, +{ SCMP_SYS(set_tid_address), 245 }, +{ SCMP_SYS(statfs), 245 }, +{ SCMP_SYS(unlink), 245 }, +{ SCMP_SYS(wait4), 245 }, #if defined(__i386__) { SCMP_SYS(fcntl64), 245 }, { SCMP_SYS(fstat64), 245 }, @@ -56,30 +75,33 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(sigreturn), 245 }, { SCMP_SYS(_newselect), 245 }, { SCMP_SYS(_llseek), 245 }, -{ SCMP_SYS(mmap2), 245}, +{ SCMP_SYS(mmap2), 245 }, { SCMP_SYS(sigprocmask), 245 }, -#elif defined(__x86_64__) -{ SCMP_SYS(sched_getparam), 245}, -{ SCMP_SYS(sched_getscheduler), 245}, -{ SCMP_SYS(fstat), 245}, -{ SCMP_SYS(clock_getres), 245}, -{ SCMP_SYS(sched_get_priority_min), 245}, -{ SCMP_SYS(sched_get_priority_max), 245}, -{ SCMP_SYS(stat), 245}, -{ SCMP_SYS(socket), 245}, -{ SCMP_SYS(setsockopt), 245}, -{ SCMP_SYS(uname), 245}, -{ SCMP_SYS(semget), 245}, #endif +{ SCMP_SYS(sched_getparam), 245 }, +{ SCMP_SYS(sched_getscheduler), 245 }, +{ SCMP_SYS(fstat), 245 }, +{ SCMP_SYS(clock_getres), 245 }, +{ SCMP_SYS(sched_get_priority_min), 245 }, +{ SCMP_SYS(sched_get_priority_max), 245 }, +{ SCMP_SYS(stat), 245 }, +{ SCMP_SYS(uname), 245 }, { SCMP_SYS(eventfd2), 245 }, { SCMP_SYS(dup), 245 }, +{ SCMP_SYS(dup2), 245 }, +{ SCMP_SYS(dup3), 245 }, { SCMP_SYS(gettid), 245 }, +{ SCMP_SYS(getgid), 245 }, +{ SCMP_SYS(getegid), 245 }, +{ SCMP_SYS(getuid), 245 }, +{ SCMP_SYS(geteuid), 245 }, { SCMP_SYS(timer_create), 245 }, { SCMP_SYS(exit), 245 }, { SCMP_SYS(clock_gettime), 245 }, { SCMP_SYS(time), 245 }, { SCMP_SYS(restart_syscall), 245 }, { SCMP_SYS(pwrite64), 245 }, +{ SCMP_SYS(nanosleep), 245 }, { SCMP_SYS(chown), 245 }, { SCMP_SYS(openat), 245 }, { SCMP_SYS(getdents), 245 }, @@ -93,8 +115,6 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(lseek), 245 }, { SCMP_SYS(pselect6), 245 }, { SCMP_SYS(fork), 245 }, -{ SCMP_SYS(bind), 245 }, -{ SCMP_SYS(listen), 245 }, { SCMP_SYS(eventfd), 245 }, { SCMP_SYS(rt_sigprocmask), 245 }, { SCMP_SYS(write), 244 }, @@ -104,10 +124,112 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(pipe2), 242 }, { SCMP_SYS(munmap), 242 }, { SCMP_SYS(mremap), 242 }, +{ SCMP_SYS(fdatasync), 242 }, +{ SCMP_SYS(close), 242 }, +{ SCMP_SYS(rt_sigpending), 242 }, +{ SCMP_SYS(rt_sigtimedwait), 242 }, +{ SCMP_SYS(readv), 242 }, +{ SCMP_SYS(writev), 242 }, +{ SCMP_SYS(preadv), 242 }, +{ SCMP_SYS(pwritev), 242 }, +{ SCMP_SYS(setrlimit), 242 }, +{ SCMP_SYS(ftruncate), 242 }, +{ SCMP_SYS(lstat), 242 }, +{ SCMP_SYS(pipe), 242 }, +{ SCMP_SYS(umask), 242 }, +{ SCMP_SYS(chdir), 242 }, +{ SCMP_SYS(setitimer), 242 }, +{ SCMP_SYS(setsid), 242 }, +{ SCMP_SYS(poll), 242 }, +{ SCMP_SYS(epoll_create), 242 }, +{ SCMP_SYS(epoll_ctl), 242 }, +{ SCMP_SYS(epoll_wait
Re: [Qemu-devel] [PATCHv3 1/5] seccomp: adding new syscalls (bugzilla 855162)
On Tuesday, November 27, 2012 11:11:32 AM Corey Bryant wrote: Thanks for the additional details. They were very useful. I was able to reproduce this when I manually built spice release 0.10.1, but not with the Fedora 0.10.1 package. One difference I noticed is that the Fedora version wasn't logging info messages. Nonetheless, we'll send new patches soon. It looks like the following were missing: epoll_create, epoll_wait, and epoll_ctl Great, glad to hear my test system wasn't just being stubborn :) I'll re-test as soon as I see the patches. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCHv3 1/5] seccomp: adding new syscalls (bugzilla 855162)
On Monday, November 26, 2012 11:41:06 AM Corey Bryant wrote: On 11/21/2012 10:24 AM, Paul Moore wrote: On Wednesday, November 21, 2012 11:20:44 AM Eduardo Otubo wrote: Hello folks, Does anyone had a chance to take a look at this? We would like to get this into the 1.3 release. Thanks again :) I way a bit delayed due to travel, but I started playing with it a bit yesterday afternoon and unfortunately it still doesn't work for me (using the same test/reproducer I documented in the RH BZ). I've tried running QEMU both via libvirt and the command line (using a libvirt derived command line). I'm applying the patches to the F17 QEMU 1.2 package; there is some minor fixup needed in the configure script but nothing major. What is further frustrating is that the debug code (patch 5/5) doesn't seem to output the problematic syscall. I wanted to investigate this a bit more before responding, but with the holiday approaching (Thanksgiving in the US), I'm not sure how much progress I'll be able to make for the remainder of this week. Sorry about that. If you have any further questions about how, or what, I'm testing, just ask. Paul, Is your host 32 or 64-bit? 64-bit -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCHv3 1/5] seccomp: adding new syscalls (bugzilla 855162)
On Monday, November 26, 2012 02:59:21 PM Corey Bryant wrote: On 11/26/2012 12:08 PM, Paul Moore wrote: On Monday, November 26, 2012 11:41:06 AM Corey Bryant wrote: On 11/21/2012 10:24 AM, Paul Moore wrote: On Wednesday, November 21, 2012 11:20:44 AM Eduardo Otubo wrote: Hello folks, Does anyone had a chance to take a look at this? We would like to get this into the 1.3 release. Thanks again :) I way a bit delayed due to travel, but I started playing with it a bit yesterday afternoon and unfortunately it still doesn't work for me (using the same test/reproducer I documented in the RH BZ). I've tried running QEMU both via libvirt and the command line (using a libvirt derived command line). I'm applying the patches to the F17 QEMU 1.2 package; there is some minor fixup needed in the configure script but nothing major. What is further frustrating is that the debug code (patch 5/5) doesn't seem to output the problematic syscall. I wanted to investigate this a bit more before responding, but with the holiday approaching (Thanksgiving in the US), I'm not sure how much progress I'll be able to make for the remainder of this week. Sorry about that. If you have any further questions about how, or what, I'm testing, just ask. Paul, Is your host 32 or 64-bit? 64-bit I'm having trouble recreating this. I'm running a Fedora 17 64-bit host and a Fedora 17 64-bit guest with domain XML that mirrors yours. Here's the domain XML I'm using and the resulting QEMU command line: Domain XML: http://pastebin.com/DWa4RQ1Y Command line: http://pastebin.com/2QTWsUhP I'm running with QEMU commit 8db972cfa469b4e4afd9c65e54e796b83b5ce3a2 which is 1.2.0 with: (a) just the first patch applied, as well as with (b) all of this patch series applied. Any thoughts on what could be different? Like I said earlier, I'm running with the F17 QEMU 1.2 package, qemu-1.2.0-16.fc18 to be exact, with Eduardo's patches applied on top. I'm currently testing another set of interim patches from Eduardo that was sent off-list for testing (you were CC'd); hopefully that will resolve the problem. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCHv3 1/5] seccomp: adding new syscalls (bugzilla 855162)
On Monday, November 26, 2012 03:41:00 PM Paul Moore wrote: On Monday, November 26, 2012 02:59:21 PM Corey Bryant wrote: On 11/26/2012 12:08 PM, Paul Moore wrote: On Monday, November 26, 2012 11:41:06 AM Corey Bryant wrote: On 11/21/2012 10:24 AM, Paul Moore wrote: On Wednesday, November 21, 2012 11:20:44 AM Eduardo Otubo wrote: Hello folks, Does anyone had a chance to take a look at this? We would like to get this into the 1.3 release. Thanks again :) I way a bit delayed due to travel, but I started playing with it a bit yesterday afternoon and unfortunately it still doesn't work for me (using the same test/reproducer I documented in the RH BZ). I've tried running QEMU both via libvirt and the command line (using a libvirt derived command line). I'm applying the patches to the F17 QEMU 1.2 package; there is some minor fixup needed in the configure script but nothing major. What is further frustrating is that the debug code (patch 5/5) doesn't seem to output the problematic syscall. I wanted to investigate this a bit more before responding, but with the holiday approaching (Thanksgiving in the US), I'm not sure how much progress I'll be able to make for the remainder of this week. Sorry about that. If you have any further questions about how, or what, I'm testing, just ask. Paul, Is your host 32 or 64-bit? 64-bit I'm having trouble recreating this. I'm running a Fedora 17 64-bit host and a Fedora 17 64-bit guest with domain XML that mirrors yours. Here's the domain XML I'm using and the resulting QEMU command line: Domain XML: http://pastebin.com/DWa4RQ1Y Command line: http://pastebin.com/2QTWsUhP I'm running with QEMU commit 8db972cfa469b4e4afd9c65e54e796b83b5ce3a2 which is 1.2.0 with: (a) just the first patch applied, as well as with (b) all of this patch series applied. Any thoughts on what could be different? Like I said earlier, I'm running with the F17 QEMU 1.2 package, qemu-1.2.0-16.fc18 to be exact, with Eduardo's patches applied on top. I'm currently testing another set of interim patches from Eduardo that was sent off-list for testing (you were CC'd); hopefully that will resolve the problem. Unfortunately, the latest patches from Eduardo met with the same fate. Here is more detailed information on my system (HP DL160 G5, F17, 64-bit): # uname -r 3.6.7-4.fc17.x86_64 [NOTE: standard F17 kernel] # rpm -qa | grep qemu qemu-kvm-tools-1.2.0-16.pm5.fc17.x86_64 qemu-common-1.2.0-16.pm5.fc17.x86_64 qemu-kvm-1.2.0-16.pm5.fc17.x86_64 ipxe-roms-qemu-20120328-1.gitaac9718.fc17.noarch qemu-img-1.2.0-16.pm5.fc17.x86_64 qemu-system-x86-1.2.0-16.pm5.fc17.x86_64 [NOTE: the 'pm5' is my designation indicating the patched version] # ./qemu_seccomp.sh -sandbox off char device redirected to /dev/pts/0 do_spice_init: starting 0.10.1 spice_server_add_interface: SPICE_INTERFACE_MIGRATION spice_server_add_interface: SPICE_INTERFACE_KEYBOARD spice_server_add_interface: SPICE_INTERFACE_MOUSE spice_server_add_interface: SPICE_INTERFACE_QXL red_worker_main: begin display_channel_create: create display channel cursor_channel_create: create cursor channel spice_server_add_interface: SPICE_INTERFACE_PLAYBACK spice_server_add_interface: SPICE_INTERFACE_RECORD [NOTE: I hit Ctrl-C at this point] qemu: terminating on signal 2 spice_server_remove_interface: remove SPICE_INTERFACE_PLAYBACK spice_server_remove_interface: remove SPICE_INTERFACE_RECORD # ./qemu_seccomp.sh -sandbox on char device redirected to /dev/pts/0 do_spice_init: starting 0.10.1 spice_server_add_interface: SPICE_INTERFACE_MIGRATION spice_server_add_interface: SPICE_INTERFACE_KEYBOARD spice_server_add_interface: SPICE_INTERFACE_MOUSE spice_server_add_interface: SPICE_INTERFACE_QXL red_worker_main: begin ./qemu_seccomp.sh: line 28: 21085 Bad system call /usr/bin/qemu-kvm -S -M pc-0.14 -enable-kvm -m 2048 -smp 2,sockets=2,cores=1,threads=1 -name f16- test-1 -uuid 13c7da9b-a79a-0688-267a-8206136bc8d6 -nodefconfig -nodefaults - chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/f16- test-1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device virtio-serial-pci,id=virtio- serial0,bus=pci.0,addr=0x5 -device piix3-usb- uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/libvirt/images/f16- test-1.img,if=none,id=drive-virtio-disk0,format=raw -device virtio-blk- pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio- disk0,bootindex=1 -netdev user,id=hostnet0 -device virtio-net- pci,netdev=hostnet0,id=net0,mac=52:54:00:9a:9d:63,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio- serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 - device usb-tablet,id=input0 -spice port=5900,addr=127.0.0.1
Re: [Qemu-devel] [PATCHv3 1/5] seccomp: adding new syscalls (bugzilla 855162)
On Wednesday, November 21, 2012 11:20:44 AM Eduardo Otubo wrote: Hello folks, Does anyone had a chance to take a look at this? We would like to get this into the 1.3 release. Thanks again :) I way a bit delayed due to travel, but I started playing with it a bit yesterday afternoon and unfortunately it still doesn't work for me (using the same test/reproducer I documented in the RH BZ). I've tried running QEMU both via libvirt and the command line (using a libvirt derived command line). I'm applying the patches to the F17 QEMU 1.2 package; there is some minor fixup needed in the configure script but nothing major. What is further frustrating is that the debug code (patch 5/5) doesn't seem to output the problematic syscall. I wanted to investigate this a bit more before responding, but with the holiday approaching (Thanksgiving in the US), I'm not sure how much progress I'll be able to make for the remainder of this week. Sorry about that. If you have any further questions about how, or what, I'm testing, just ask. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCHv2 3/4] Support for double whitelist filters
On Monday, November 05, 2012 09:39:46 AM Corey Bryant wrote: On 11/02/2012 06:14 PM, Paul Moore wrote: On Friday, November 02, 2012 06:00:29 PM Corey Bryant wrote: On 11/02/2012 05:29 PM, Paul Moore wrote: On Tuesday, October 23, 2012 03:55:31 AM Eduardo Otubo wrote: This patch includes a second whitelist right before the main loop. It's a smaller and more restricted whitelist, excluding execve() among many others. v2: * ctx changed to main_loop_ctx * seccomp_on now inside ifdef * open syscall added to the main_loop whitelist Signed-off-by: Eduardo Otubo ot...@linux.vnet.ibm.com Unfortunately qemu.org seems to be down for me today so I can't grab the latest repo to review/verify this patch (some of my comments/assumptions below may be off) but I'm a little confused, hopefully you guys can help me out, read below ... The first call to seccomp_install_filter() will setup a whitelist for the syscalls that have been explicitly specified, all others will hit the default action TRAP/KILL. The second call to seccomp_install_filter() will add a second whitelist for another set of explicitly specified syscalls, all others will hit the default action TRAP/KILL. That's correct. The goal was to have a 2nd list that is a subset of the 1st list, and also not include execve() in the 2nd list. At this point though, since it's late in the release, we've expanded the 2nd list to be the same as the 1st with the exception of execve() not being in the 2nd list. The problem occurs when the filters are executed in the kernel when a syscall is executed. On each syscall the first filter will be executed and the action will either be ALLOW or TRAP/KILL, next the second filter will be executed and the action will either be ALLOW or TRAP/KILL; since the kernel always takes the most restrictive (lowest integer action value) action when multiple filters are specified, I think your double whitelist value is going to have some inherent problems. That's something I hadn't thought of. But TRAP and KILL won't exist together in our whitelists, and our 2nd whitelist is a subset of the 1st. So do you think there would still be problems? It doesn't really matter if the default action is TRAP and/or KILL, the point is that if you use a second whitelist after an initial whitelist the effective seccomp filter is going to be only the syscalls you explicitly allowed in the second whitelist. When using multiple seccomp filters on a process, all filters are executed for each syscall and the most restrictive action of all the filters is the action that the kernel takes. Don't get me wrong, I like the idea of progressively restricting QEMU, but if you are going to load multiple seccomp filters into the kernel, you almost certainly only want the first whitelist filter to be the union of all the seccomp filter you intend to load with all subsequent filters being blacklists which progressively remove syscalls which are allowed by the initial whitelist. That's what we're doing though. The first whitelist is a union of all subsequent filters. Of course there's only one subsequent filter at this point. But the idea is to start out with a large whitelist for initialization and then tighten it up before the main loop when presumably less syscalls are needed. Okay, that's good ... It still seems a bit odd to me, I think a whitelist 1st blacklist 2nd is a more intuitive and efficient solution but that may just be me. My concern is getting the two whitelists correct. We keep uncovering new syscalls as we test. Of course, this whole whitelist/blacklist discussion assumes the list of allowed syscalls is correct. I might suggest an initial, fairly permissive whitelist followed by a follow-on blacklist if you want to disable certain syscalls. I have to admit I'm nervous about this at this point in QEMU 1.3. It's getting late in the cycle and we'd hoped to get this in earlier. A more permissive whitelist is probably going to be the only way we'll successfully turn -sandbox on by default at this point in QEMU 1.3. Thats fine, I just wanted to point out that I think the multiple whitelist approach is going to have some inherent problems. Are you thinking there will be problems with the current two-whitelist approach, or are you thinking there would be problems in the future if we continued restricting the QEMU process with further whitelists? If you mean the latter, then I understand your point since QEMU is a single process that requires a certain subset of syscalls. I was originally concerned that you were structuring the whitelists incorrectly, but it sounds like that is not the case - that's good. I'm still concerned that the double whitelist approach may result in bigger syscall filters than necessary but until we get a final-ish list there is no point worrying
Re: [Qemu-devel] [PATCHv2 1/4] Adding new syscalls (bugzilla 855162)
On Friday, November 02, 2012 09:48:55 AM Corey Bryant wrote: On 11/01/2012 05:43 PM, Paul Moore wrote: On Tuesday, October 23, 2012 03:55:29 AM Eduardo Otubo wrote: According to the bug 855162[0] - there's the need of adding new syscalls to the whitelist whenn using Qemu with Libvirt. [0] - https://bugzilla.redhat.com/show_bug.cgi?id=855162 v2: Adding new syscalls to the list: readlink, rt_sigpending, and rt_sigtimedwait Reported-by: Paul Moore pmo...@redhat.com Signed-off-by: Eduardo Otubo ot...@linux.vnet.ibm.com --- qemu-seccomp.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) I had an opportunity to test this patchset on a F17 machine using QEMU 1.2 and unfortunately it still fails. I'm using a relatively basic guest configuration running F16, the details are documented in the RH BZ that Eduardo mentioned in the patch description. Paul, Here's the latest diff for the whitelist. We're looking to get the patches out in the next few days after a bit more testing. Okay, thanks for the updated list ... I'm rebuilding QEMU right now and I'll report back with the results later today. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCHv2 1/4] Adding new syscalls (bugzilla 855162)
On Friday, November 02, 2012 12:29:37 AM Eduardo Otubo wrote: On Thu, Nov 01, 2012 at 05:43:03PM -0400, Paul Moore wrote: On Tuesday, October 23, 2012 03:55:29 AM Eduardo Otubo wrote: According to the bug 855162[0] - there's the need of adding new syscalls to the whitelist whenn using Qemu with Libvirt. [0] - https://bugzilla.redhat.com/show_bug.cgi?id=855162 v2: Adding new syscalls to the list: readlink, rt_sigpending, and rt_sigtimedwait Reported-by: Paul Moore pmo...@redhat.com Signed-off-by: Eduardo Otubo ot...@linux.vnet.ibm.com --- qemu-seccomp.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) I had an opportunity to test this patchset on a F17 machine using QEMU 1.2 and unfortunately it still fails. I'm using a relatively basic guest configuration running F16, the details are documented in the RH BZ that Eduardo mentioned in the patch description. Eduardo, I assume you are not able to reproduce this? Unfortunately no. But we have the v3 patchset coming soon with new syscalls and we're hoping to get this fixed. Thanks for the feedback Paul! No problem, thanks for all your work on this patchset. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCHv2 1/4] Adding new syscalls (bugzilla 855162)
On Friday, November 02, 2012 10:10:02 AM Paul Moore wrote: On Friday, November 02, 2012 09:48:55 AM Corey Bryant wrote: On 11/01/2012 05:43 PM, Paul Moore wrote: On Tuesday, October 23, 2012 03:55:29 AM Eduardo Otubo wrote: According to the bug 855162[0] - there's the need of adding new syscalls to the whitelist whenn using Qemu with Libvirt. [0] - https://bugzilla.redhat.com/show_bug.cgi?id=855162 v2: Adding new syscalls to the list: readlink, rt_sigpending, and rt_sigtimedwait Reported-by: Paul Moore pmo...@redhat.com Signed-off-by: Eduardo Otubo ot...@linux.vnet.ibm.com --- qemu-seccomp.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) I had an opportunity to test this patchset on a F17 machine using QEMU 1.2 and unfortunately it still fails. I'm using a relatively basic guest configuration running F16, the details are documented in the RH BZ that Eduardo mentioned in the patch description. Paul, Here's the latest diff for the whitelist. We're looking to get the patches out in the next few days after a bit more testing. Okay, thanks for the updated list ... I'm rebuilding QEMU right now and I'll report back with the results later today. Sadly, no luck, it still fails. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCHv2 1/4] Adding new syscalls (bugzilla 855162)
On Friday, November 02, 2012 10:43:41 AM Corey Bryant wrote: On 11/02/2012 10:38 AM, Paul Moore wrote: On Friday, November 02, 2012 10:10:02 AM Paul Moore wrote: On Friday, November 02, 2012 09:48:55 AM Corey Bryant wrote: On 11/01/2012 05:43 PM, Paul Moore wrote: On Tuesday, October 23, 2012 03:55:29 AM Eduardo Otubo wrote: According to the bug 855162[0] - there's the need of adding new syscalls to the whitelist whenn using Qemu with Libvirt. [0] - https://bugzilla.redhat.com/show_bug.cgi?id=855162 v2: Adding new syscalls to the list: readlink, rt_sigpending, and rt_sigtimedwait Reported-by: Paul Moore pmo...@redhat.com Signed-off-by: Eduardo Otubo ot...@linux.vnet.ibm.com --- qemu-seccomp.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) I had an opportunity to test this patchset on a F17 machine using QEMU 1.2 and unfortunately it still fails. I'm using a relatively basic guest configuration running F16, the details are documented in the RH BZ that Eduardo mentioned in the patch description. Paul, Here's the latest diff for the whitelist. We're looking to get the patches out in the next few days after a bit more testing. Okay, thanks for the updated list ... I'm rebuilding QEMU right now and I'll report back with the results later today. Sadly, no luck, it still fails. Hmm, let me send you the current patch set off-line, which includes debug support to write the failing syscall out. If you don't mind could you try it out? Sure, no problem. On a related note, I think it would be a *really* good idea to also submit the debug code upstream, just in a disabled state by default. You could either bracket it with #ifdefs or get fancy and allow it at runtime with '-sandbox debug' or something similar. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCHv2 3/4] Support for double whitelist filters
On Tuesday, October 23, 2012 03:55:31 AM Eduardo Otubo wrote: This patch includes a second whitelist right before the main loop. It's a smaller and more restricted whitelist, excluding execve() among many others. v2: * ctx changed to main_loop_ctx * seccomp_on now inside ifdef * open syscall added to the main_loop whitelist Signed-off-by: Eduardo Otubo ot...@linux.vnet.ibm.com Unfortunately qemu.org seems to be down for me today so I can't grab the latest repo to review/verify this patch (some of my comments/assumptions below may be off) but I'm a little confused, hopefully you guys can help me out, read below ... The first call to seccomp_install_filter() will setup a whitelist for the syscalls that have been explicitly specified, all others will hit the default action TRAP/KILL. The second call to seccomp_install_filter() will add a second whitelist for another set of explicitly specified syscalls, all others will hit the default action TRAP/KILL. The problem occurs when the filters are executed in the kernel when a syscall is executed. On each syscall the first filter will be executed and the action will either be ALLOW or TRAP/KILL, next the second filter will be executed and the action will either be ALLOW or TRAP/KILL; since the kernel always takes the most restrictive (lowest integer action value) action when multiple filters are specified, I think your double whitelist value is going to have some inherent problems. I might suggest an initial, fairly permissive whitelist followed by a follow-on blacklist if you want to disable certain syscalls. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCHv2 3/4] Support for double whitelist filters
On Friday, November 02, 2012 06:00:29 PM Corey Bryant wrote: On 11/02/2012 05:29 PM, Paul Moore wrote: On Tuesday, October 23, 2012 03:55:31 AM Eduardo Otubo wrote: This patch includes a second whitelist right before the main loop. It's a smaller and more restricted whitelist, excluding execve() among many others. v2: * ctx changed to main_loop_ctx * seccomp_on now inside ifdef * open syscall added to the main_loop whitelist Signed-off-by: Eduardo Otubo ot...@linux.vnet.ibm.com Unfortunately qemu.org seems to be down for me today so I can't grab the latest repo to review/verify this patch (some of my comments/assumptions below may be off) but I'm a little confused, hopefully you guys can help me out, read below ... The first call to seccomp_install_filter() will setup a whitelist for the syscalls that have been explicitly specified, all others will hit the default action TRAP/KILL. The second call to seccomp_install_filter() will add a second whitelist for another set of explicitly specified syscalls, all others will hit the default action TRAP/KILL. That's correct. The goal was to have a 2nd list that is a subset of the 1st list, and also not include execve() in the 2nd list. At this point though, since it's late in the release, we've expanded the 2nd list to be the same as the 1st with the exception of execve() not being in the 2nd list. The problem occurs when the filters are executed in the kernel when a syscall is executed. On each syscall the first filter will be executed and the action will either be ALLOW or TRAP/KILL, next the second filter will be executed and the action will either be ALLOW or TRAP/KILL; since the kernel always takes the most restrictive (lowest integer action value) action when multiple filters are specified, I think your double whitelist value is going to have some inherent problems. That's something I hadn't thought of. But TRAP and KILL won't exist together in our whitelists, and our 2nd whitelist is a subset of the 1st. So do you think there would still be problems? It doesn't really matter if the default action is TRAP and/or KILL, the point is that if you use a second whitelist after an initial whitelist the effective seccomp filter is going to be only the syscalls you explicitly allowed in the second whitelist. When using multiple seccomp filters on a process, all filters are executed for each syscall and the most restrictive action of all the filters is the action that the kernel takes. Don't get me wrong, I like the idea of progressively restricting QEMU, but if you are going to load multiple seccomp filters into the kernel, you almost certainly only want the first whitelist filter to be the union of all the seccomp filter you intend to load with all subsequent filters being blacklists which progressively remove syscalls which are allowed by the initial whitelist. I might suggest an initial, fairly permissive whitelist followed by a follow-on blacklist if you want to disable certain syscalls. I have to admit I'm nervous about this at this point in QEMU 1.3. It's getting late in the cycle and we'd hoped to get this in earlier. A more permissive whitelist is probably going to be the only way we'll successfully turn -sandbox on by default at this point in QEMU 1.3. Thats fine, I just wanted to point out that I think the multiple whitelist approach is going to have some inherent problems. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCHv2 1/4] Adding new syscalls (bugzilla 855162)
On Tuesday, October 23, 2012 03:55:29 AM Eduardo Otubo wrote: According to the bug 855162[0] - there's the need of adding new syscalls to the whitelist whenn using Qemu with Libvirt. [0] - https://bugzilla.redhat.com/show_bug.cgi?id=855162 v2: Adding new syscalls to the list: readlink, rt_sigpending, and rt_sigtimedwait Reported-by: Paul Moore pmo...@redhat.com Signed-off-by: Eduardo Otubo ot...@linux.vnet.ibm.com --- qemu-seccomp.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) I had an opportunity to test this patchset on a F17 machine using QEMU 1.2 and unfortunately it still fails. I'm using a relatively basic guest configuration running F16, the details are documented in the RH BZ that Eduardo mentioned in the patch description. Eduardo, I assume you are not able to reproduce this? diff --git a/qemu-seccomp.c b/qemu-seccomp.c index 64329a3..a7b33e2 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -45,6 +45,13 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(access), 245 }, { SCMP_SYS(prctl), 245 }, { SCMP_SYS(signalfd), 245 }, +{ SCMP_SYS(getrlimit), 245 }, +{ SCMP_SYS(set_tid_address), 245 }, +{ SCMP_SYS(socketpair), 245 }, +{ SCMP_SYS(statfs), 245 }, +{ SCMP_SYS(unlink), 245 }, +{ SCMP_SYS(wait4), 245 }, +{ SCMP_SYS(getuid), 245 }, #if defined(__i386__) { SCMP_SYS(fcntl64), 245 }, { SCMP_SYS(fstat64), 245 }, @@ -107,7 +114,11 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(getsockname), 242 }, { SCMP_SYS(getpeername), 242 }, { SCMP_SYS(fdatasync), 242 }, -{ SCMP_SYS(close), 242 } +{ SCMP_SYS(close), 242 }, +{ SCMP_SYS(accept4), 242 }, +{ SCMP_SYS(readlink), 242 }, +{ SCMP_SYS(rt_sigpending), 242 }, +{ SCMP_SYS(rt_sigtimedwait), 242 } }; int seccomp_start(void) -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [libseccomp-discuss] [RFC] [PATCHv2 0/2] Sandboxing Qemu guests with Libseccomp
On Monday, October 29, 2012 03:32:34 PM Daniel P. Berrange wrote: On Mon, Oct 29, 2012 at 11:11:15AM -0400, Corey Bryant wrote: Hi Paul, Do you know when Fedora packages will be available for libseccomp? They're already reviewed and in Fedora 18 https://bugzilla.redhat.com/show_bug.cgi?id=830992 What Daniel said. If you're asking about F17, I don't plan on adding libseccomp to releases prior to F18. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH] New syscalls to the seccomp whitelist
On Thursday, September 20, 2012 06:00:59 PM Eduardo Otubo wrote: Seccomp syscall whitelist updated after tests running qemu under libvirt ... Hi Eduardo, I know from our discussions offlist that you have an additional debugging patch to help identify missing syscalls, perhaps you could also submit that patch too? I think we would want the debugging patch #ifdef'd out in normal use, but I think it might help the QEMU developers. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH] New syscalls to the seccomp whitelist
On Wednesday, September 26, 2012 01:24:54 PM Eduardo Otubo wrote: On Wed, Sep 26, 2012 at 11:14:29AM -0400, Paul Moore wrote: On Thursday, September 20, 2012 06:00:59 PM Eduardo Otubo wrote: Seccomp syscall whitelist updated after tests running qemu under libvirt ... Hi Eduardo, I know from our discussions offlist that you have an additional debugging patch to help identify missing syscalls, perhaps you could also submit that patch too? I think we would want the debugging patch #ifdef'd out in normal use, but I think it might help the QEMU developers. That's surely a good thing Paul. I'll rebase my patches and send it right away. Thanks! Great, I think that will be a nice addition. Also, FWIW, I don't think you need to rebase/resubmit the patch you already sent, just port your debugging patch to go on top. -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH] New syscalls to the seccomp whitelist
On Friday, September 21, 2012 09:40:33 AM Eduardo Otubo wrote: Seccomp syscall whitelist updated after tests running qemu under libvirt. Reference to the bug - https://bugzilla.redhat.com/show_bug.cgi?id=855162 Unfortunately, this patch still does not work correctly for me using F17/libvirt and a F16 guest. The QEMU instance starts, and the guest appears to be running, but I never see any of the BIOS POST messages on the console. I'm attaching the guest's XML definition file if that helps. Signed-off-by: Eduardo Otubo ot...@linux.vnet.ibm.com --- qemu-seccomp.c | 21 - 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index 64329a3..4712338 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -70,6 +70,7 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(setsockopt), 245}, { SCMP_SYS(uname), 245}, { SCMP_SYS(semget), 245}, +{ SCMP_SYS(accept4), 241 }, #endif { SCMP_SYS(eventfd2), 245 }, { SCMP_SYS(dup), 245 }, @@ -107,7 +108,25 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(getsockname), 242 }, { SCMP_SYS(getpeername), 242 }, { SCMP_SYS(fdatasync), 242 }, -{ SCMP_SYS(close), 242 } +{ SCMP_SYS(close), 242 }, +{ SCMP_SYS(unlink), 241 }, +{ SCMP_SYS(statfs), 241 }, +{ SCMP_SYS(getuid), 241 }, +{ SCMP_SYS(ftruncate), 241 }, +{ SCMP_SYS(getegid), 241 }, +{ SCMP_SYS(geteuid), 241 }, +{ SCMP_SYS(getgid), 241 }, +{ SCMP_SYS(getrlimit), 241 }, +{ SCMP_SYS(set_tid_address), 241 }, +{ SCMP_SYS(socketpair), 241 }, +{ SCMP_SYS(fstatfs), 241 }, +{ SCMP_SYS(epoll_create), 241 }, +{ SCMP_SYS(epoll_ctl), 241 }, +{ SCMP_SYS(epoll_wait), 241 }, +{ SCMP_SYS(pipe), 241 }, +{ SCMP_SYS(poll), 241 }, +{ SCMP_SYS(rt_sigpending), 241 }, +{ SCMP_SYS(rt_sigtimedwait), 241 }, }; int seccomp_start(void) -- paul moore security and virtualization @ redhat f16-test-1.xml Description: XML document
Re: [Qemu-devel] [PATCH] fips: fix build on !Linux
On Friday, August 03, 2012 06:31:38 PM Anthony Liguori wrote: Commit 0f66998 makes -enable-fips conditional on Linux hosts but then uses it unconditionally in vl.c. Fix this by moving the fips handling to os-posix.c and adding a condition. Sorry for not catching this, thanks for the fix. Cc: Paul Moore pmo...@redhat.com Signed-off-by: Anthony Liguori aligu...@us.ibm.com --- os-posix.c |5 + vl.c |3 --- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/os-posix.c b/os-posix.c index daf3d6f..79fa228 100644 --- a/os-posix.c +++ b/os-posix.c @@ -188,6 +188,11 @@ void os_parse_cmd_args(int index, const char *optarg) case QEMU_OPTION_daemonize: daemonize = 1; break; +#if defined(CONFIG_LINUX) +case QEMU_OPTION_enablefips: +fips_set_state(true); +break; +#endif } return; } diff --git a/vl.c b/vl.c index 8cda85f..6d2ce45 100644 --- a/vl.c +++ b/vl.c @@ -3199,9 +3199,6 @@ int main(int argc, char **argv, char **envp) case QEMU_OPTION_qtest_log: qtest_log = optarg; break; -case QEMU_OPTION_enablefips: -fips_set_state(true); -break; default: os_parse_cmd_args(popt-index, optarg); } -- paul moore security and virtualization @ redhat
[Qemu-devel] [PATCH v5] vnc: disable VNC password authentication (security type 2) when in FIPS mode
FIPS 140-2 requires disabling certain ciphers, including DES, which is used by VNC to obscure passwords when they are sent over the network. The solution for FIPS users is to disable the use of VNC password auth when the host system is operating in FIPS compliance mode and the user has specified '-enable-fips' on the QEMU command line. This patch causes QEMU to emit a message to stderr when the host system is running in FIPS mode and a VNC password was specified on the commend line. If the system is not running in FIPS mode, or is running in FIPS mode but VNC password authentication was not requested, QEMU operates normally. Signed-off-by: Paul Moore pmo...@redhat.com -- Changelog * v5 - Added the '-enable-fips' command line option * v4 - Removed the use of syslog * v3 - Use fgetc() instead of fgets() in fips_enabled - Only emit a syslog message if the caller tries to use VNC password auth - Suggest alternative auth methods in the stderr notice * v2 - Protected syslog with _WIN32 - Protected the guts of fips_enabled() with __linux__ - Converted fips_enabled() and the fips flag from int to bool *v1 - Initial draft --- osdep.c | 29 + osdep.h |4 qemu-doc.texi |8 +--- qemu-options.hx | 11 +++ ui/vnc.c| 10 ++ vl.c|4 6 files changed, 63 insertions(+), 3 deletions(-) diff --git a/osdep.c b/osdep.c index 03817f0..c07faf5 100644 --- a/osdep.c +++ b/osdep.c @@ -24,6 +24,7 @@ #include stdlib.h #include stdio.h #include stdarg.h +#include stdbool.h #include string.h #include errno.h #include unistd.h @@ -48,6 +49,8 @@ extern int madvise(caddr_t, size_t, int); #include trace.h #include qemu_socket.h +static bool fips_enabled = false; + static const char *qemu_version = QEMU_VERSION; int socket_set_cork(int fd, int v) @@ -253,3 +256,29 @@ const char *qemu_get_version(void) { return qemu_version; } + +void fips_set_state(bool requested) +{ +#ifdef __linux__ +if (requested) { +FILE *fds = fopen(/proc/sys/crypto/fips_enabled, r); +if (fds != NULL) { +fips_enabled = (fgetc(fds) == '1'); +fclose(fds); +} +} +#else +fips_enabled = false; +#endif /* __linux__ */ + +#ifdef _FIPS_DEBUG +fprintf(stderr, FIPS mode %s (requested %s)\n, + (fips_enabled ? enabled : disabled), + (requested ? enabled : disabled)); +#endif +} + +bool fips_get_state(void) +{ +return fips_enabled; +} diff --git a/osdep.h b/osdep.h index 1e15a4b..d4b887d 100644 --- a/osdep.h +++ b/osdep.h @@ -3,6 +3,7 @@ #include stdarg.h #include stddef.h +#include stdbool.h #ifdef __OpenBSD__ #include sys/types.h #include sys/signal.h @@ -154,4 +155,7 @@ void qemu_set_cloexec(int fd); void qemu_set_version(const char *); const char *qemu_get_version(void); +void fips_set_state(bool requested); +bool fips_get_state(void); + #endif diff --git a/qemu-doc.texi b/qemu-doc.texi index 84dad19..f482fed 100644 --- a/qemu-doc.texi +++ b/qemu-doc.texi @@ -1124,9 +1124,11 @@ the protocol limits passwords to 8 characters it should not be considered to provide high security. The password can be fairly easily brute-forced by a client making repeat connections. For this reason, a VNC server using password authentication should be restricted to only listen on the loopback interface -or UNIX domain sockets. Password authentication is requested with the @code{password} -option, and then once QEMU is running the password is set with the monitor. Until -the monitor is used to set the password all clients will be rejected. +or UNIX domain sockets. Password authentication is not supported when operating +in FIPS 140-2 compliance mode as it requires the use of the DES cipher. Password +authentication is requested with the @code{password} option, and then once QEMU +is running the password is set with the monitor. Until the monitor is used to +set the password all clients will be rejected. @example qemu-system-i386 [...OPTIONS...] -vnc :1,password -monitor stdio diff --git a/qemu-options.hx b/qemu-options.hx index dc68e15..1f114ad 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -2783,6 +2783,17 @@ DEF(qtest-log, HAS_ARG, QEMU_OPTION_qtest_log, -qtest-log LOG specify tracing options\n, QEMU_ARCH_ALL) +#ifdef __linux__ +DEF(enable-fips, 0, QEMU_OPTION_enablefips, +-enable-fipsenable FIPS 140-2 compliance\n, +QEMU_ARCH_ALL) +#endif +STEXI +@item -enable-fips +@findex -enable-fips +Enable FIPS 140-2 compliance mode. +ETEXI + HXCOMM This is the last statement. Insert new options before this line! STEXI @end table diff --git a/ui/vnc.c b/ui/vnc.c index cfc61a7..312ad7f 100644 --- a/ui/vnc.c +++ b/ui/vnc.c @@ -32,6 +32,7 @@ #include acl.h #include qemu-objects.h #include qmp-commands.h +#include osdep.h #define VNC_REFRESH_INTERVAL_BASE 30 #define VNC_REFRESH_INTERVAL_INC 50 @@ -2875,6 +2876,15 @@ int
Re: [Qemu-devel] [PATCH v4] vnc: disable VNC password authentication (security type 2) when in FIPS mode
On Friday, June 08, 2012 05:38:12 PM Paul Moore wrote: FIPS 140-2 requires disabling certain ciphers, including DES, which is used by VNC to obscure passwords when they are sent over the network. The solution for FIPS users is to disable the use of VNC password auth when the host system is operating in FIPS mode. This patch causes QEMU to emit a message to stderr when the host system is running in FIPS mode and a VNC password was specified on the commend line. If the system is not running in FIPS mode, or is running in FIPS mode but VNC password authentication was not requested, QEMU operates normally. Signed-off-by: Paul Moore pmo...@redhat.com Hi Anthony, Any word on this patch? Other than Daniel Berrange's reviewed-by tag, the discussion of the v4 patch has been quiet and I think we addressed all the other remaining issues in the discussion attached to the v2 patch posting. -Paul -- Changelog * v4 - Removed the use of syslog * v3 - Use fgetc() instead of fgets() in fips_enabled - Only emit a syslog message if the caller tries to use VNC password auth - Suggest alternative auth methods in the stderr notice * v2 - Protected syslog with _WIN32 - Protected the guts of fips_enabled() with __linux__ - Converted fips_enabled() and the fips flag from int to bool *v1 - Initial draft --- qemu-doc.texi |8 +--- ui/vnc.c | 27 +++ ui/vnc.h |1 + 3 files changed, 33 insertions(+), 3 deletions(-) diff --git a/qemu-doc.texi b/qemu-doc.texi index 0af0ff4..fe8d3df 100644 --- a/qemu-doc.texi +++ b/qemu-doc.texi @@ -1124,9 +1124,11 @@ the protocol limits passwords to 8 characters it should not be considered to provide high security. The password can be fairly easily brute-forced by a client making repeat connections. For this reason, a VNC server using password authentication should be restricted to only listen on the loopback interface -or UNIX domain sockets. Password authentication is requested with the @code{password} -option, and then once QEMU is running the password is set with the monitor. Until -the monitor is used to set the password all clients will be rejected. +or UNIX domain sockets. Password authentication is not supported when operating +in FIPS 140-2 compliance mode as it requires the use of the DES cipher. Password +authentication is requested with the @code{password} option, and then once QEMU +is running the password is set with the monitor. Until the monitor is used to +set the password all clients will be rejected. @example qemu-system-i386 [...OPTIONS...] -vnc :1,password -monitor stdio diff --git a/ui/vnc.c b/ui/vnc.c index 54bc5ad..4bd816d 100644 --- a/ui/vnc.c +++ b/ui/vnc.c @@ -48,6 +48,21 @@ static DisplayChangeListener *dcl; static int vnc_cursor_define(VncState *vs); static void vnc_release_modifiers(VncState *vs); +static bool fips_enabled(void) +{ +bool enabled = false; + +#ifdef __linux__ +FILE *fds = fopen(/proc/sys/crypto/fips_enabled, r); +if (fds != NULL) { +enabled = (fgetc(fds) == '1'); +fclose(fds); +} +#endif /* __linux__ */ + +return enabled; +} + static void vnc_set_share_mode(VncState *vs, VncShareMode mode) { #ifdef _VNC_DEBUG @@ -2748,6 +2763,9 @@ void vnc_display_init(DisplayState *ds) dcl-idle = 1; vnc_display = vs; +vs-fips = fips_enabled(); +VNC_DEBUG(FIPS mode %s\n, (vs-fips ? enabled : disabled)); + vs-lsock = -1; vs-ds = ds; @@ -2896,6 +2914,15 @@ int vnc_display_open(DisplayState *ds, const char *display) while ((options = strchr(options, ','))) { options++; if (strncmp(options, password, 8) == 0) { +if (vs-fips) { +fprintf(stderr, +VNC password auth disabled due to FIPS mode, +consider using the VeNCrypt or SASL authentication +methods as an alternative\n); +g_free(vs-display); +vs-display = NULL; +return -1; +} password = 1; /* Require password auth */ } else if (strncmp(options, reverse, 7) == 0) { reverse = 1; diff --git a/ui/vnc.h b/ui/vnc.h index a851ebd..d41631b 100644 --- a/ui/vnc.h +++ b/ui/vnc.h @@ -160,6 +160,7 @@ struct VncDisplay char *display; char *password; time_t expires; +bool fips; int auth; bool lossy; bool non_adaptive; -- paul moore security and virtualization @ redhat