Keeping track of called syscalls in real-time
Can the kernel keep track of all the system calls that were called by an application/module in real-time? I know I can statically use strace, or even gdb, but I am looking for a solution in real time, when the application/module is already running and the user has no control over it. I am not sure whether a system call has to go through some sort of wrapper to be dispatched from the syscall table; if it does, I assume such a wrapper is where I could collect this information, but I am not sure. I am looking for hints/options to achieve this. Many thanks -- - seds ~> https://seds.nl ___ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
Re: Keeping track of called syscalls in real-time
On Wed, 28 Jun 2017 17:48:15 -0300, Ben Mezger said: > Can the kernel keep track of all the system calls that were called by an > application/module in real-time? > I know I can statically use strace, or even gdb, but I am looking for a > solution in real time when the application/module is already running and > the user has no control over it. What actual problem are you trying to solve by having the information? How "real-time" does it have to be? Have you looked at the syscall audit facility?
Re: Kernel schedules kernel tasks on isolated cpus, SCHED_FIFO prevents kernel tasks from running
On Wed, 28 Jun 2017 14:02:37 -0500, Andrei Hurynovich said: > The question is why this old 2.6 kernel decides that it needs per-cpu > events and kblockd tasks. You have per-cpu events because your real-time process issues syscalls, and syscalls do things inside the kernel that require per-CPU infrastructure. You have kblockd and other per-cpu threads because you're using an ancient kernel that doesn't have Frederic Weisbecker's CONFIG_NO_HZ_FULL support, or the support for a single system-wide kblockd, or a mess of other stuff. So it isn't that it "decides" to do it per-cpu, it's that 2.6.32 did it that way for simplicity, and in the *8 years* since 2.6.32 was released, people cleaned much of that stuff up.

[/usr/src/linux-next] git diff --shortstat v2.6.32 next-20170627
 64638 files changed, 17302278 insertions(+), 5102365 deletions(-)

Yes, you're 17 *million* lines of code behind the times. Much of what you are complaining about was solved half a decade or more ago. Or as Greg suggested, use a modern kernel. :)
Re: Kernel schedules kernel tasks on isolated cpus, SCHED_FIFO prevents kernel tasks from running
On Wed, 2017-06-28 at 08:39 -0500, Andrei Hurynovich wrote: > Hi. > > We are trying to build a realtime(-ish) system based on rhel6 (kernel > 2.6.32-642.1.1.el6.x86_64). > > We used isolcpus to remove some cpus from process > scheduling (isolcpus=2-19 nohz_full=2-19 rcu_nocbs=2-19). > > We spin off a program thread that sets its cpu affinity to one of > those > isolated cpus, sets its scheduling class to SCHED_FIFO, spins in a > loop > and never sched_yield()-s to the kernel. > > We set sysctl kernel.sched_rt_runtime_us = -1 so realtime threads > are > NEVER interrupted. You want an actual realtime kernel for that to work right. The real time kernel currently supported by Red Hat is 3.10 based, not 2.6.32 based. -- All rights reversed
Re: Kernel schedules kernel tasks on isolated cpus, SCHED_FIFO prevents kernel tasks from running
On Wed, Jun 28, 2017 at 08:39:07AM -0500, Andrei Hurynovich wrote: > Hi. > > We are trying to build realtime(-ish) system based on rhel6(kernel > 2.6.32-642.1.1.el6.x86_64). Wow, you do realize that is a _very_ old and obsolete kernel, supported by no one except Red Hat. If you stick with it, you are going to have to get your support from them, not the community, as you are already paying for it :) Good luck! greg k-h
Re: Kernel schedules kernel tasks on isolated cpus, SCHED_FIFO prevents kernel tasks from running
On Wed, 28 Jun 2017 08:39:07 -0500, Andrei Hurynovich said: > We set sysctl kernel.sched_rt_runtime_us = -1 so realtime threads are > NEVER interrupted. > According to /proc/sched_debug, it seems that kernel still schedules > some SCHED_OTHER (i.e. non-realtime) kernel tasks to isolated cpus - for > example cpu 18 gets tasks events/18 and kblockd/18 that are stuck in > runnable (but not running) state, so those kernel processes never got a > single time slice because our realtime process hogs 100% of cpu. This is what happens when you have a priority inversion - when you tell the system to give 100% to a process, you shouldn't be surprised when other tasks don't get any service. > The question is: Is it possible to never schedule kernel tasks on > selected cpus? Only if the userspace process on that CPU never makes system calls - which is very unlikely if the process has actual real-time requirements. Also, if your "real-time" process is taking 100% of the CPU, you have a disaster waiting to happen. You have zero headroom for dealing with unexpected events. Thought experiment: What happens if the real-world part of your system hits an unexpected error that requires 1% of a CPU for error recovery? You are forced to either ignore the error or miss a real-time deadline. You might want to think about dividing up your process into 2 parts - one that handles the *actual* real-time work and only uses (for example) 20-30% of a CPU, and the parts that don't have actual real-time constraints that can then run with the rest of the available CPU, but allow other threads (such as kernel) to execute as well.
Re: Kernel schedules kernel tasks on isolated cpus, SCHED_FIFO prevents kernel tasks from running
Thank you Valdis. Yes, I'm basically getting what I want - the RT proc never ever yields the CPU to the system. There are plenty of cores left to run non-rt tasks on the machine. The question is why this old 2.6 kernel decides that it needs per-cpu events and kblockd tasks. Maybe someone can give a hint as to which subsystem's documentation describes these workqueue tasks. /Documentation/kernel-per-CPU-kthreads.txt is great, but the controls described there appeared only in 3.10.x :) On 06/28/2017 01:04 PM, valdis.kletni...@vt.edu wrote: > On Wed, 28 Jun 2017 08:39:07 -0500, Andrei Hurynovich said: >> We set sysctl kernel.sched_rt_runtime_us = -1 so realtime threads are >> NEVER interrupted. >> According to /proc/sched_debug, it seems that kernel still schedules >> some SCHED_OTHER (i.e. non-realtime) kernel tasks to isolated cpus - for >> example cpu 18 gets tasks events/18 and kblockd/18 that are stuck in >> runnable (but not running) state, so those kernel processes never got a >> single time slice because our realtime process hogs 100% of cpu. > This is what happens when you have a priority inversion - when you tell the > system > to give 100% to a process, you shouldn't be surprised when other tasks don't > get any service. > >> The question is: Is it possible to never schedule kernel tasks on >> selected cpus? > Only if the userspace process on that CPU never makes system calls - which is > very unlikely if the process has actual real-time requirements. > > Also, if your "real-time" process is taking 100% of the CPU, you have a > disaster > waiting to happen. You have zero headroom for dealing with unexpected events. > Thought experiment: What happens if your real-world part of the system has > an unexpected error, that requires 1% of a CPU for error recovery? You are > forced to either ignore the error or miss a real-time deadline. 
> > You might want to think about dividing up your process into 2 parts - one that > handles the *actual* real-time work and only uses (for example) 20-30% of a > CPU, and the parts that don't have actual real-time constraints that can then > run with the rest of the available CPU, but allow other threads (such as > kernel) > to execute as well. > -- Thanks, Andrei Hurynovich Charlesworth Research LLC. http://www.charlesworthresearch.com/
Re: Qemu+busybox for kernel development
Hi Alexander, On Wed, Jun 28, 2017 at 1:46 PM, Alexander Kapshuk < alexander.kaps...@gmail.com> wrote: > I am trying to set up a build environment where I can run the kernel and > see how the changes I have made to the kernel source work. > My understanding, based on googling, is that it is common practice in the > kernel community to use a virtualised environment for that purpose. > What I have done so far is create a ramfs that is built into the kernel, > as described here [1] and here [2]. > > [1] https://landley.net/writing/rootfs-howto.html > [2] https://git.kernel.org/pub/scm/linux/kernel/git/ > torvalds/linux.git/plain/Documentation/early-userspace/README?h=v4.12-rc7 > > a). I have generated a minimal initramfs_list file: > scripts/gen_initramfs_list.sh -d >usr/initramfs_list > Which looks like this: > # This is a very simple, default initramfs > > dir /dev 0755 0 0 > nod /dev/console 0600 0 0 c 5 1 > dir /root 0700 0 0 > # file /kinit usr/kinit/kinit 0755 0 0 > # slink /init kinit 0755 0 0 > slink /bin/sh busybox 777 0 0 > file /init /bin/busybox 755 0 0 > > b). Set CONFIG_INITRAMFS_SOURCE: > CONFIG_INITRAMFS_SOURCE="/home/sasha/linux/usr/initramfs_list" > > c). And had the kernel generate the initramfs image: > make > ... > GEN usr/initramfs_data.cpio.gz > CHK include/generated/compile.h > AS usr/initramfs_data.o > LD usr/built-in.o > ... > > When I run the kernel in qemu I get an error message which complains about > /etc/init.d/rcS missing. > Did you check the initramfs for the rcS script? The kernel configuration, busybox configuration and the layout of the initramfs (which should be /home/sasha/linux/usr/initramfs_list) should be consistent. If the kernel is asking for it, then they are not consistent. > The posts online seem to suggest that this has got to do with the busybox > configuration. > So far, I have not been able to get my head around this problem. > Any pointers or suggestions would be much appreciated. > > Alexander Kapshuk. 
Thank you. Shahbaz Khan
Re: Keeping track of called syscalls in real-time
On Wed, 28 Jun 2017 19:06:56 -0300, Ben Mezger said: > I'm actually formulating my thesis project. I am looking for a way to > intercept system calls (those chosen by the users), where I can keep > track of what syscall has been called and by whom. As I said before - knowing this, what do you *do* with it? Statistics after the fact? Apply security rules before the fact? Something else? The answer depends *a lot* on what you're planning to *do* with the info. > A big picture of the _main_ idea of interception would be: Application > called a syscall -> Intercept and delay call -> do something before the > call -> return back to the syscall. "Do something before the syscall". Congrats - you just re-invented the LSM subsystem. Or possibly seccomp, depending on what it is you're trying to accomplish. Note that LSM's have some restrictions on what they can and can't do, mostly because it's otherwise almost impossible to do any reasoning about the security and stability guarantees of a process/system. > By real-time I mean as soon as an application called a syscall (i.e. > fopen), I could then receive a reply from the kernel informing me X > called fopen, where X could be a pid or whatever. Yes, but the question is: what timing of "I then receive" is appropriate? Do you need it before the syscall is executed? After it is finished? Or "don't care at all as long as we eventually get a complete trail"? > >> Have you looked at the syscall audit facility? > > I have not. Are you talking about auditctl? That's part of the userspace utilities that interface to the audit system.
Re: Keeping track of called syscalls in real-time
> Whenever fopen("/etc/shadow", "r") is called, the tool would intercept > it, run the verify() procedure, and return back to the syscall, allowing > it to do its job. This sounds like an LSM, possibly with a component which communicates with userspace, depending on how sophisticated "verify" needs to be. We've also done some very early work in trying to do this type of thing from a hypervisor. See: https://www.flyn.org/projects/VisorFlow/ -- Mike :wq
Re: Keeping track of called syscalls in real-time
I'm actually formulating my thesis project. I am looking for a way to intercept system calls (those chosen by the users), where I can keep track of what syscall has been called and by whom. A big picture of the _main_ idea of interception would be: Application called a syscall -> Intercept and delay call -> do something before the call -> return back to the syscall. By real-time I mean as soon as an application called a syscall (i.e. fopen), I could then receive a reply from the kernel informing me X called fopen, where X could be a pid or whatever. >> Have you looked at the syscall audit facility? I have not. Are you talking about auditctl? On 06/28/2017 06:19 PM, valdis.kletni...@vt.edu wrote: > On Wed, 28 Jun 2017 17:48:15 -0300, Ben Mezger said: >> Can the kernel keep track of all the system calls that were called by an >> application/module in real-time? >> I know I can statically use strace, or even gdb, but I am looking for a >> solution in real time when the application/module is already running and >> the user has no control over it. > > What actual problem are you trying to solve by having the information? > > How "real-time" does it have to be? > > Have you looked at the syscall audit facility? -- - seds ~> https://seds.nl
Re: Keeping track of called syscalls in real-time
Let me clear things up. > As I said before - knowing this, what do you *do* with it? Statistics > after the fact? Apply security rules before the fact? Something else? > The answer depends *a lot* on what you're planning to *do* with the info. There are no statistics involved. I am trying to intercept *some* system calls. The list of syscalls I should intercept will be set by the user in the form of a rule; however, if a specified syscall is meant for file I/O (i.e. fopen), the user would also need to specify which files the interception should apply to whenever fopen is called. A simple example of a user rule would be (in a nutshell):

# syscall   #intercept   #file_arg      #action   #onuid   #oncall
fopen       1            /etc/shadow    verify    12       0

Where #syscall specifies which syscall to intercept, #intercept is a bool saying whether it should run or not, #file_arg basically says "intercept fopen only when fopen is called on /etc/shadow", #action specifies the name of the procedure the tool would run when intercepting fopen, #onuid specifies the user uid to intercept (run this only for the user whose uid is 12) and finally, #oncall is a bool telling the tool to intercept after the syscall has returned (1 for after the call, 0 for before). Whenever fopen("/etc/shadow", "r") is called, the tool would intercept it, run the verify() procedure, and return back to the syscall, allowing it to do its job. > Yes, but the question is: what timing of "I then receive" is appropriate? > Do you need it before the syscall is executed? After it is finished? > Or "don't care at all as long as we eventually get a complete trail"? That all depends on the config for *that* specific call. Using the previous example, I would need the kernel to tell me right when fopen was called:

int foo(...) {
    ...
    fopen(arg, "r");  /* <- need an alert from the kernel here */
}

I am using the word "tool" here, but I am willing to get this built into the kernel at compile time, so that even a root user would find it slightly more difficult to disable without having to recompile everything (afaik). > Congrats - you just re-invented the LSM subsystem. Or possibly seccomp, > depending on what it is you're trying to accomplish. > > Note that LSM's have some restrictions on what they can and can't do, > mostly because it's otherwise almost impossible to do any reasoning about > the security and stability guarantees of a process/system otherwise. I understand seccomp and LSM allow __some__ type of syscall interposition (where afaik seccomp blocks most of them), but what I want here is not to *reinvent* the wheel; I want to make things a bit more configurable, where a user has access to an API for writing custom procedures to run on the interception side, without having to dig through the source. Many thanks On 06/28/2017 07:26 PM, valdis.kletni...@vt.edu wrote: > On Wed, 28 Jun 2017 19:06:56 -0300, Ben Mezger said: >> I'm actually formulating my thesis project. I am looking for a way to >> intercept system calls (those chosen by the users), where I can keep >> track of what syscall has been called and by whom. > > As I said before - knowing this, what do you *do* with it? Statistics > after the fact? Apply security rules before the fact? Something else? > The answer depends *a lot* on what you're planning to *do* with the info. > >> A big picture of the _main_ idea of interception would be: Application >> called a syscall -> Intercept and delay call -> do something before the >> call -> return back to the syscall. > > "Do something before the syscall". > > Congrats - you just re-invented the LSM subsystem. Or possibly seccomp, > depending on what it is you're trying to accomplish. 
> > Note that LSM's have some restrictions on what they can and can't do, > mostly because it's otherwise almost impossible to do any reasoning about > the security and stability guarantees of a process/system otherwise. > >> By real-time I mean as soon as an application called a syscall (i.e. >> fopen), I could then receive a reply from the kernel informing me X >> called fopen, where X could be a pid or whatever. > > Yes, but the question is: what timing of "I then receive" is appropriate? > Do you need it before the syscall is executed? After it is finished? > Or "don't care at all as long as we eventually get a complete trail"? > Have you looked at the syscall audit facility? >> >> I have not. Are you talking about auditctl? > > That's part of the userspace utilities that interface to the audit system. > -- - seds ~> https://seds.nl
Kernel schedules kernel tasks on isolated cpus, SCHED_FIFO prevents kernel tasks from running
Hi. We are trying to build a realtime(-ish) system based on rhel6 (kernel 2.6.32-642.1.1.el6.x86_64). We used isolcpus to remove some cpus from process scheduling (isolcpus=2-19 nohz_full=2-19 rcu_nocbs=2-19). We spin off a program thread that sets its cpu affinity to one of those isolated cpus, sets its scheduling class to SCHED_FIFO, spins in a loop and never sched_yield()-s to the kernel. We set sysctl kernel.sched_rt_runtime_us = -1 so realtime threads are NEVER interrupted. We are observing that the program thread is indeed realtime and is never interrupted. After some time working like this, the system becomes unresponsive - ssh connections start failing with timeout, existing connections hang when trying to read/write to physical disks (reading procfs or writing to tmpfs is unaffected). *** According to /proc/sched_debug, it seems that the kernel still schedules some SCHED_OTHER (i.e. non-realtime) kernel tasks to isolated cpus - for example cpu 18 gets tasks events/18 and kblockd/18 that are stuck in runnable (but not running) state, so those kernel processes never got a single time slice because our realtime process hogs 100% of cpu. And events/18 and kblockd/18 never migrate to other cpus because these tasks are pinned to cpu 18. *** Please check these /proc/sched_debug snapshots; you can see that the events/18 and kblockd/18 sum-exec counters are not increasing: https://gist.github.com/altmind/5cf4aad87a4a082441c1ca9378a06154 The question is: Is it possible to never schedule kernel tasks on selected cpus? -- Thanks, Andrei Hurynovich
Re: Qemu+busybox for kernel development
The way I do it is by compiling the kernel as I would normally do for a real system. Then, after copying vmlinuz and generating my initramfs, I run Qemu: $ qemu-system-x86_64 -kernel vmlinuz -initrd initramfs.img -append param1=value1 For me, as I am mostly testing, there is no need for a full-featured root FS, since an initramfs is perfect for containing small test applications and loadable kernel modules. This might come in handy, though: https://lwn.net/Articles/660404/ On 06/28/2017 05:46 AM, Alexander Kapshuk wrote: > I am trying to setup a build environment where I can run the kernel and > see how the changes I have made to the kernel source work. > My understanding, based on googling, is that it is common practice in > the kernel community to use a virtualised environment for that purpose. > What I have done so far is create a ramfs that is built into the kernel, > as described here [1] and here [2]. > > [1] https://landley.net/writing/rootfs-howto.html > [2] > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/early-userspace/README?h=v4.12-rc7 > > a). I have generated a minimal initramfs_list file: > scripts/gen_initramfs_list.sh -d >usr/initramfs_list > Which looks like this: > # This is a very simple, default initramfs > > dir /dev 0755 0 0 > nod /dev/console 0600 0 0 c 5 1 > dir /root 0700 0 0 > # file /kinit usr/kinit/kinit 0755 0 0 > # slink /init kinit 0755 0 0 > slink /bin/sh busybox 777 0 0 > file /init /bin/busybox 755 0 0 > > b). Set CONFIG_INITRAMFS_SOURCE: > CONFIG_INITRAMFS_SOURCE="/home/sasha/linux/usr/initramfs_list" > > c). And had the kernel generate the initramfs image: > make > ... > GEN usr/initramfs_data.cpio.gz > CHK include/generated/compile.h > AS usr/initramfs_data.o > LD usr/built-in.o > ... > > When I run the kernel in qemu I get an error message which complains > about /etc/init.d/rcS missing. > The posts online seem to suggest that this has got to do with the > busybox configuration. 
> So far, I have not been able to get my head around this problem. > Any pointers or suggestions would be much appreciated. > > Alexander Kapshuk. -- - seds ~> https://seds.nl
Qemu+busybox for kernel development
I am trying to set up a build environment where I can run the kernel and see how the changes I have made to the kernel source work. My understanding, based on googling, is that it is common practice in the kernel community to use a virtualised environment for that purpose. What I have done so far is create a ramfs that is built into the kernel, as described here [1] and here [2]. [1] https://landley.net/writing/rootfs-howto.html [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/early-userspace/README?h=v4.12-rc7 a). I have generated a minimal initramfs_list file: scripts/gen_initramfs_list.sh -d >usr/initramfs_list Which looks like this: # This is a very simple, default initramfs dir /dev 0755 0 0 nod /dev/console 0600 0 0 c 5 1 dir /root 0700 0 0 # file /kinit usr/kinit/kinit 0755 0 0 # slink /init kinit 0755 0 0 slink /bin/sh busybox 777 0 0 file /init /bin/busybox 755 0 0 b). Set CONFIG_INITRAMFS_SOURCE: CONFIG_INITRAMFS_SOURCE="/home/sasha/linux/usr/initramfs_list" c). And had the kernel generate the initramfs image: make ... GEN usr/initramfs_data.cpio.gz CHK include/generated/compile.h AS usr/initramfs_data.o LD usr/built-in.o ... When I run the kernel in qemu I get an error message which complains about /etc/init.d/rcS missing. The posts online seem to suggest that this has got to do with the busybox configuration. So far, I have not been able to get my head around this problem. Any pointers or suggestions would be much appreciated. Alexander Kapshuk.