Hello, I would like to briefly present our plan for using audit. We have built a prototype implementation and discovered a few things along the way.
We are building middleware for ATC systems, written in Ada and partially in Python; the prototype code is in Python. One problem we had to solve is how to uniquely identify a process that communicated with the outside world. We have settled on the process start date. That date can be determined in a way that is both stable and reproducible from outside the process: read the btime field from /proc/stat, obtain the Hertz value from the ELF note, and translate the start ticks from /proc/<pid>/stat into a date. Given the pid and start_date, we can reliably check whether a process is still alive. This method is notably different from what ps does, which (or so I propose after reading its source) may output different start times in different runs.

We have a daemon running that may or may not fork the processes it monitors. For the communicating ones, we want to be able to tell everybody in the system (which spans several nodes) that a communication partner is no more; for the non-communicating ones, we simply want to observe and report whether e.g. ntpd or some monitoring or worker shell script is running. The identifier hostname/pid/start_date is therefore what we call a "life" of a process. A process may restart, but the pid won't wrap around within one tick; that is at least the limiting assumption.

One issue I see is that the times we get from auditd through the socket from its child daemon may not match our start_date exactly, although I think they could. We would actually prefer to receive the tick at which a process started, rather than a fork event stamped with an absolute date, because then we could apply our own code to calculate the stable time. Alternatively, it would be nice to know how the time value reported by auditd comes into existence. In principle, for every event we should get the tick rather than a date, or at least both. Ticks are the real kernel time, aren't they?
Currently we feel we would have to apply a delta around the times to match them, and that seems unstable to me; we would prefer the delta to be 0. Otherwise we may, for example, run into pid number overruns much more easily.

The other thing is sequence numbers. We see sequence numbers in the output for each audit event, which is very nice. But can you confirm where these sequence numbers are created? In the kernel, in auditd, or in its child daemon? The underlying question is: how safe can we be that we didn't miss anything when the sequence numbers don't suggest so? We would like to use the lossless mode of auditd. Does that simply mean that auditd may, in the worst case, fall behind?

So far we have looked at auditd 1.2 (RHEL 3), auditd 1.6 (RHEL 5/Ubuntu) and auditd 1.7 (Debian, and self-compiled for RHEL 5.2). The format underwent important changes, and 1.7 seems much friendlier to parse. Can you confirm that a type=EOE record delimits every event (and is "event" even the correct term here, or is it called an audit trace)? We can't build the rpm due to dependency problems, so I did it the hard way with ./configure --prefix=/opt/auditd-1.7, and that seems to work fine on our RHEL 5.2. What's not so clear to me is which kernel dependency there really is. Were there interface changes at all? The changelog didn't suggest so. BTW, release-wise: will RHEL 5.3 include the latest auditd? That is our target platform for a release next year, and it sure would be nice not to have to fix up the audit installation.

One thing I observed with 1.7.4-1 from Debian testing (amd64) is that we never see any clone events on the socket (and no forks, but we only know of cron doing those anyway), although we do see all execs and exit_groups. The rules we use are:

    # First rule - delete all
    -D

    # Increase the buffers to survive stress events.
    # Make this bigger for busy systems
    -b 320

    # Feel free to add below this line. See auditctl man page
    -a entry,always -S clone -S fork -S vfork
    -a entry,always -S execve
    -a entry,always -S exit_group -S exit

Very strange. It works fine with the self-compiled build on RHEL 5.2. I understand that you are not Debian guys; I just wanted to ask briefly whether you are aware of anything that could cause this. Otherwise I am going to report it as a bug to them. In our rules file, we have grouped only similar-purpose syscalls that we care about. The goal is to track all newly created processes, their exits, and the code they run. If you are aware of anything we miss, please point it out.

Also, is it true (I read that yesterday) that every syscall is slowed down for every added rule? Does that mean we are making a mistake by not putting everything on one line? And is open() performance really affected by this? Does audit not (yet?) use other tracing interfaces like SystemTap, etc., where people try to achieve zero cost for inactive traces?

On a more general note: do you recommend using the sub-daemon for the job, or should we rather use libaudit directly? Any insight is welcome here. What we would like to achieve is:

1. Monitor every created process, if it is (or was) relevant to something. We don't want to miss a process, however briefly it ran.

2. Avoid periodic polling; instead, only wake up (and then with minimal latency) when something interesting happened. We would, however, want a periodic check that forks are still being reported, so that we can detect a loss of service from audit.

3. Not lose or miss anything, even if load gets higher, although we don't require surviving a fork bomb.

Sorry for the overlong email. We just hope you can help us identify how to make the best use of audit for our project.

Best regards,
Kay Hayen

--
Linux-audit mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-audit
