On Thu, Jul 02, 2020 at 04:42:13PM -0400, Paul Moore wrote: > On Wed, Jul 1, 2020 at 5:32 PM Max Englander <max.englan...@gmail.com> wrote: > > > > In environments where the preservation of audit events and predictable > > usage of system memory are prioritized, admins may use a combination of > > --backlog_wait_time and -b options at the risk of degraded performance > > resulting from backlog waiting. In some cases, this risk may be > > preferred to lost events or unbounded memory usage. Ideally, this risk > > can be mitigated by making adjustments when backlog waiting is detected. > > > > However, detection can be diffult using the currently available metrics. > > For example, an admin attempting to debug degraded performance may > > falsely believe a full backlog indicates backlog waiting. It may turn > > out the backlog frequently fills up but drains quickly. > > > > To make it easier to reliably track degraded performance to backlog > > waiting, this patch makes the following changes: > > > > Add a new field backlog_wait_sum to the audit status reply. Initialize > > this field to zero. Add to this field the total time spent by the > > current task on scheduled timeouts while the backlog limit is exceeded. > > > > Tested on Ubuntu 18.04 using complementary changes to the audit > > userspace: https://github.com/linux-audit/audit-userspace/pull/134. > > > > Signed-off-by: Max Englander <max.englan...@gmail.com> > > --- > > Patch changelogs between v1 and v2: > > - Instead of printing a warning when backlog waiting occurs, add > > duration of backlog waiting to cumulative sum, and report this > > sum in audit status reply. > > > > include/uapi/linux/audit.h | 7 ++++++- > > kernel/audit.c | 9 +++++++++ > > 2 files changed, 15 insertions(+), 1 deletion(-) > > Hi Max, > > In general this looks better than the previous approach, but I do have > a few specific comments (inline). It also important that in addition
Thanks for your feedback and comments, Paul. I've prepared a v3 patch addressing all of your comments, with corresponding changes to the audit-userspace, which I'll submit once I get a working change to the test suite. I'll use this thread to respond to some things and ask a question. > to the requisite userspace patch, we also need a test added to the > audit-testsuite project so we can verify this functionality in the > future. > > * https://github.com/linux-audit/audit-testsuite > I downloaded this test suite and attempted to run it on Ubuntu 18.04, with the latest audit kernel tree and latest audit-userspace. Many tests were failing, I assume because I have some issues with my environment. May I ask what environment (OS, tree, commit) you recommend for running the test suite? Happy to move this question over to GitHub if that's a better venue. > > diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h > > index a534d71e689a..ea0cc364beca 100644 > > --- a/include/uapi/linux/audit.h > > +++ b/include/uapi/linux/audit.h > > @@ -340,6 +340,7 @@ enum { > > #define AUDIT_STATUS_BACKLOG_LIMIT 0x0010 > > #define AUDIT_STATUS_BACKLOG_WAIT_TIME 0x0020 > > #define AUDIT_STATUS_LOST 0x0040 > > +#define AUDIT_STATUS_BACKLOG_WAIT_SUM 0x0080 > > Sooo ... you've defined this, but I don't see any of the corresponding > AUDIT_SET code that I would expect, was that an oversight? If not, it > is something we should support in the kernel as I'm sure admins will > want to reset this value at some point. To be honest I had based this patch off v1 which included a flag for setting the backlog warn time threshold, but didn't remove it from the v2 patch (an oversight). I wasn't thinking about admins' need to reset the value, but since you suggested it I've included support for that in the v3 patch. > > > #define AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT 0x00000001 > > #define AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME 0x00000002 > > @@ -348,6 +349,7 @@ enum { > > #define AUDIT_FEATURE_BITMAP_SESSIONID_FILTER 0x00000010 > > #define AUDIT_FEATURE_BITMAP_LOST_RESET 0x00000020 > > #define AUDIT_FEATURE_BITMAP_FILTER_FS 0x00000040 > > +#define AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM 0x00000080 > > In an effort not to exhaust the feature bitmap too quickly, I've been > restricting it to only those features that would cause breakage with > userspace. I haven't looked closely at Steve's userspace in quite a > while, but I'm guessing it can key off the structure size and doesn't > need this entry in the bitmap, right? Let me rephrase, if userspace > needs to key off anything, it *should* key off the structure size and > not a new flag in the bitmask ;) I've removed this flag from the v3 patch, but if it's alright with you I'll key off the absence/presence of AUDIT_STATUS_BACKLOG_WAIT_TIME_ACTUAL, which follows the pattern currently used throughout the audit-userspace, and which Richard suggests in separate message. > > Also, I'm assuming that older userspace doesn't blow-up if it sees the > larger structure size? That's even more important. > > > #define AUDIT_FEATURE_BITMAP_ALL (AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT | \ > > AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME | \ > > @@ -355,12 +357,14 @@ enum { > > AUDIT_FEATURE_BITMAP_EXCLUDE_EXTEND | \ > > AUDIT_FEATURE_BITMAP_SESSIONID_FILTER | \ > > AUDIT_FEATURE_BITMAP_LOST_RESET | \ > > - AUDIT_FEATURE_BITMAP_FILTER_FS) > > + AUDIT_FEATURE_BITMAP_FILTER_FS | \ > > + AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM) > > > > /* deprecated: AUDIT_VERSION_* */ > > #define AUDIT_VERSION_LATEST AUDIT_FEATURE_BITMAP_ALL > > #define AUDIT_VERSION_BACKLOG_LIMIT AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT > > #define AUDIT_VERSION_BACKLOG_WAIT_TIME > > AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME > > +#define AUDIT_VERSION_BACKLOG_WAIT_SUM > > AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM > > > > /* Failure-to-log actions */ > > #define AUDIT_FAIL_SILENT 0 > > @@ -466,6 +470,7 @@ struct audit_status { > > __u32 feature_bitmap; /* bitmap of kernel audit features > > */ > > }; > > __u32 backlog_wait_time;/* message queue wait timeout */ > > + __u32 backlog_wait_sum;/* time spent waiting while > > message limit exceeded */ > > This is very nitpicky, but how about a rename to 'backlog_wait_time_actual'? Sure, will do. > > > diff --git a/kernel/audit.c b/kernel/audit.c > > index 87f31bf1f0a0..301ea4f3d750 100644 > > --- a/kernel/audit.c > > +++ b/kernel/audit.c > > @@ -136,6 +136,11 @@ u32 audit_sig_sid = 0; > > */ > > static atomic_t audit_lost = ATOMIC_INIT(0); > > > > +/* Monotonically increasing sum of time the kernel has spent > > + * waiting while the backlog limit is exceeded. > > + */ > > +static atomic_t audit_backlog_wait_sum = ATOMIC_INIT(0); > > Needless to say, this should be renamed too so we don't go crazy. Of course :) > > > /* Hash for inode-based rules */ > > struct list_head audit_inode_hash[AUDIT_INODE_BUCKETS]; > > > > @@ -1204,6 +1209,7 @@ static int audit_receive_msg(struct sk_buff *skb, > > struct nlmsghdr *nlh) > > s.backlog = skb_queue_len(&audit_queue); > > s.feature_bitmap = AUDIT_FEATURE_BITMAP_ALL; > > s.backlog_wait_time = audit_backlog_wait_time; > > + s.backlog_wait_sum = > > atomic_read(&audit_backlog_wait_sum); > > audit_send_reply(skb, seq, AUDIT_GET, 0, 0, &s, sizeof(s)); > > break; > > } > > @@ -1794,6 +1800,9 @@ struct audit_buffer *audit_log_start(struct > > audit_context *ctx, gfp_t gfp_mask, > > return NULL; > > } > > } > > + > > + if (stime != audit_backlog_wait_time) > > + atomic_add(audit_backlog_wait_time - stime, > > &audit_backlog_wait_sum); > > Since stime can only be different in one place in the code above > (after the schedule_timeout() call), why not move the atomic_add() up > there and drop the "if"? Yes there is the potential of calling > atomic_add() multiple times in this case, but the thread is waiting > anyway and this way we don't impact other code paths. My concern was calling atomic_add() more than necessary, but happy to move it up as you suggest. > > > } > > > > ab = audit_buffer_alloc(ctx, gfp_mask, type); > > -- > > 2.17.1 > > -- > paul moore > www.paul-moore.com -- Linux-audit mailing list Linux-audit@redhat.com https://www.redhat.com/mailman/listinfo/linux-audit