On Tue, Jul 09, 2019 at 03:35:26PM -0600, Aaron Bieber wrote:
> I get the panic when the rule is in an anchor as well.
> 
> PGP: 0x1F81112D62A9ADCE / 3586 3350 BFEA C101 DB1A 4AF0 1F81 112D 62A9 ADCE
> 
> >> On Jul 7, 2019, at 4:39 PM, Mike Belopuhov <[email protected]> wrote:
> > ???
> > Aaron Bieber writes:
> > 
> >> Hi,
> >> 
> >> Adding a rule similar to the below causes a panic on -current (OpenBSD
> >> 6.5-current (GENERIC) #95: Thu Jul  4 21:22:25 MDT 2019). This also panics 
> >> 6.3
> >> and 6.5 (I didn't test 6.4):
> >> 
> >>  pass in quick on egress proto tcp from any to port 8888 once rdr-to \
> >>    127.0.0.1 port 3333
> >> 
> >> Once the rule is in place, fire up:
> >> 
> >>  nc -l 127.0.0.1 3333
> >> 
> >> Connect a few times from a remote machine:
> >> 
> >>  nc <ip> 8888
> >> 
> >> Eventually it will panic with (sometimes it happens right away, other 
> >> times I
> >> have to restart nc a few times):
> > 
> > This is because it's meant to be used inside of an anchor
> > (it removes the rule once it's matched).
> > 
> > The most sensible way to use it is to put it into the anchor
> > inside a recursive anchor (e.g. 'relayd/*').
> > 
> > It's possible that the check protecting the system from
> > the misuse like you've described here got lost during
> > refactoring or it never existed in the first place :-(
> > 
> > Cheers,
> > Mike
> > 
> >>  ddb> trace
> >>  pf_rm_rule(ffffffff81d900a8,ffff8000003bbfe8) at pf_rm_rule+0xa9
> >>  pf_purge_rule(ffff8000003bbfe8) at pf_purge_rule+0x26
> >>  pf_purge(ffffffff81dc1088) at pf_purge+0x55
> >>  taskq_thread(ffff800000022040) at taskq_thread+0x3d
> >>  end trace frame: 0x0, count: -4

[cc'ing sashan@ because this relates to pf.c r1.983]

I can replicate the crash as well.  The crash only happens if more than
one connection that matches the once rule is made in a short period of
time.  abieber@ triggered this when he ran "nc <ip> 8888" a few times.
After going through the code, I think I found the cause and wrote a
possible fix.

As of pf.c r1.983, once rules are removed by the purge thread instead of
immediately when a packet matches the rule.  The latest code is in
pf_test_rule():

        if (r->rule_flag & PFRULE_ONCE) {
                r->rule_flag |= PFRULE_EXPIRED;
                r->exptime = time_second;
                SLIST_INSERT_HEAD(&pf_rule_gcl, r, gcle);
        }

When a once rule is matched by a packet for the first time, the rule is
marked as expired and added to pf_rule_gcl.  Eventually the purge thread
is run, which calls pf_purge_expired_rules() to purge the expired rules
in pf_rule_gcl:

        while ((r = SLIST_FIRST(&pf_rule_gcl)) != NULL) {
                SLIST_REMOVE(&pf_rule_gcl, r, pf_rule, gcle);
                KASSERT(r->rule_flag & PFRULE_EXPIRED);
                pf_purge_rule(r);
        }

Since the purge thread only runs periodically, there is a short time
window where a once rule is expired but not yet purged.  If a second new
connection that matches the once rule is made in that time window, the
pf_test_rule() code block is executed again, causing the same once rule
to be added to pf_rule_gcl again.

As a result, when the purge thread finally runs, it tries to remove the
once rule multiple times, resulting in a crash, as seen on the
backtrace:

pf_rm_rule(ffffffff81d900a8,ffff8000003bbfe8) at pf_rm_rule+0xa9
pf_purge_rule(ffff8000003bbfe8) at pf_purge_rule+0x26
pf_purge(ffffffff81dc1088) at pf_purge+0x55
taskq_thread(ffff800000022040) at taskq_thread+0x3d

There is also a related bug where pf_test_rule() still creates state for
new connections that match the expired once rule in that time window,
even though the once rule has already expired.  In other words, the once
rule is not truly a one shot rule in that expired-but-not-purged time
window.

To fix both bugs, I wrote this diff that adds a check in pf_test_rule()
to prevent expired once rules from being added to pf_rule_gcl.  The
check is added "early" in pf_test_rule() to prevent any further
connections from creating state if they match the expired once rule.

I have sent this diff to abieber@ who confirmed that it fixes the crash
for him.

Thoughts? ok?


Index: pf.c
===================================================================
RCS file: /cvs/src/sys/net/pf.c,v
retrieving revision 1.1084
diff -u -p -r1.1084 pf.c
--- pf.c        9 Jul 2019 11:30:19 -0000       1.1084
+++ pf.c        12 Jul 2019 02:54:33 -0000
@@ -3862,6 +3862,13 @@ pf_test_rule(struct pf_pdesc *pd, struct
        if (r->action == PF_DROP)
                goto cleanup;
 
+       /*
+        * If an expired "once" rule has not been purged, drop any new matching
+        * packets to prevent the rule from being added to pf_rule_gcl again.
+        */
+       if (r->rule_flag & PFRULE_EXPIRED)
+               goto cleanup;
+
        pf_tag_packet(pd->m, ctx.tag, ctx.act.rtableid);
        if (ctx.act.rtableid >= 0 &&
            rtable_l2(ctx.act.rtableid) != pd->rdomain)

Reply via email to