Re: vmd testers: serial console hangs fix

2023-10-16 Thread Alexandr Nedvedicky
Hello,

I've applied the diff to 7.3 and it works: I can paste to the serial
console, hit tab, etc. Everything seems to work as expected.

thanks and
regards
sashan

On Mon, Oct 09, 2023 at 10:51:05AM -0400, Dave Voutila wrote:
> Looking for folks that use the serial console connection in vmd(8) and
> experience the host CPU spikes & delays associated with things like
> hitting the up key (to cycle shell command history) or copy/paste in X.
> 
> vmd(8) approximates edge triggered interrupts with the emulated i8259
> interrupt controller. Some core devices (serial console, rtc, timer) try
> "toggling" an irq...but it's pointless in how our emulated i8259 is
> implemented. The deassert does nothing for edge triggered interrupts
> other than causing race conditions and extra syscalls.
> 
> I've tested on two of my machines and this resolves the issue I've been
> seeing off and on. It should fix the report[1] on bugs@ from months ago.
> 
> -dv
> 
> [1] https://marc.info/?l=openbsd-bugs&m=169669132915760&w=2
> 
> diff refs/heads/master refs/heads/vmd-edge
> commit - 7869b2fdaac7e118bfd1783874fe25ce3b8b0f09
> commit + 00d448cdbe2461f27419aaf79520a0cef720aefc
> blob - b98e7bdc69ac6a12eb84eaaf97ec43ecdbe83733
> blob + 248e3b161e88be505a88411156dc57a2772bd4fa
> --- usr.sbin/vmd/i8253.c
> +++ usr.sbin/vmd/i8253.c
> @@ -371,7 +371,6 @@ i8253_fire(int fd, short type, void *arg)
>   struct i8253_channel *ctr = (struct i8253_channel *)arg;
> 
>   vcpu_assert_pic_irq(ctr->vm_id, 0, 0);
> - vcpu_deassert_pic_irq(ctr->vm_id, 0, 0);
> 
>   if (ctr->mode != TIMER_INTTC) {
>   timerclear(&tv);
> blob - 43dce7b10d1467a5b7ac7f3308d01e32b4d0b9ee
> blob + 4fc147b19c99627e386ab26355b8f90a6ae5872b
> --- usr.sbin/vmd/mc146818.c
> +++ usr.sbin/vmd/mc146818.c
> @@ -150,7 +150,6 @@ rtc_fireper(int fd, short type, void *arg)
>   rtc.regs[MC_REGC] |= MC_REGC_PF;
> 
>   vcpu_assert_pic_irq((ptrdiff_t)arg, 0, 8);
> - vcpu_deassert_pic_irq((ptrdiff_t)arg, 0, 8);
> 
>   evtimer_add(, _tv);
>  }
> blob - bc23876bf0392312335da0d00e143583a87549af
> blob + 98ed7dbecf2120ccbf9a16198ee479cb15aebc5f
> --- usr.sbin/vmd/ns8250.c
> +++ usr.sbin/vmd/ns8250.c
> @@ -82,7 +82,6 @@ ratelimit(int fd, short type, void *arg)
>   com1_dev.regs.iir &= ~IIR_NOPEND;
> 
>   vcpu_assert_pic_irq(com1_dev.vmid, 0, com1_dev.irq);
> - vcpu_deassert_pic_irq(com1_dev.vmid, 0, com1_dev.irq);
>   mutex_unlock(&com1_dev.mutex);
>  }
> 
> @@ -160,7 +159,6 @@ com_rcv_event(int fd, short kind, void *arg)
>   if ((com1_dev.regs.iir & IIR_NOPEND) == 0) {
>   /* XXX: vcpu_id */
>   vcpu_assert_pic_irq((uintptr_t)arg, 0, com1_dev.irq);
> - vcpu_deassert_pic_irq((uintptr_t)arg, 0, com1_dev.irq);
>   }
> 
>   mutex_unlock(&com1_dev.mutex);
> 



Re: pf log drop default rule

2023-10-15 Thread Alexandr Nedvedicky
Hello,


> When I check my pflog files in WireShark, I note that WireShark displays
> this in the "Info" column:
> 
> [pass vio0/-1]
> 

yes, this can be the default rule. The snippet below comes from pfattach():

 239 /* default rule should never be garbage collected */
 240 pf_default_rule.entries.tqe_prev = 
&pf_default_rule.entries.tqe_next;
 241 pf_default_rule.action = PF_PASS;
 242 pf_default_rule.nr = (u_int32_t)-1;
 243 pf_default_rule.rtableid = -1;

at line 242 the rule number for the default rule is initialized to -1.
The number never changes.

regards
sashan


On Fri, Oct 13, 2023 at 04:36:25PM -0400, J Doe wrote:
> On 2023-10-10 18:28, Alexander Bluhm wrote:
> 
> > Hi,
> > 
> > If a packet is malformed, it is dropped by pf(4).  The rule referenced
> > in pflog(4) is the default rule.  As the default rule is a pass
> > rule, tcpdump prints "pass" although the packet is actually dropped.
> > I have reports from genua and OPNsense users who are confused by
> > the output.
> > 
> > With the diff below we see pass or blocked when the packet is matched
> > or dropped due to bad fragment respectively.
> 
> Hello,
> 
> I have experienced something with pf that I think may be related to this,
> but I wasn't sure.
> 
> When I check my pflog files in WireShark, I note that WireShark displays
> this in the "Info" column:
> 
> [pass vio0/-1]
> 
> Does the "-1" for the rule number mean that this is the implicit/default
> rule ?
> 
> This is for a packet that is being processed by my default deny rule, which
> appears to be a malformed packet, but shows up in WireShark as "pass".
> 
> Thanks,
> 
> - J
> 



Re: pf log drop default rule

2023-10-10 Thread Alexandr Nedvedicky
Hello,

I'm fine with it.

OK sashan

On Wed, Oct 11, 2023 at 12:28:20AM +0200, Alexander Bluhm wrote:
> Hi,
> 
> If a packet is malformed, it is dropped by pf(4).  The rule referenced
> in pflog(4) is the default rule.  As the default rule is a pass
> rule, tcpdump prints "pass" although the packet is actually dropped.
> I have reports from genua and OPNsense users who are confused by
> the output.
> 
> With the diff below we see pass or blocked when the packet is matched
> or dropped due to bad fragment respectively.
> 
> 19:29:17.314991 rule def/(match) [uid 0, pid 0] pass in on em1: 10.188.81.21 
> > 10.188.81.22: (frag 43955:8@8+) (ttl 64, len 28)
> 19:29:31.321728 rule def/(fragment) [uid 0, pid 0] block in on em1: 
> 10.188.81.21 > 10.188.81.22: (frag 27096:64@4032+) (ttl 64, len 84)
> 
> ok?
> 
> bluhm
> 
> Index: net/if_pflog.c
> ===
> RCS file: /cvs/src/sys/net/if_pflog.c,v
> retrieving revision 1.97
> diff -u -p -r1.97 if_pflog.c
> --- net/if_pflog.c20 Jan 2021 23:25:19 -  1.97
> +++ net/if_pflog.c10 Oct 2023 17:20:00 -
> @@ -204,7 +204,9 @@ pflog_packet(struct pf_pdesc *pd, u_int8
>  
>   bzero(, sizeof(hdr));
>   hdr.length = PFLOG_REAL_HDRLEN;
> - hdr.action = rm->action;
> + /* Default rule does not pass packets dropped for other reasons. */
> + hdr.action = (rm->nr == (u_int32_t)-1 && reason != PFRES_MATCH) ?
> + PF_DROP : rm->action;
>   hdr.reason = reason;
>   memcpy(hdr.ifname, pd->kif->pfik_name, sizeof(hdr.ifname));
>  
> 



Re: pf passes packet if limit reached

2023-10-10 Thread Alexandr Nedvedicky
On Tue, Oct 10, 2023 at 02:53:15PM +0200, Alexander Bluhm wrote:
> Hi,
> 
> The behaviour of the PFRULE_SRCTRACK and max_states check was
> unintentionally changed by this commit.
> 
> 
> revision 1.964
> date: 2016/01/25 18:49:57;  author: sashan;  state: Exp;  lines: +18 -10;  
> commitid: KeemoLxcm7FS1oYy;
> - plugging massive pf_state_key leak
> 
> OK mpi@ dlg@ sthen@
> 
> 
> If we do not create a state after some limit was reached, pf still
> passes the packet.  We can restore the old behavior by setting
> action later, after the checks.
> 
> ok?
> 

oh dear... thanks for finding that bug and killing it

OK sashan



Re: pf_pull_hdr useless action pointer and fragment logic

2023-10-10 Thread Alexandr Nedvedicky
Hello,

On Mon, Oct 09, 2023 at 08:07:35PM +0200, Alexander Bluhm wrote:
> Hi,
> 
> pf_pull_hdr() allows to pass an action pointer parameter as output
> value.  This is never used, all callers pass a NULL argument.  Remove
> ACTION_SET() entirely.
> 
> The logic if (fragoff >= len) in pf_pull_hdr() looks odd.  One is
> the offset in the IP packet, the latter the length of some header
> within the fragment.  In revision 1.1 the logic was used to drop
> short TCP or UDP fragments that contained only part of the header.
> This does not work since pf_pull_hdr() supports offsets.
> 
> 
> revision 1.4
> date: 2001/06/24 20:54:55;  author: itojun;  state: Exp;  lines: +18 -16;
> pull_hdr() now takes header offset explicitly, to help header chain parsing
> (v6, ipsec)
> 
> 
> The code drops the packets anyway, so always set reason PFRES_FRAG.
> 
> ok?
> 

yes, please. looks good to me.

OK sashan



Re: link mbufs/inpcbs to pf_states, not pf_state_keys

2023-08-21 Thread Alexandr Nedvedicky
Hello,

On Tue, Aug 22, 2023 at 12:21:59AM +0200, Alexander Bluhm wrote:
> On Thu, Aug 17, 2023 at 12:02:35PM +1000, David Gwynne wrote:
> > there are links between the pcb/socket layer and pf as an optimisation,
> > and links on mbufs between both sides of a forwarded connection.
> > these links let pf skip an rb tree lookup for outgoing packets.
> > 
> > right now these links are between pf_state_key structs, which are the
> > things that contain the actual addresses used by the connection, but you
> > then have to iterate over a list in pf_state_keys to get to the pf_state
> > structures.
> > 
> > i dont understand why we dont just link the actual pf_state structs.
> > my best guess is there wasnt enough machinery (ie, refcnts and
> > mtxes) on a pf_state struct to make it safe, so the compromise was
> > the pf_state keys. it still got to avoid the tree lookup.
> 
> This linkage is much older than MP in pf.  There was no refcount
> and mutex when added by henning@.
> 
> > i wanted this to make it easier to look up information on pf states from
> > the socket layer, but sashan@ said i should send it out. i do think it
> > makes things a bit easier to understand.
> > 
> > the most worrying bit is the change to pf_state_find().
> > 
> > thoughts? ok?
> 
> It took me years to get the logic correct for all corner cases.
> When using pf-divert with UDP strange things start to happen.  Most
> things are covered by regress/sys/net/pf_divert .  You have to setup
> two machines to run it.

I'm running the diff on my home router which is doing simple IPv4 NAT,
nothing fancy. I have not noticed any problem so far. Uptime is ~3 days.
I have not used pf_divert tests.

> 
> I don't know why it was written that way, but I know it works for
> me.  What do you want to fix?
> 
> 

I've seen the diff off-list as part of another change. I did ask
David to post this part (the diff) to tech@ separately because I like it.
In my opinion it makes things a bit more straightforward. Things
just look prettier with this diff in. Currently we have
something like this:

{ mbuf, pcb } <-> state key <-> { state, state ... }

with this diff we get to:

{ mbuf, pcb } <-> state <-> state key

Basically when we process a packet we are interested in the state,
not the state key itself. The state key is just a means to reach the
state, so why not reach/keep the state directly. I like the simplicity
introduced by the proposed diff.

I support this change. I also prefer to reach wide consensus on this
before the diff arrives in the tree.

thanks and
regards
sashan



pf(4) may cause relayd(8) to abort

2023-07-31 Thread Alexandr Nedvedicky
Hello,

the issue was reported by Gianni Kapetanakis a month ago [1]. It took
several emails to figure out that relayd(8) exits after hosts get disabled
by 'relayctl host dis ...'

The thing is that relayd(8) relies on pf(4) to create persistent
tables (PFR_TFLAG_PERSIST) as relayd requests that:

 47 void
 48 init_tables(struct relayd *env)
 49 {
 ...
 62 TAILQ_FOREACH(rdr, env->sc_rdrs, entry) {
 63 if (strlcpy(tables[i].pfrt_anchor, RELAYD_ANCHOR "/",
 64 sizeof(tables[i].pfrt_anchor)) >= PF_ANCHOR_NAME_SIZE)
 65 goto toolong;
 66 if (strlcat(tables[i].pfrt_anchor, rdr->conf.name,
 67 sizeof(tables[i].pfrt_anchor)) >= PF_ANCHOR_NAME_SIZE)
 68 goto toolong;
 69 if (strlcpy(tables[i].pfrt_name, rdr->conf.name,
 70 sizeof(tables[i].pfrt_name)) >=
 71 sizeof(tables[i].pfrt_name))
 72 goto toolong;
 73 tables[i].pfrt_flags |= PFR_TFLAG_PERSIST;
 74 i++;
 75 }

unfortunately it's not the case as further investigation revealed [2].

the issue can be easily reproduced by pfctl(8) which also creates
persistent tables on behalf of command line:

pfctl -t foo -T add ...

command above always asks pf(4) to create a persistent table; however
pf(4) does not honor the persist flag when the table exists already.
One can verify that using commands as follows:

## create 'referenced' table only (table exists but has no active flag)
# echo 'pass in from <foo> to any' | pfctl -f -
# pfctl -sT -vg
r-- foo
# create instance of table  using command line:
# pfctl -t foo -T add 192.168.1.0/24
1/1 addresses added.
# pfctl -sT -vg
--a-r-- foo
## create instance of table <bar>, note the table will get 'p' flag
# pfctl -t bar -T add 192.168.10.0/24
1 table created.
1/1 addresses added.
# pfctl -sT -vg
-pa bar
--a-r-- foo

a one-liner change to sys/net/pf_table.c fixes that; it also works for
Gianni Kapetanakis. I'm also adding tests to regress/sys/net/pf_table/Makefile
to cover it.

On a system running current the test fails with an error as follows:

pfctl -a regress/ttest -t instance -T add 192.168.1.0/24
1/1 addresses added.
pfctl -a regress/ttest -sT -vg | diff table-persist.out -
1c1
< -pa-r--   instance    regress/ttest
---
> --a-r--   instance    regress/ttest
*** Error 1 in . (Makefile:96 'flags')
FAILED

the failure is expected on a system without the patch. On a system with
the patch applied all tests pass.

OK to commit?

thanks and
regards
sashan


[1] https://marc.info/?t=16881127045&r=1&w=2

[2] https://marc.info/?l=openbsd-bugs&m=168868165801905&w=2

8<---8<---8<--8<
diff --git a/sys/net/pf_table.c b/sys/net/pf_table.c
index 6f23a6f795d..c862c804f84 100644
--- a/sys/net/pf_table.c
+++ b/sys/net/pf_table.c
@@ -1565,8 +1565,10 @@ pfr_add_tables(struct pfr_table *tbl, int size, int *nadd, int flags)
xadd++;
} else if (!(flags & PFR_FLAG_DUMMY) &&
!(p->pfrkt_flags & PFR_TFLAG_ACTIVE)) {
-   p->pfrkt_nflags = (p->pfrkt_flags &
-   ~PFR_TFLAG_USRMASK) | PFR_TFLAG_ACTIVE;
+   p->pfrkt_nflags =
+   (p->pfrkt_flags & ~PFR_TFLAG_USRMASK) |
+   (n->pfrkt_flags & PFR_TFLAG_USRMASK) |
+   PFR_TFLAG_ACTIVE;
SLIST_INSERT_HEAD(, p, pfrkt_workq);
}
}
diff --git a/regress/sys/net/pf_table/Makefile b/regress/sys/net/pf_table/Makefile
index a71f0190c73..8911e8a1d35 100644
--- a/regress/sys/net/pf_table/Makefile
+++ b/regress/sys/net/pf_table/Makefile
@@ -1,15 +1,26 @@
 #  $OpenBSD: Makefile,v 1.3 2017/07/07 23:15:27 bluhm Exp $
 
-REGRESS_TARGETS=   hit miss cleanup
-CLEANFILES=stamp-*
+REGRESS_TARGETS=   hit miss cleanup flags
+CLEANFILES=stamp-* \
+   pf-reftab.conf  \
+   pf-instance.conf\
+   table-ref.conf  \
+   table-pgone.out \
+   table-persist.out   \
+   table-ref.out   \
+   table-refgone.out
+
 
 stamp-setup:
+   ${SUDO} pfctl -a regress/ttest -Fa
${SUDO} pfctl -qt __regress_tbl -T add -f ${.CURDIR}/table.in
date >$@
 
 cleanup:
rm -f stamp-setup
${SUDO} pfctl -qt __regress_tbl -T kill
+   ${SUDO} pfctl -q -a regress/ttest -Fr
+   ${SUDO} pfctl -q -a regress/ttest -qt instance -T kill
 
 hit: stamp-setup
for i in `cat ${.CURDIR}/table.hit`; do \
@@ -27,6 +38,77 @@ miss: stamp-setup
done; \
exit 0
 
-.PHONY: hit 



Re: pf(4) should mention DIOCXEND

2023-07-05 Thread Alexandr Nedvedicky
Hello,

diff below includes suggestions from jmc@ and kn@
I'll commit it later today.

thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/share/man/man4/pf.4 b/share/man/man4/pf.4
index 92eeb45f657..e9d4231fe97 100644
--- a/share/man/man4/pf.4
+++ b/share/man/man4/pf.4
@@ -48,12 +48,25 @@ and retrieve statistics.
 The most commonly used functions are covered by
 .Xr pfctl 8 .
 .Pp
-Manipulations like loading a ruleset that involve more than a single
+Operations loading or reading a ruleset that involve more than a single
 .Xr ioctl 2
 call require a so-called
-.Em ticket ,
-which prevents the occurrence of
-multiple concurrent manipulations.
+.Sy ticket ,
+which allows
+.Nm
+to deal with concurrent operations.
+For certain
+.Xr ioctl 2
+commands (currently
+.Dv DIOCGETRULES )
+the number of tickets a program can get is limited.
+The programs must explicitly release their tickets using the
+.Dv DIOCXEND
+command to avoid hitting the limit.
+All tickets which are not freed by
+.Dv DIOCXEND
+are released when the program closes
+.Pa /dev/pf .
 .Pp
 Fields of
 .Xr ioctl 2
@@ -132,6 +145,9 @@ for subsequent
 calls and the number
 .Va nr
 of rules in the active ruleset.
+The ticket should be released by the
+.Dv DIOCXEND
+command.
 .It Dv DIOCGETRULE Fa "struct pfioc_rule *pr"
 Get a
 .Va rule
@@ -792,6 +808,10 @@ inactive rulesets since the last
 .Dv DIOCXBEGIN .
 .Dv DIOCXROLLBACK
 will silently ignore rulesets for which the ticket is invalid.
+.It Dv DIOCXEND Fa "u_int32_t *ticket"
+Release the ticket obtained by the
+.Dv DIOCGETRULES
+command.
 .It Dv DIOCSETHOSTID Fa "u_int32_t *hostid"
 Set the host ID, which is used by
 .Xr pfsync 4



Re: pf(4) should mention DIOCXEND

2023-07-05 Thread Alexandr Nedvedicky
Hello,


On Wed, Jul 05, 2023 at 11:10:11AM +0200, Alexandr Nedvedicky wrote:

> 
> thanks for your help putting my update to pf(4) into shape.
> updated diff is below.
> 

diff in my earlier email was wrong. this one is the right one.

sorry for extra noise.

regards
sashan

8<---8<---8<--8<
diff --git a/share/man/man4/pf.4 b/share/man/man4/pf.4
index 92eeb45f657..7346c7e3194 100644
--- a/share/man/man4/pf.4
+++ b/share/man/man4/pf.4
@@ -48,12 +48,25 @@ and retrieve statistics.
 The most commonly used functions are covered by
 .Xr pfctl 8 .
 .Pp
-Manipulations like loading a ruleset that involve more than a single
+Operations loading or reading a ruleset that involve more than a single
 .Xr ioctl 2
 call require a so-called
-.Em ticket ,
-which prevents the occurrence of
-multiple concurrent manipulations.
+.Sy ticket ,
+which allows
+.Xr pf 4
+to deal with concurrent operations.
+For certain
+.Xr ioctl 2
+commands (currently
+.Dv DIOCGETRULES )
+the number of tickets a program can get is limited.
+The program must explicitly release the ticket using the
+.Dv DIOCXEND
+command to avoid hitting the limit.
+All tickets which are not freed by
+.Dv DIOCXEND
+are released when the program closes
+.Pa /dev/pf .
 .Pp
 Fields of
 .Xr ioctl 2
@@ -132,6 +145,9 @@ for subsequent
 calls and the number
 .Va nr
 of rules in the active ruleset.
+The ticket should be released by the
+.Dv DIOCXEND
+command.
 .It Dv DIOCGETRULE Fa "struct pfioc_rule *pr"
 Get a
 .Va rule
@@ -792,6 +808,10 @@ inactive rulesets since the last
 .Dv DIOCXBEGIN .
 .Dv DIOCXROLLBACK
 will silently ignore rulesets for which the ticket is invalid.
+.It Dv DIOCXEND Fa "u_int32_t *ticket"
+Release the ticket obtained by the
+.Dv DIOCGETRULES
+command.
 .It Dv DIOCSETHOSTID Fa "u_int32_t *hostid"
 Set the host ID, which is used by
 .Xr pfsync 4



Re: pf(4) should mention DIOCXEND

2023-07-05 Thread Alexandr Nedvedicky
Hello Jason,

thank you for taking a look. More comments in-line.
On Tue, Jul 04, 2023 at 09:03:29PM +0100, Jason McIntyre wrote:

> > @@ -48,12 +48,25 @@ and retrieve statistics.
> >  The most commonly used functions are covered by
> >  .Xr pfctl 8 .
> >  .Pp
> > -Manipulations like loading a ruleset that involve more than a single
> > +Operations like loading or reading a ruleset that involve more than a 
> > single
> 
> you probably don't need to add "or reading", since you already indicate
> that it is just an example ("like"), not an exhaustive list. or is there
> a specific reason to list reading a ruleset?

I opted for: 'Operations loading or reading a ruleset...'
The thing is I really do want 'reading' to appear there.
Previously we did not need to do anything after the reading
of rules was done. It's different now, because in some
cases (snmpd et al.) we must call DIOCXEND on the ticket.

> 
> >  .Xr ioctl 2
> >  call require a so-called
> >  .Em ticket ,
> 
> should probably be Sy rather than Em, but don't sweat it if such a
> change would make the rest of the manual inconsistent.

looks like we can go for 'Sy' here, because this was the
only occurrence of '.Em ticket'. Later we use '.Va ticket',
but that is in a context where ioctl(2) operations are described.
> 
> > -which prevents the occurrence of
> > -multiple concurrent manipulations.
> > +which allows
> > +.Xr pf 4
> > +to deal with concurrent operations.
> > +For certain
> > +.Xr ioctl 2
> > +commands (currently
> > +.Dv DIOCGETRULES )
> > +the number of tickets application can obtain is limited.
> 
> i'm not sure what this means. tickets per application? "tickets
> application" does not read correctly.

maybe if I say:
'the number of tickets each program can obtain is limited.'
What I want to say is that there is a limit on the tickets each
program can get via DIOCGETRULES. To avoid hitting the limit a
program must use DIOCXEND to return the ticket back to the system
when a multi-ioctl operation is done.
> 
> > +The application must explicitly release the ticket using
> 
> s/using/using the/

will use 'using the'

> > +.Dv DIOCXEND
> > +command to avoid hitting the limit.
> > +All tickets which are not freed by
> > +.Dv DIOCXEND
> > +are released when application closes
> 
> s/application/the application/

opting for 'the program' to be consistent with new update.

> 
> > +.Pa /dev/pf .
> >  .Pp
> >  Fields of
> >  .Xr ioctl 2
> > @@ -132,6 +145,9 @@ for subsequent
> >  calls and the number
> >  .Va nr
> >  of rules in the active ruleset.
> > +The ticket should be released by
> 
> s/by/by the/
> 

opting for 'released by the'

> or maybe just "released by DIOCXEND".
> 
> > +.Dv DIOCXEND
> > +command.
> >  .It Dv DIOCGETRULE Fa "struct pfioc_rule *pr"
> >  Get a
> >  .Va rule
> > @@ -792,6 +808,10 @@ inactive rulesets since the last
> >  .Dv DIOCXBEGIN .
> >  .Dv DIOCXROLLBACK
> >  will silently ignore rulesets for which the ticket is invalid.
> > +.It Dv DIOCXEND Fa "u_int32_t *ticket"
> > +Release ticket obtained by
> > +.Dv DIOCGETRULES
> > +command.
> 
> again, either "by the XXX command" or "by XXX".

going for 'by the XXX command'

thanks for your help putting my update to pf(4) into shape.
updated diff is below.

regards
sashan

8<---8<---8<--8<
diff --git a/lib/libcrypto/ecdsa/ecs_ossl.c b/lib/libcrypto/ecdsa/ecs_ossl.c
index 0ca2651f255..4bc77a49204 100644
--- a/lib/libcrypto/ecdsa/ecs_ossl.c
+++ b/lib/libcrypto/ecdsa/ecs_ossl.c
@@ -1,4 +1,4 @@
-/* $OpenBSD: ecs_ossl.c,v 1.71 2023/07/04 15:09:31 tb Exp $ */
+/* $OpenBSD: ecs_ossl.c,v 1.68 2023/07/04 10:53:42 tb Exp $ */
 /*
  * Written by Nils Larsch for the OpenSSL project
  */
@@ -122,11 +122,6 @@ ossl_ecdsa_sign(int type, const unsigned char *digest, int digest_len,
return ret;
 }
 
-/*
- * FIPS 186-5, section 6.4.1, steps 3-8 and 11: Generate k, calculate r and
- * kinv, and clear it. If r == 0, try again with a new random k.
- */
-
 int
 ossl_ecdsa_sign_setup(EC_KEY *key, BN_CTX *in_ctx, BIGNUM **out_kinv,
 BIGNUM **out_r)
@@ -198,9 +193,7 @@ ossl_ecdsa_sign_setup(EC_KEY *key, BN_CTX *in_ctx, BIGNUM **out_kinv,
!BN_set_bit(x, order_bits))
goto err;
 
-   /* Step 11: repeat until r != 0. */
do {
-   /* Step 3: generate random k. */
if (!bn_rand_interval(k, BN_value_one(), order)) {
ECDSAerror(ECDSA_R_RANDOM_NUMBER_GENERATION_FAILED);
goto err;
@@ -227,25 +220,22 @@ ossl_ecdsa_sign_setup(EC_KEY *key, BN_CTX *in_ctx, BIGNUM **out_kinv,
 
BN_set_flags(k, BN_FLG_CONSTTIME);
 
-   /* Step 5: P = k * G. */
+   /* Compute r, the x-coordinate of G * k. */
if (!EC_POINT_mul(group, point, k, NULL, NULL, ctx)) {
ECDSAerror(ERR_R_EC_LIB);
goto err;

pf(4) should mention DIOCXEND

2023-07-04 Thread Alexandr Nedvedicky
Hello,

diff below updates pf(4) manpage to reflect changes [1] which
were committed earlier today.

does update to pf(4) read OK?

thanks and
regards
sashan

[1] https://marc.info/?l=openbsd-cvs=168848058603797=2
https://marc.info/?l=openbsd-cvs=168847042626997=2

8<---8<---8<--8<

diff --git a/share/man/man4/pf.4 b/share/man/man4/pf.4
index 92eeb45f657..305c536b137 100644
--- a/share/man/man4/pf.4
+++ b/share/man/man4/pf.4
@@ -48,12 +48,25 @@ and retrieve statistics.
 The most commonly used functions are covered by
 .Xr pfctl 8 .
 .Pp
-Manipulations like loading a ruleset that involve more than a single
+Operations like loading or reading a ruleset that involve more than a single
 .Xr ioctl 2
 call require a so-called
 .Em ticket ,
-which prevents the occurrence of
-multiple concurrent manipulations.
+which allows
+.Xr pf 4
+to deal with concurrent operations.
+For certain
+.Xr ioctl 2
+commands (currently
+.Dv DIOCGETRULES )
+the number of tickets application can obtain is limited.
+The application must explicitly release the ticket using
+.Dv DIOCXEND
+command to avoid hitting the limit.
+All tickets which are not freed by
+.Dv DIOCXEND
+are released when application closes
+.Pa /dev/pf .
 .Pp
 Fields of
 .Xr ioctl 2
@@ -132,6 +145,9 @@ for subsequent
 calls and the number
 .Va nr
 of rules in the active ruleset.
+The ticket should be released by
+.Dv DIOCXEND
+command.
 .It Dv DIOCGETRULE Fa "struct pfioc_rule *pr"
 Get a
 .Va rule
@@ -792,6 +808,10 @@ inactive rulesets since the last
 .Dv DIOCXBEGIN .
 .Dv DIOCXROLLBACK
 will silently ignore rulesets for which the ticket is invalid.
+.It Dv DIOCXEND Fa "u_int32_t *ticket"
+Release ticket obtained by
+.Dv DIOCGETRULES
+command.
 .It Dv DIOCSETHOSTID Fa "u_int32_t *hostid"
 Set the host ID, which is used by
 .Xr pfsync 4



Re: huge pfsync rewrite

2023-07-03 Thread Alexandr Nedvedicky
Hello,

I went through the recent diff one more time. I could not spot
anything wrong. Also my home router was happy with it for
quite some time. Unfortunately I'm not using pfsync.
However I'm sure hrvoje@ did his best to try to make it
crash, and no luck: the diff survived.

As I said earlier, it would be more risky if dlg@ were to slice
this chunk into smaller diffs, so it is best to commit the
whole change.


OK sashan



Re: huge pfsync rewrite

2023-07-02 Thread Alexandr Nedvedicky
Hello,

On Thu, Jun 29, 2023 at 01:48:27PM +1000, David Gwynne wrote:
> On Mon, Jun 26, 2023 at 01:16:40AM +0200, Alexandr Nedvedicky wrote:
> 
> > net/if_pfsync.c
> > the diff currently uses two slices (PFSYNC_NSLICES). is there a plan to
> > scale it up?  the slice can be simply viewed as a kind of task. IMO the
> > number of slices can be aligned with number of cpu cores. Or is this
> > too simplified? I'm just trying to get some hints on how to further
> > tune performance.
> 
> that's part of a bigger discussion which involves how far we should
> scale the number of nettqs and how parallel pf can go.
> 
> 2 slices demonstrates that pfsync can partition work and is safe doing
> so. there kstats ive added on those slices show there isnt a lot of
> contention in pfsync. yet.
> 

I was just wondering, because if I remember correctly hrvoje@ noticed a
small performance degradation (compared with current). I think his test was
using 4 net tasks to forward packets through the firewall. Now if there are
just 2 tasks for pfsync, then this might be how the degradation
sneaked in. Just a thought.

thanks and
regards
sashan



Re: huge pfsync rewrite

2023-06-25 Thread Alexandr Nedvedicky
Hello,

The new code looks much better. It's a huge leap forward. I don't mind if this
big diff gets committed. Hrvoje did test the whole diff. Trying to split it
into smaller changes might bring in code which is not tested enough. We don't
know how the individual pieces would work independently.

I've got some questions/comments which are perhaps worth sorting out
before the commit.

the diff touches those files:

net/if.c
- no objections/comments

net/ifq.c
- no objections/comments

net/netisr.h
- no objections/comments

net/pf_norm.c
- no objections/comments

net/pf_ioctl.c
- no objections/comments

net/pfvar.h
- no objections/comments

net/pfvar_priv.h
- no objections/comments

netinet/ip_ipsp.h
- no objections/comments

netinet/in_proto.c
- just wondering why we are renaming/introducing pfsync_input4().
  is there a plan to make pfsync(4) work over IPv6?

net/pf.c
state expiration is overhauled, the new code looks better.
Once the new code settles down we might need to revisit the
'set timeout interval' description in pf.conf(5),
because if I understand the new code right, the state
expiration timer runs every second now. We might also
consider exposing `pf_purge_states_collect` as a new
tuning knob to control state expiration. Anyway this
can always be done as a follow-up change.

- function  pf_state_export() line 1319 comment XXX
1319 if (READ_ONCE(st->sync_defer) != NULL) /* XXX */
1320 sp->state_flags |= htons(PFSTATE_ACK);
  what does it stand for?

- function pf_purge_expired_states(): handling of the static
  variable `cur` is unsafe w.r.t. pf_state_list. However
  the same issue is present in current. I think there is
  a diff floating around which makes `cur` a member of the
  pf_state_list structure.

net/if_pfsync.h
the new if_pfsync.h changes the codes for sync_state. This makes me
wonder if we should also update PFSYNC_VERSION, just to avoid some
undesired/undefined behavior if only one firewall node in a cluster gets
upgraded.

net/if_pfsync.c
the diff currently uses two slices (PFSYNC_NSLICES). is there a plan to
scale it up?  the slice can be simply viewed as a kind of task. IMO the
number of slices can be aligned with number of cpu cores. Or is this
too simplified? I'm just trying to get some hints on how to further
tune performance.

I noticed there are a few dances with NET_LOCK() (UNLOCK()/LOCK())
in the up() and down() operations. I feel this comes from the fact that
we use NET_LOCK() to also protect the state of pfsync(4) itself. I wonder
if we can make our life easier here with a dedicated rw-lock similar
to the pfioctl_rw we have in pf(4). The dedicated rw-lock would make
sure there is no other ioctl() operation intervening with us.
This can certainly be left for a follow-up change.

- I think lines 102-108 and 110 can be removed:
 102 #if 0
 103 #define DPRINTF(_fmt...) do {   \
 104 printf("%s[%u]: ", __func__, __LINE__); \
 105 printf(_fmt);   \
 106 printf("\n");   \
 107 } while (0)
 108 #else

- function pfsync_clone_create(), line 384:
 381 pool_init(_deferrals_pool,
 382 sizeof(struct pfsync_deferral), 0,
 383 IPL_MPFLOOR, 0, "pfdefer", NULL);
 384 /* pool_cache_init(_deferrals_pool); */
  does it mean pool_cache_init() is not ready/well tested yet?

- function pfsync_clone_create(), comments /* grumble ... */
  on lines 401 and 405: what are those? Are they related
  to line 440: /* stupid NET_LOCK */? Is it about using a timeout
  to deal with NET_LOCK(), like avoiding recursion on NET_LOCK()?

- function pfsync_sendout() line 1539, comment /* XXX */
  what does it stand for?

- function pfsync_clear_states(): is it complete? it seems
  to me it has no effect (apart from the exercise with the reference)

I have not spotted anything else.

thanks and
regards
sashan




Re: Possible typo in pf NAT FAQ

2023-06-19 Thread Alexandr Nedvedicky
Hello,

On Sun, Jun 18, 2023 at 06:29:28PM -0600, Ashlen wrote:
> On Sun, 18 Jun 2023 20:35 +0200, Stephan Neuhaus wrote:
> > Hi list
> > 
> > I think I have found a typo in the pf NAT FAQ here:
> > https://www.openbsd.org/faq/pf/nat.html. In the
> > "Configuring NAT" section it says:
> > 
> >   The general format in pf.conf looks something like this:
> > 
> >   match out on interface [af] \
> >  from src_addr to dst_addr \
> >  nat-to ext_addr [pool_type] [static-port]
> >   [...]
> >   pass out [log] on interface [af] [proto protocol] \
> >  from ext_addr [port src_port] \
> >  to dst_addr [port dst_port]
> > 
> > As you can see, the pass rule says "from ext_addr".
> > But beneath the description of the various options, it
> > says:
> > 
> >   This would lead to a most basic form of these lines similar to this:
> > 
> >   match out on tl0 from 192.168.1.0/24 to any nat-to 198.51.100.1
> >   pass on tl0 from 192.168.1.0/24 to any
> > 
> > Here you can see that the "from" part is what the
> > above description calls the src_addr, not the
> > ext_addr, as it claims. This makes much more sense and
> > is consistent with all the other documentation that
> > I've seen.
> > 
> > So could it be a typo in the docs? Or have I missed some things?
> > 
> > Thanks in advance
> > 
> > Stephan
> > 
> 
> That looks like a typo to me. Changing CC to include tech@ and drop
> misc@ (I've been told it's bad form to send patches on misc@).

yes it is a typo, your diff looks OK to me.

thanks and
regards
sashan

> Index: nat.html
> ===
> RCS file: /cvs/www/faq/pf/nat.html,v
> retrieving revision 1.79
> diff -u -p -r1.79 nat.html
> --- nat.html  12 May 2021 15:13:25 -  1.79
> +++ nat.html  19 Jun 2023 00:18:15 -
> @@ -166,7 +166,7 @@ match out on interface [af] \
> nat-to ext_addr [pool_type] [static-port]
>  [...]
>  pass out [log] on interface [af] [proto protocol] \
> -   from ext_addr [port src_port] \
> +   from src_addr [port src_port] \
> to dst_addr [port dst_port]
>  
>  



Re: pfioctl: drop net lock from SIOC{S,G}LIMIT

2023-05-25 Thread Alexandr Nedvedicky
Hello,

On Thu, May 25, 2023 at 07:14:54AM +, Klemens Nanni wrote:
> On Thu, May 25, 2023 at 03:28:45AM +, Klemens Nanni wrote:
> > On Thu, May 25, 2023 at 03:20:04AM +, Klemens Nanni wrote:
> > > pfsync_in_bus() looks like the only place where the static array
> > > pf_pool_limits[] is accessed without the pf lock, so grab it there.
> > > 
> > > Limits themselves are protected by the pf lock and pool(9)s are never
> > > destroyed and have built-in per-pool locks, so the net lock is not
> > > needed.
> > > 
> > > (pf_pool_limits[] access in DIOCXCOMMIT remains under pf *and net* lock
> > >  until the rest in there gets pulled out of the net lock.)
> > > 
> > > Feedback? OK?
> > 
> > Correct diff without typo and with missing locking comment.
> 
> Diffing pfvar.h instead of pfvar_priv.h helps to get the comment hunk...

looks good to me.

OK sashan



pfsync(4) must protect state with mutex

2023-05-15 Thread Alexandr Nedvedicky
Hello,

The diff below was posted to bugs@ about a week ago [1]. It fixes a
long-lasting issue I had been scratching my head over for several months.

Recently I shared my itch with bluhm@, who pointed out that the `sc_st_mtx`
mutex alone is not sufficient protection. We also need to grab pf_state::mtx
when inserting/removing/moving a particular state on the sync queues.

The diff below makes sure pfsync always grabs pf_state::mtx when it
moves a state around its queues.

Hrvoje confirms the diff fixes a panic he was able to induce on his
test bed under stress load.

OK to commit?

thanks and
regards
sashan

[1] https://marc.info/?l=openbsd-bugs&m=168367101726447&w=2

8<---8<---8<--8<
diff --git a/sys/net/if_pfsync.c b/sys/net/if_pfsync.c
index 822b4211d0f..811d9d59666 100644
--- a/sys/net/if_pfsync.c
+++ b/sys/net/if_pfsync.c
@@ -1362,14 +1362,17 @@ pfsync_grab_snapshot(struct pfsync_snapshot *sn, struct pfsync_softc *sc)
 
	while ((st = TAILQ_FIRST(&sc->sc_qs[q])) != NULL) {
		TAILQ_REMOVE(&sc->sc_qs[q], st, sync_list);
+		mtx_enter(&st->mtx);
		if (st->snapped == 0) {
			TAILQ_INSERT_TAIL(&sn->sn_qs[q], st, sync_snap);
			st->snapped = 1;
+			mtx_leave(&st->mtx);
		} else {
			/*
			 * item is on snapshot list already, so we can
			 * skip it now.
			 */
+			mtx_leave(&st->mtx);
			pf_state_unref(st);
		}
	}
@@ -1422,11 +1425,13 @@ pfsync_drop_snapshot(struct pfsync_snapshot *sn)
			continue;
 
		while ((st = TAILQ_FIRST(&sn->sn_qs[q])) != NULL) {
+			mtx_enter(&st->mtx);
			KASSERT(st->sync_state == q);
			KASSERT(st->snapped == 1);
			TAILQ_REMOVE(&sn->sn_qs[q], st, sync_snap);
			st->sync_state = PFSYNC_S_NONE;
			st->snapped = 0;
+			mtx_leave(&st->mtx);
			pf_state_unref(st);
		}
	}
@@ -1665,6 +1670,7 @@ pfsync_sendout(void)
 
	count = 0;
	while ((st = TAILQ_FIRST(&sn.sn_qs[q])) != NULL) {
+		mtx_enter(&st->mtx);
		TAILQ_REMOVE(&sn.sn_qs[q], st, sync_snap);
		KASSERT(st->sync_state == q);
		KASSERT(st->snapped == 1);
@@ -1672,6 +1678,7 @@ pfsync_sendout(void)
		st->snapped = 0;
		pfsync_qs[q].write(st, m->m_data + offset);
		offset += pfsync_qs[q].len;
+		mtx_leave(&st->mtx);
 
		pf_state_unref(st);
		count++;
@@ -1725,8 +1732,6 @@ pfsync_insert_state(struct pf_state *st)
ISSET(st->state_flags, PFSTATE_NOSYNC))
return;
 
-   KASSERT(st->sync_state == PFSYNC_S_NONE);
-
if (sc->sc_len == PFSYNC_MINPKT)
	timeout_add_sec(&sc->sc_tmo, 1);
 
@@ -2221,6 +2226,7 @@ pfsync_q_ins(struct pf_state *st, int q)
panic("pfsync pkt len is too low %zd", sc->sc_len);
	do {
		mtx_enter(&sc->sc_st_mtx);
+		mtx_enter(&st->mtx);
 
		/*
		 * There are either two threads trying to update the
@@ -2228,6 +2234,7 @@ pfsync_q_ins(struct pf_state *st, int q)
		 * (is on snapshot queue).
		 */
		if (st->sync_state != PFSYNC_S_NONE) {
+			mtx_leave(&st->mtx);
			mtx_leave(&sc->sc_st_mtx);
			break;
		}
@@ -2240,6 +2247,7 @@ pfsync_q_ins(struct pf_state *st, int q)
		sclen = atomic_add_long_nv(&sc->sc_len, nlen);
		if (sclen > sc->sc_if.if_mtu) {
			atomic_sub_long(&sc->sc_len, nlen);
+			mtx_leave(&st->mtx);
			mtx_leave(&sc->sc_st_mtx);
			pfsync_sendout();
			continue;
@@ -2249,6 +2257,7 @@ pfsync_q_ins(struct pf_state *st, int q)
 
		TAILQ_INSERT_TAIL(&sc->sc_qs[q], st, sync_list);
		st->sync_state = q;
+		mtx_leave(&st->mtx);
		mtx_leave(&sc->sc_st_mtx);
	} while (0);
 }
@@ -2260,6 +2269,7 @@ pfsync_q_del(struct pf_state *st)
int q;
 
	mtx_enter(&sc->sc_st_mtx);
+	mtx_enter(&st->mtx);
q = st->sync_state;
/*
 * re-check under mutex
@@ -2267,6 +2277,7 @@ pfsync_q_del(struct pf_state *st)
 * too late, the state is being just processed/dispatched to peer.
 */
if ((q == PFSYNC_S_NONE) || (st->snapped)) {
+		mtx_leave(&st->mtx);

Re: hardware TSO TCP/IP layer

2023-05-15 Thread Alexandr Nedvedicky
Hello,

I see tcp_dochopper() got committed already...

Anyway, the changes here look good. I have no objection.

OK sashan



Re: give softnet threads their own names

2023-05-12 Thread Alexandr Nedvedicky
On Fri, May 12, 2023 at 10:34:00AM +1000, David Gwynne wrote:
> so in top you see softnet0, softnet1, etc.
> 
> the real change is putting a struct softnet in place to wrap the name
> and taskq up with.
> 
> ok?

sure, thanks
sashan



Re: pfioctl: drop net lock from DIOCOSFP{FLUSH,ADD,GET}

2023-05-07 Thread Alexandr Nedvedicky
Hello,

On Sat, May 06, 2023 at 08:56:06PM +, Klemens Nanni wrote:
> On Sat, May 06, 2023 at 09:33:05PM +0200, Alexander Bluhm wrote:
> > On Sat, May 06, 2023 at 11:11:25AM +, Klemens Nanni wrote:
> > > pf_osfp.c contains all the locking for these three ioctls, this removes
> > > the net lock from it.
> > > 
> > > All data is protected by the pf lock, new asserts verify that.
> > > 
> > > Beside the pf ioctl handler, pf_match_rule()'s call to pf_osfp_match()
> > > is the only hook into it.
> > > 
> > > tcpbump still compiles pf_osfp.c without object change.
> > > 
> > > Feedback? OK?
> > 
> > Could you document that pf_osfp_list and fp_next is protected by
> > pf lock?
> 
> Picked [p] for pf_lock so as to not collide with pfvar_priv.h's comment
> about "Protection/owernership of pf_state members".
> 
> > There is nothing in pf_osfp_fingerprint() that needs pf lock directly.
> > The critical access is the SLIST_FOREACH in pf_osfp_find().  Place
> > the PF_ASSERT_LOCKED() there.
> 
> Misplaced, thanks.
> 
> > Why is pf_osfp_list passed to pf_osfp_find() and pf_osfp_find_exact()?
> > Can we remove the parameter and just use pf_osfp_list?
> 
> Wanted to clean this up in a separate diff, but we might as well do
> everything in one go.
> 
> 

looks OK to me

sashan



Re: fragment code cleanup for tso

2023-05-07 Thread Alexandr Nedvedicky
Hello,

On Fri, May 05, 2023 at 11:17:50PM +0200, Alexander Bluhm wrote:
> Hi,
> 
> In preparation for my TSO-in-software diff, I would like to clean up
> the fragment code.  Both are very similar and I like consistency.
> 
> - Use if_output_ml() to send mbuf lists to interfaces.  This can
>   be used for TSO, fragments, ARP and ND6.
> - Rename variable fml to ml.  It will soon be used for TCP segments,
>   no need to have the f for fragment in the name.
> - In pf_route6() split the if else block, then tcp_chopper() can
>   be easily put in there in the next diff.
> - In ip_fragment() the if (hlen + firstlen < tlen) is a new safety
>   check.  It makes the code correct for the case where the packet
>   was too short to be fragmented.  This should not happen, but the
>   other functions also have this logic.
> 

The changes make sense; the new code reads better.

> No functional change intended.

I agree; I could not spot any side effects.
> 
> ok?
> 

reads OK to me

sashan



Re: pfioctl: DIOCGETRULESET{,S}: drop net lock

2023-05-03 Thread Alexandr Nedvedicky
Hello,

On Sat, Apr 29, 2023 at 01:37:52PM +, Klemens Nanni wrote:
> Both walk the list of rulesets, aka anchors; the first one yields a count,
> the second yields a specific anchor's name.
> 
> Same data access pattern, different copy out, basically.
> 
> pf_anchor_global are contained within pf_ioctl.c and pf_ruleset.c and
> fully protected by the pf lock.
> 
> Same for pf_main_ruleset and its pf.c usage.
> 
> Running with extra asserts to double-check works, and handling 60k rules
> in an anchor works noticeably faster:
> 
>   # jot -w 'pass proto tcp to port ' 60000 | pfctl -a test -o none -f -
>   # time pfctl -a test -s r | wc -l
> 60000
>   0m02.10s real 0m00.40s user 0m01.70s system
> 
> Dropped from around 3.5s to around 2.0s for me.
> 
> Feedback? OK without asserts?

OK with asserts.

thanks and
regards
sashan



Re: pfioctl: DIOCGETQUEUE: drop net lock

2023-04-28 Thread Alexandr Nedvedicky
Hello,

On Fri, Apr 28, 2023 at 09:02:35PM +, Klemens Nanni wrote:
> Same logic and argument as for the parent *S ioctl, might as well have
> committed them together:
> ---
> Remove net lock from DIOCGETQUEUES
> 
> Both ticket and number of queues stem from the pf_queues_active list which
> is effectively static to pf_ioctl.c and fully protected by the pf lock.
> ---
> 
> OK?

looks good

OK sashan
> 
> 
> Index: pf_ioctl.c
> ===
> RCS file: /cvs/src/sys/net/pf_ioctl.c,v
> retrieving revision 1.401
> diff -u -p -U5 -r1.401 pf_ioctl.c
> --- pf_ioctl.c28 Apr 2023 14:08:38 -  1.401
> +++ pf_ioctl.c28 Apr 2023 20:52:54 -
> @@ -1229,32 +1229,28 @@ pfioctl(dev_t dev, u_long cmd, caddr_t a
>   case DIOCGETQUEUE: {
>   struct pfioc_queue  *pq = (struct pfioc_queue *)addr;
>   struct pf_queuespec *qs;
>   u_int32_tnr = 0;
>  
> - NET_LOCK();
>   PF_LOCK();
>   if (pq->ticket != pf_main_ruleset.rules.active.version) {
>   error = EBUSY;
>   PF_UNLOCK();
> - NET_UNLOCK();
>   goto fail;
>   }
>  
>   /* save state to not run over them all each time? */
>   qs = TAILQ_FIRST(pf_queues_active);
>   while ((qs != NULL) && (nr++ < pq->nr))
>   qs = TAILQ_NEXT(qs, entries);
>   if (qs == NULL) {
>   error = EBUSY;
>   PF_UNLOCK();
> - NET_UNLOCK();
>   goto fail;
>   }
>   memcpy(>queue, qs, sizeof(pq->queue));
>   PF_UNLOCK();
> - NET_UNLOCK();
>   break;
>   }
>  
>   case DIOCGETQSTATS: {
>   struct pfioc_qstats *pq = (struct pfioc_qstats *)addr;
> 



Re: DIOCGETRULE is slow for large rulesets (complete diff)

2023-04-28 Thread Alexandr Nedvedicky
Hello,

On Fri, Apr 28, 2023 at 11:18:18AM +, Klemens Nanni wrote:
> On Fri, Apr 28, 2023 at 12:55:37PM +0200, Alexandr Nedvedicky wrote:
> > Hello,
> > [ sending off-list ]
> > 
> > up-to-date complete diff.
> 
> Builds, boots and works faster:
> 
>   $ time jot -w 'pass proto tcp to port ' $((20 * 1000)) 1024 | pfctl 
> -vnf- -onone
>   ...
>   0m02.62s real 0m00.55s user 0m02.06s system
> 
> Before it took about 10s real time.
> 
> OK kn, nits inline.
> 

will commit the whole diff.


> > @@ -968,6 +953,13 @@ pfctl_show_rules(int dev, char *path, int opts, enum 
> > pfctl_show format,
> > case PFCTL_SHOW_NOTHING:
> > break;
> > }
> > +   errno = 0;
> > +   }
> > +
> > +   if (errno != 0 && errno != ENOENT) {
> 
> I'd drop ' != 0', it is an int and almost always checked as boolean like this.
> 

I'll keep the code as-is, just to keep both checks
in a consistent shape. I don't like the appearance of
if (errno && errno != ENOENT)


I'll address all other nits pointed out by Klemens.

thanks and
regards
sashan



Re: DIOCGETRULE is slow for large rulesets (2/3)

2023-04-27 Thread Alexandr Nedvedicky
Hello,


On Thu, Apr 27, 2023 at 10:41:53AM +1000, David Gwynne wrote:

> 
> t could be NULL here. just do the unit check inside the loop?
> 
> > 
> > if (t->pft_unit != unit)
> > return (NULL);
> > 
> > return (t);
> > }
> > 
> > just return NULL on unit mismatch.  updated diff is below.
> 
> ok once you fix the nit above.
> 

Well spotted. This is the change I made to get the 2/3 patch
fixed:

diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
index ebe1b912766..b527610737b 100644
--- a/sys/net/pf_ioctl.c
+++ b/sys/net/pf_ioctl.c
@@ -3291,13 +3291,10 @@ pf_find_trans(uint32_t unit, uint64_t ticket)
	rw_assert_anylock(&pf_rw);
 
	LIST_FOREACH(t, &pf_ioctl_trans, pft_entry) {
-   if (t->pft_ticket == ticket)
+   if (t->pft_ticket == ticket && t->pft_unit == unit)
break;
}
 
-   if (t->pft_unit != unit)
-   return (NULL);
-
return (t);
 }


thanks and
regards
sashan



Re: DIOCGETRULE is slow for large rulesets (3/3)

2023-04-26 Thread Alexandr Nedvedicky
Hello,

The updated diff below adapts to the recent changes to pf_find_trans() in diff 2/3 [1].

In case DIOCGETRULE does not find the desired transaction, the ioctl(2) call
to /dev/pf returns ENXIO.

thanks and
regards
sashan

[1] https://marc.info/?l=openbsd-tech&m=168254555211083&w=2

8<---8<---8<--8<
diff --git a/sbin/pfctl/pfctl.c b/sbin/pfctl/pfctl.c
index 8cbd9d77b4f..3787e97a6b1 100644
--- a/sbin/pfctl/pfctl.c
+++ b/sbin/pfctl/pfctl.c
@@ -837,7 +837,7 @@ pfctl_show_rules(int dev, char *path, int opts, enum 
pfctl_show format,
 char *anchorname, int depth, int wildcard, long shownr)
 {
struct pfioc_rule pr;
-   u_int32_t nr, mnr, header = 0;
+   u_int32_t header = 0;
int len = strlen(path), ret = 0;
char *npath, *p;
 
@@ -893,24 +893,9 @@ pfctl_show_rules(int dev, char *path, int opts, enum 
pfctl_show format,
goto error;
}
 
-   if (shownr < 0) {
-   mnr = pr.nr;
-   nr = 0;
-   } else if (shownr < pr.nr) {
-   nr = shownr;
-   mnr = shownr + 1;
-   } else {
-   warnx("rule %ld not found", shownr);
-   ret = -1;
-   goto error;
-   }
-   for (; nr < mnr; ++nr) {
-   pr.nr = nr;
-		if (ioctl(dev, DIOCGETRULE, &pr) == -1) {
-   warn("DIOCGETRULE");
-   ret = -1;
-   goto error;
-   }
+	while (ioctl(dev, DIOCGETRULE, &pr) != -1) {
+   if ((shownr != -1) && (shownr != pr.nr))
+   continue;
 
/* anchor is the same for all rules in it */
if (pr.rule.anchor_wildcard == 0)
@@ -968,6 +953,13 @@ pfctl_show_rules(int dev, char *path, int opts, enum 
pfctl_show format,
case PFCTL_SHOW_NOTHING:
break;
}
+   errno = 0;
+   }
+
+   if ((errno != 0) && (errno != ENOENT)) {
+   warn("DIOCGETRULE");
+   ret = -1;
+   goto error;
}
 
/*
diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
index ebe1b912766..0f78c5b12ac 100644
--- a/sys/net/pf_ioctl.c
+++ b/sys/net/pf_ioctl.c
@@ -122,6 +122,10 @@ struct pf_trans*pf_find_trans(uint32_t, 
uint64_t);
 voidpf_free_trans(struct pf_trans *);
 voidpf_rollback_trans(struct pf_trans *);
 
+voidpf_init_tgetrule(struct pf_trans *,
+   struct pf_anchor *, uint32_t, struct pf_rule *);
+voidpf_cleanup_tgetrule(struct pf_trans *t);
+
 struct pf_rule  pf_default_rule, pf_default_rule_new;
 
 struct {
@@ -548,7 +552,7 @@ pf_qid_unref(u_int16_t qid)
 }
 
 int
-pf_begin_rules(u_int32_t *ticket, const char *anchor)
+pf_begin_rules(u_int32_t *version, const char *anchor)
 {
struct pf_ruleset   *rs;
struct pf_rule  *rule;
@@ -559,20 +563,20 @@ pf_begin_rules(u_int32_t *ticket, const char *anchor)
pf_rm_rule(rs->rules.inactive.ptr, rule);
rs->rules.inactive.rcount--;
}
-   *ticket = ++rs->rules.inactive.ticket;
+   *version = ++rs->rules.inactive.version;
rs->rules.inactive.open = 1;
return (0);
 }
 
 void
-pf_rollback_rules(u_int32_t ticket, char *anchor)
+pf_rollback_rules(u_int32_t version, char *anchor)
 {
struct pf_ruleset   *rs;
struct pf_rule  *rule;
 
rs = pf_find_ruleset(anchor);
if (rs == NULL || !rs->rules.inactive.open ||
-   rs->rules.inactive.ticket != ticket)
+   rs->rules.inactive.version != version)
return;
while ((rule = TAILQ_FIRST(rs->rules.inactive.ptr)) != NULL) {
pf_rm_rule(rs->rules.inactive.ptr, rule);
@@ -851,7 +855,7 @@ pf_hash_rule(MD5_CTX *ctx, struct pf_rule *rule)
 }
 
 int
-pf_commit_rules(u_int32_t ticket, char *anchor)
+pf_commit_rules(u_int32_t version, char *anchor)
 {
struct pf_ruleset   *rs;
struct pf_rule  *rule;
@@ -860,7 +864,7 @@ pf_commit_rules(u_int32_t ticket, char *anchor)
 
rs = pf_find_ruleset(anchor);
if (rs == NULL || !rs->rules.inactive.open ||
-   ticket != rs->rules.inactive.ticket)
+   version != rs->rules.inactive.version)
return (EBUSY);
 
	if (rs == &pf_main_ruleset)
@@ -875,7 +879,7 @@ pf_commit_rules(u_int32_t ticket, char *anchor)
rs->rules.inactive.ptr = old_rules;
rs->rules.inactive.rcount = old_rcount;
 
-   rs->rules.active.ticket = rs->rules.inactive.ticket;
+   rs->rules.active.version = rs->rules.inactive.version;
pf_calc_skip_steps(rs->rules.active.ptr);
 
 
@@ -1214,7 +1218,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
 
NET_LOCK();
PF_LOCK();
-  

Re: DIOCGETRULE is slow for large rulesets (2/3)

2023-04-26 Thread Alexandr Nedvedicky
Hello,

On Thu, Apr 27, 2023 at 06:51:52AM +1000, David Gwynne wrote:

> > is that kind of check in KASSET() something you have on your mind?
> > perhaps I can trade KASSERT() to regular code:
> > 
> > if (t->pft_unit != minor(dev))
> > return (EPERM);
> 
> i would pass the dev/minor/unit to pf_find_trans() and compare along
> with the ticket value as a matter of course. returning a different
> errno if the minor is different is unecessary.
> 

something like this?

struct pf_trans *
pf_find_trans(uint32_t unit, uint64_t ticket)
{
struct pf_trans *t;

	rw_assert_anylock(&pf_rw);
 
	LIST_FOREACH(t, &pf_ioctl_trans, pft_entry) {
if (t->pft_ticket == ticket)
break;
}

if (t->pft_unit != unit)
return (NULL);

return (t);
}

just return NULL on unit mismatch.  updated diff is below.


thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
index 7ea22050506..ebe1b912766 100644
--- a/sys/net/pf_ioctl.c
+++ b/sys/net/pf_ioctl.c
@@ -117,6 +117,11 @@ voidpf_qid_unref(u_int16_t);
 int pf_states_clr(struct pfioc_state_kill *);
 int pf_states_get(struct pfioc_states *);
 
+struct pf_trans*pf_open_trans(uint32_t);
+struct pf_trans*pf_find_trans(uint32_t, uint64_t);
+voidpf_free_trans(struct pf_trans *);
+voidpf_rollback_trans(struct pf_trans *);
+
 struct pf_rule  pf_default_rule, pf_default_rule_new;
 
 struct {
@@ -168,6 +173,8 @@ int  pf_rtlabel_add(struct pf_addr_wrap *);
 voidpf_rtlabel_remove(struct pf_addr_wrap *);
 voidpf_rtlabel_copyout(struct pf_addr_wrap *);
 
+uint64_t trans_ticket = 1;
+LIST_HEAD(, pf_trans)  pf_ioctl_trans = LIST_HEAD_INITIALIZER(pf_trans);
 
 void
 pfattach(int num)
@@ -293,6 +300,25 @@ pfopen(dev_t dev, int flags, int fmt, struct proc *p)
 int
 pfclose(dev_t dev, int flags, int fmt, struct proc *p)
 {
+   struct pf_trans *w, *s;
+   LIST_HEAD(, pf_trans)   tmp_list;
+   uint32_t unit = minor(dev);
+
+	LIST_INIT(&tmp_list);
+	rw_enter_write(&pf_rw);
+	LIST_FOREACH_SAFE(w, &pf_ioctl_trans, pft_entry, s) {
+		if (w->pft_unit == unit) {
+			LIST_REMOVE(w, pft_entry);
+			LIST_INSERT_HEAD(&tmp_list, w, pft_entry);
+		}
+	}
+	rw_exit_write(&pf_rw);
+
+	while ((w = LIST_FIRST(&tmp_list)) != NULL) {
+   LIST_REMOVE(w, pft_entry);
+   pf_free_trans(w);
+   }
+
return (0);
 }
 
@@ -522,7 +548,7 @@ pf_qid_unref(u_int16_t qid)
 }
 
 int
-pf_begin_rules(u_int32_t *version, const char *anchor)
+pf_begin_rules(u_int32_t *ticket, const char *anchor)
 {
struct pf_ruleset   *rs;
struct pf_rule  *rule;
@@ -533,20 +559,20 @@ pf_begin_rules(u_int32_t *version, const char *anchor)
pf_rm_rule(rs->rules.inactive.ptr, rule);
rs->rules.inactive.rcount--;
}
-   *version = ++rs->rules.inactive.version;
+   *ticket = ++rs->rules.inactive.ticket;
rs->rules.inactive.open = 1;
return (0);
 }
 
 void
-pf_rollback_rules(u_int32_t version, char *anchor)
+pf_rollback_rules(u_int32_t ticket, char *anchor)
 {
struct pf_ruleset   *rs;
struct pf_rule  *rule;
 
rs = pf_find_ruleset(anchor);
if (rs == NULL || !rs->rules.inactive.open ||
-   rs->rules.inactive.version != version)
+   rs->rules.inactive.ticket != ticket)
return;
while ((rule = TAILQ_FIRST(rs->rules.inactive.ptr)) != NULL) {
pf_rm_rule(rs->rules.inactive.ptr, rule);
@@ -825,7 +851,7 @@ pf_hash_rule(MD5_CTX *ctx, struct pf_rule *rule)
 }
 
 int
-pf_commit_rules(u_int32_t version, char *anchor)
+pf_commit_rules(u_int32_t ticket, char *anchor)
 {
struct pf_ruleset   *rs;
struct pf_rule  *rule;
@@ -834,7 +860,7 @@ pf_commit_rules(u_int32_t version, char *anchor)
 
rs = pf_find_ruleset(anchor);
if (rs == NULL || !rs->rules.inactive.open ||
-   version != rs->rules.inactive.version)
+   ticket != rs->rules.inactive.ticket)
return (EBUSY);
 
	if (rs == &pf_main_ruleset)
@@ -849,7 +875,7 @@ pf_commit_rules(u_int32_t version, char *anchor)
rs->rules.inactive.ptr = old_rules;
rs->rules.inactive.rcount = old_rcount;
 
-   rs->rules.active.version = rs->rules.inactive.version;
+   rs->rules.active.ticket = rs->rules.inactive.ticket;
pf_calc_skip_steps(rs->rules.active.ptr);
 
 
@@ -1142,10 +1168,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)

Re: DIOCGETRULE is slow for large rulesets (3/3)

2023-04-26 Thread Alexandr Nedvedicky
Hello,

On Wed, Apr 26, 2023 at 01:25:45AM +0200, Alexandr Nedvedicky wrote:
> Hello,
> 
> this is 3rd of three diffs to improve DIOCGETRULE ioctl operation.
> the summary of changes done here is as follows:
> - add new members to pf_trans structure so it can
>   hold data for DIOCGETRULE operation
> 
> - DIOCGETRULES opens transaction and returns ticket/transaction id
>   to pfctl. pfctl uses ticket in DIOCGETRULE to identify transaction
>   when it retrieves the rule
> 
> - change introduces new reference counter to pf_anchor which is used
>   by pf_trans to keep reference to pf_anchor object.
> 
> - DIOCGETRULES opens transaction and stores a reference to anchor
>   with current ruleset version and pointer to the first rule
> 
> - DIOCGETRULE looks up transaction, verifies version number is
>   same so it can safely access next rule.
> 
> - pfctl when handling 'pfctl -sr -R n' must iterate to
>   n-th rule one-by-one.
> 

updating diff to accommodate changes in diff 2/3 [1] in the stack
of changes.

thanks and
regards
sashan

[1] https://marc.info/?l=openbsd-tech&m=168251129621089&w=2

8<---8<---8<--8<
diff --git a/sbin/pfctl/pfctl.c b/sbin/pfctl/pfctl.c
index 8cbd9d77b4f..3787e97a6b1 100644
--- a/sbin/pfctl/pfctl.c
+++ b/sbin/pfctl/pfctl.c
@@ -837,7 +837,7 @@ pfctl_show_rules(int dev, char *path, int opts, enum 
pfctl_show format,
 char *anchorname, int depth, int wildcard, long shownr)
 {
struct pfioc_rule pr;
-   u_int32_t nr, mnr, header = 0;
+   u_int32_t header = 0;
int len = strlen(path), ret = 0;
char *npath, *p;
 
@@ -893,24 +893,9 @@ pfctl_show_rules(int dev, char *path, int opts, enum 
pfctl_show format,
goto error;
}
 
-   if (shownr < 0) {
-   mnr = pr.nr;
-   nr = 0;
-   } else if (shownr < pr.nr) {
-   nr = shownr;
-   mnr = shownr + 1;
-   } else {
-   warnx("rule %ld not found", shownr);
-   ret = -1;
-   goto error;
-   }
-   for (; nr < mnr; ++nr) {
-   pr.nr = nr;
-		if (ioctl(dev, DIOCGETRULE, &pr) == -1) {
-   warn("DIOCGETRULE");
-   ret = -1;
-   goto error;
-   }
+	while (ioctl(dev, DIOCGETRULE, &pr) != -1) {
+   if ((shownr != -1) && (shownr != pr.nr))
+   continue;
 
/* anchor is the same for all rules in it */
if (pr.rule.anchor_wildcard == 0)
@@ -968,6 +953,13 @@ pfctl_show_rules(int dev, char *path, int opts, enum 
pfctl_show format,
case PFCTL_SHOW_NOTHING:
break;
}
+   errno = 0;
+   }
+
+   if ((errno != 0) && (errno != ENOENT)) {
+   warn("DIOCGETRULE");
+   ret = -1;
+   goto error;
}
 
/*
diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
index 8ce0af53a89..d294a8a3b3c 100644
--- a/sys/net/pf_ioctl.c
+++ b/sys/net/pf_ioctl.c
@@ -122,6 +122,10 @@ struct pf_trans*pf_find_trans(uint64_t);
 voidpf_free_trans(struct pf_trans *);
 voidpf_rollback_trans(struct pf_trans *);
 
+voidpf_init_tgetrule(struct pf_trans *,
+   struct pf_anchor *, uint32_t, struct pf_rule *);
+voidpf_cleanup_tgetrule(struct pf_trans *t);
+
 struct pf_rule  pf_default_rule, pf_default_rule_new;
 
 struct {
@@ -1470,7 +1474,9 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
case DIOCGETRULES: {
struct pfioc_rule   *pr = (struct pfioc_rule *)addr;
struct pf_ruleset   *ruleset;
-   struct pf_rule  *tail;
+   struct pf_rule  *rule;
+   struct pf_trans *t;
+   u_int32_truleset_version;
 
NET_LOCK();
PF_LOCK();
@@ -1482,14 +1488,21 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
NET_UNLOCK();
goto fail;
}
-   tail = TAILQ_LAST(ruleset->rules.active.ptr, pf_rulequeue);
-   if (tail)
-   pr->nr = tail->nr + 1;
+   rule = TAILQ_LAST(ruleset->rules.active.ptr, pf_rulequeue);
+   if (rule)
+   pr->nr = rule->nr + 1;
else
pr->nr = 0;
-   pr->ticket = ruleset->rules.active.version;
+   ruleset_version = ruleset->rules.active.version;
+

Re: DIOCGETRULE is slow for large rulesets (2/3)

2023-04-26 Thread Alexandr Nedvedicky
Hello,

On Wed, Apr 26, 2023 at 11:51:26AM +, Gerhard Roth wrote:
> On Wed, 2023-04-26 at 13:47 +0200, Alexandr Nedvedicky wrote:

> > @@ -293,6 +300,28 @@ pfopen(dev_t dev, int flags, int fmt, struct proc *p)
> >  int
> >  pfclose(dev_t dev, int flags, int fmt, struct proc *p)
> >  {
> > +	struct pf_trans *w, *s;
> > +	LIST_HEAD(, pf_trans)	tmp_list;
> > +	uint32_t unit = minor(dev);
> > +
> > +	if (minor(dev) >= 1)
> > +		return (ENXIO);
> 
> If you want to use the minor dev-id to release the transaction,
> the above two lines should go away.
> 
> 

yes, you are right. thanks for spotting that.

updated diff is below.

regards
sashan

8<---8<---8<--8<

diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
index 1141069dcf6..74af2b74ecb 100644
--- a/sys/net/pf_ioctl.c
+++ b/sys/net/pf_ioctl.c
@@ -117,6 +117,11 @@ voidpf_qid_unref(u_int16_t);
 int pf_states_clr(struct pfioc_state_kill *);
 int pf_states_get(struct pfioc_states *);
 
+struct pf_trans*pf_open_trans(uint32_t);
+struct pf_trans*pf_find_trans(uint64_t);
+voidpf_free_trans(struct pf_trans *);
+voidpf_rollback_trans(struct pf_trans *);
+
 struct pf_rule  pf_default_rule, pf_default_rule_new;
 
 struct {
@@ -168,6 +173,8 @@ int  pf_rtlabel_add(struct pf_addr_wrap *);
 voidpf_rtlabel_remove(struct pf_addr_wrap *);
 voidpf_rtlabel_copyout(struct pf_addr_wrap *);
 
+uint64_t trans_ticket = 1;
+LIST_HEAD(, pf_trans)  pf_ioctl_trans = LIST_HEAD_INITIALIZER(pf_trans);
 
 void
 pfattach(int num)
@@ -293,6 +300,25 @@ pfopen(dev_t dev, int flags, int fmt, struct proc *p)
 int
 pfclose(dev_t dev, int flags, int fmt, struct proc *p)
 {
+   struct pf_trans *w, *s;
+   LIST_HEAD(, pf_trans)   tmp_list;
+   uint32_t unit = minor(dev);
+
+	LIST_INIT(&tmp_list);
+	rw_enter_write(&pf_rw);
+	LIST_FOREACH_SAFE(w, &pf_ioctl_trans, pft_entry, s) {
+		if (w->pft_unit == unit) {
+			LIST_REMOVE(w, pft_entry);
+			LIST_INSERT_HEAD(&tmp_list, w, pft_entry);
+		}
+	}
+	rw_exit_write(&pf_rw);
+
+	while ((w = LIST_FIRST(&tmp_list)) != NULL) {
+   LIST_REMOVE(w, pft_entry);
+   pf_free_trans(w);
+   }
+
return (0);
 }
 
@@ -1142,10 +1168,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
return (EACCES);
}
 
-	if (flags & FWRITE)
-		rw_enter_write(&pf_rw);
-	else
-		rw_enter_read(&pf_rw);
+	rw_enter_write(&pf_rw);
 
switch (cmd) {
 
@@ -3022,10 +3045,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
break;
}
 fail:
-	if (flags & FWRITE)
-		rw_exit_write(&pf_rw);
-	else
-		rw_exit_read(&pf_rw);
+	rw_exit_write(&pf_rw);
 
return (error);
 }
@@ -3244,3 +3264,52 @@ pf_sysctl(void *oldp, size_t *oldlenp, void *newp, 
size_t newlen)
 
	return sysctl_rdstruct(oldp, oldlenp, newp, &pfs, sizeof(pfs));
 }
+
+struct pf_trans *
+pf_open_trans(uint32_t unit)
+{
+   static uint64_t ticket = 1;
+   struct pf_trans *t;
+
+	rw_assert_wrlock(&pf_rw);
+
+   t = malloc(sizeof(*t), M_TEMP, M_WAITOK);
+   memset(t, 0, sizeof(struct pf_trans));
+   t->pft_unit = unit;
+   t->pft_ticket = ticket++;
+
+	LIST_INSERT_HEAD(&pf_ioctl_trans, t, pft_entry);
+
+   return (t);
+}
+
+struct pf_trans *
+pf_find_trans(uint64_t ticket)
+{
+   struct pf_trans *t;
+
+	rw_assert_anylock(&pf_rw);
+
+	LIST_FOREACH(t, &pf_ioctl_trans, pft_entry) {
+   if (t->pft_ticket == ticket)
+   break;
+   }
+
+   return (t);
+}
+
+void
+pf_free_trans(struct pf_trans *t)
+{
+   free(t, M_TEMP, sizeof(*t));
+}
+
+void
+pf_rollback_trans(struct pf_trans *t)
+{
+   if (t != NULL) {
+		rw_assert_wrlock(&pf_rw);
+   LIST_REMOVE(t, pft_entry);
+   pf_free_trans(t);
+   }
+}
diff --git a/sys/net/pfvar_priv.h b/sys/net/pfvar_priv.h
index 38fff6473aa..5af2027733a 100644
--- a/sys/net/pfvar_priv.h
+++ b/sys/net/pfvar_priv.h
@@ -322,6 +322,17 @@ enum {
 
 extern struct cpumem *pf_anchor_stack;
 
+enum pf_trans_type {
+   PF_TRANS_NONE,
+   PF_TRANS_MAX
+};
+
+struct pf_trans {
+   LIST_ENTRY(pf_trans)pft_entry;
+   uint32_tpft_unit;   /* process id */
+   uint64_tpft_ticket;
+   enum pf_trans_type  pft_type;
+};
 extern struct task pf_purge_task;
 extern struct timeout  pf_purge_to;
 



Re: DIOCGETRULE is slow for large rulesets (2/3)

2023-04-26 Thread Alexandr Nedvedicky
Hello,


On Wed, Apr 26, 2023 at 11:37:58AM +1000, David Gwynne wrote:
> >  fail:
> > -   if (flags & FWRITE)
> > -   rw_exit_write(_rw);
> > -   else
> > -   rw_exit_read(_rw);
> > +   rw_exit_write(_rw);
> 
> i dont think having the open mode flags affect whether you take a shared
> or exclusive lock makes sense anyway, so im happy with these bits.

While updating my diff I noticed pfclose() needs the
same change: dropping the 'flags & FWRITE' test. This is
fixed in the new diff below.


> 
>   int unit = minor(dev);
> 
>   if (unit & ((1 << CLONE_SHIFT) - 1))
>   return (ENXIO);
> 
> this has some ties into the second issue, which is that we shouldn't
> use the pid as an identifier for state across syscalls like this
> in the kernel. the reason is that userland could be weird or buggy
> (or hostile) and fork in between creating a transaction and ending
> a transaction, but it will still have the fd it from from open(/dev/pf).
> instead, we should be using the whole dev_t or the minor number,
> as long as it includes the bits that the cloning infrastructure
> uses, as the identifier.

In the new diff I'm using minor(dev) to identify the transaction owner.
> 
> i would also check that the process^Wminor number that created a
> transaction is the same when looking up a transaction in pf_find_trans.

pf_find_trans() is dead code in the diff below, as it is not
being used there. Just to make sure I got things right,
so I can update the '3/3' diff in the stack accordingly,
let's assume the snippet below comes from the pfioctl() function:

int
pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, struct proc *p) {
...
t = pf_find_trans(ticket);
if (t == NULL)
return (ENXIO);

KASSERT(t->pft_unit == minor(dev));

Is that kind of check in KASSERT() something you have in mind?
Perhaps I can trade the KASSERT() for regular code:

if (t->pft_unit != minor(dev))
return (EPERM);


Thank you for the pointers to bpf.c. The updated diff is below.

regards
sashan

8<---8<---8<--8<
diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
index 1141069dcf6..b9904c545c5 100644
--- a/sys/net/pf_ioctl.c
+++ b/sys/net/pf_ioctl.c
@@ -117,6 +117,11 @@ voidpf_qid_unref(u_int16_t);
 int pf_states_clr(struct pfioc_state_kill *);
 int pf_states_get(struct pfioc_states *);
 
+struct pf_trans*pf_open_trans(uint32_t);
+struct pf_trans*pf_find_trans(uint64_t);
+voidpf_free_trans(struct pf_trans *);
+voidpf_rollback_trans(struct pf_trans *);
+
 struct pf_rule  pf_default_rule, pf_default_rule_new;
 
 struct {
@@ -168,6 +173,8 @@ int  pf_rtlabel_add(struct pf_addr_wrap *);
 voidpf_rtlabel_remove(struct pf_addr_wrap *);
 voidpf_rtlabel_copyout(struct pf_addr_wrap *);
 
+uint64_t trans_ticket = 1;
+LIST_HEAD(, pf_trans)  pf_ioctl_trans = LIST_HEAD_INITIALIZER(pf_trans);
 
 void
 pfattach(int num)
@@ -293,6 +300,28 @@ pfopen(dev_t dev, int flags, int fmt, struct proc *p)
 int
 pfclose(dev_t dev, int flags, int fmt, struct proc *p)
 {
+   struct pf_trans *w, *s;
+   LIST_HEAD(, pf_trans)   tmp_list;
+   uint32_t unit = minor(dev);
+
+   if (minor(dev) >= 1)
+   return (ENXIO);
+
+	LIST_INIT(&tmp_list);
+	rw_enter_write(&pf_rw);
+	LIST_FOREACH_SAFE(w, &pf_ioctl_trans, pft_entry, s) {
+		if (w->pft_unit == unit) {
+			LIST_REMOVE(w, pft_entry);
+			LIST_INSERT_HEAD(&tmp_list, w, pft_entry);
+		}
+	}
+	rw_exit_write(&pf_rw);
+
+	while ((w = LIST_FIRST(&tmp_list)) != NULL) {
+   LIST_REMOVE(w, pft_entry);
+   pf_free_trans(w);
+   }
+
return (0);
 }
 
@@ -1142,10 +1171,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
return (EACCES);
}
 
-   if (flags & FWRITE)
-   rw_enter_write(&pfioctl_rw);
-   else
-   rw_enter_read(&pfioctl_rw);
+   rw_enter_write(&pfioctl_rw);
 
switch (cmd) {
 
@@ -3022,10 +3048,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
break;
}
 fail:
-   if (flags & FWRITE)
-   rw_exit_write(&pfioctl_rw);
-   else
-   rw_exit_read(&pfioctl_rw);
+   rw_exit_write(&pfioctl_rw);
 
return (error);
 }
@@ -3244,3 +3267,52 @@ pf_sysctl(void *oldp, size_t *oldlenp, void *newp, 
size_t newlen)
 
return sysctl_rdstruct(oldp, oldlenp, newp, &pfs, sizeof(pfs));
 }
+

DIOCGETRULE is slow for large rulesets (3/3)

2023-04-25 Thread Alexandr Nedvedicky
Hello,

This is the 3rd of three diffs to improve the DIOCGETRULE ioctl operation.
The summary of changes done here is as follows:
- add new members to the pf_trans structure so it can
  hold data for the DIOCGETRULE operation

- DIOCGETRULES opens a transaction and returns a ticket/transaction id
  to pfctl. pfctl uses the ticket in DIOCGETRULE to identify the
  transaction when it retrieves the rule

- the change introduces a new reference counter to pf_anchor, which is
  used by pf_trans to keep a reference to the pf_anchor object

- DIOCGETRULES opens a transaction and stores a reference to the anchor
  with the current ruleset version and a pointer to the first rule

- DIOCGETRULE looks up the transaction and verifies the version number
  is the same, so it can safely access the next rule

- pfctl, when handling 'pfctl -sr -R n', must iterate to the
  n-th rule one-by-one
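The version check and cursor can be illustrated with a small stand-alone sketch (hypothetical names and deliberately simplified types; the real state lives in pf_trans in pf_ioctl.c):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of the DIOCGETRULE cursor. */
struct rule { int nr; struct rule *next; };

struct getrule_trans {
	unsigned int version;	/* ruleset version seen at DIOCGETRULES time */
	struct rule *cursor;	/* next rule to hand out */
};

/*
 * Return the next rule number; -1 stands in for EBUSY (ruleset changed
 * under the transaction), -2 for ENOENT (all rules retrieved).
 */
int
trans_getrule(struct getrule_trans *t, unsigned int current_version)
{
	struct rule *r;

	if (t->version != current_version)
		return (-1);	/* version mismatch: cursor is stale */
	if ((r = t->cursor) == NULL)
		return (-2);	/* end of ruleset */
	t->cursor = r->next;	/* advance without rescanning from head */
	return (r->nr);
}
```

The stale-version case is why the ruleset version is captured at DIOCGETRULES time: a commit between two DIOCGETRULE calls invalidates the cursor instead of silently walking freed rules.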

thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sbin/pfctl/pfctl.c b/sbin/pfctl/pfctl.c
index 8cbd9d77b4f..3787e97a6b1 100644
--- a/sbin/pfctl/pfctl.c
+++ b/sbin/pfctl/pfctl.c
@@ -837,7 +837,7 @@ pfctl_show_rules(int dev, char *path, int opts, enum 
pfctl_show format,
 char *anchorname, int depth, int wildcard, long shownr)
 {
struct pfioc_rule pr;
-   u_int32_t nr, mnr, header = 0;
+   u_int32_t header = 0;
int len = strlen(path), ret = 0;
char *npath, *p;
 
@@ -893,24 +893,9 @@ pfctl_show_rules(int dev, char *path, int opts, enum 
pfctl_show format,
goto error;
}
 
-   if (shownr < 0) {
-   mnr = pr.nr;
-   nr = 0;
-   } else if (shownr < pr.nr) {
-   nr = shownr;
-   mnr = shownr + 1;
-   } else {
-   warnx("rule %ld not found", shownr);
-   ret = -1;
-   goto error;
-   }
-   for (; nr < mnr; ++nr) {
-   pr.nr = nr;
-   if (ioctl(dev, DIOCGETRULE, &pr) == -1) {
-   warn("DIOCGETRULE");
-   ret = -1;
-   goto error;
-   }
+   while (ioctl(dev, DIOCGETRULE, &pr) != -1) {
+   if ((shownr != -1) && (shownr != pr.nr))
+   continue;
 
/* anchor is the same for all rules in it */
if (pr.rule.anchor_wildcard == 0)
@@ -968,6 +953,13 @@ pfctl_show_rules(int dev, char *path, int opts, enum 
pfctl_show format,
case PFCTL_SHOW_NOTHING:
break;
}
+   errno = 0;
+   }
+
+   if ((errno != 0) && (errno != ENOENT)) {
+   warn("DIOCGETRULE");
+   ret = -1;
+   goto error;
}
 
/*
diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
index 7e4c7d2a2ab..50286dbcb96 100644
--- a/sys/net/pf_ioctl.c
+++ b/sys/net/pf_ioctl.c
@@ -122,6 +122,10 @@ struct pf_trans *pf_find_trans(uint64_t);
 void pf_free_trans(struct pf_trans *);
 void pf_rollback_trans(struct pf_trans *);
 
+void pf_init_tgetrule(struct pf_trans *,
+   struct pf_anchor *, uint32_t, struct pf_rule *);
+void pf_cleanup_tgetrule(struct pf_trans *t);
+
+
 struct pf_rule  pf_default_rule, pf_default_rule_new;
 
 struct {
@@ -1474,7 +1478,9 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
case DIOCGETRULES: {
struct pfioc_rule   *pr = (struct pfioc_rule *)addr;
struct pf_ruleset   *ruleset;
-   struct pf_rule  *tail;
+   struct pf_rule  *rule;
+   struct pf_trans *t;
+   u_int32_t ruleset_version;
 
NET_LOCK();
PF_LOCK();
@@ -1486,14 +1492,21 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
NET_UNLOCK();
goto fail;
}
-   tail = TAILQ_LAST(ruleset->rules.active.ptr, pf_rulequeue);
-   if (tail)
-   pr->nr = tail->nr + 1;
+   rule = TAILQ_LAST(ruleset->rules.active.ptr, pf_rulequeue);
+   if (rule)
+   pr->nr = rule->nr + 1;
else
pr->nr = 0;
-   pr->ticket = ruleset->rules.active.version;
+   ruleset_version = ruleset->rules.active.version;
+   pf_anchor_take(ruleset->anchor);
+   rule = TAILQ_FIRST(ruleset->rules.active.ptr);
PF_UNLOCK();
NET_UNLOCK();
+
+   t = pf_open_trans(p->p_p->ps_pid);
+   pf_init_tgetrule(t, ruleset->anchor, ruleset_version, rule);
+   pr->ticket = t->pft_ticket;
+
break;
}
 
@@ -1501,29 +1514,29 @@ pfioctl(dev_t dev, u_long cmd, 

DIOCGETRULE is slow for large rulesets (2/3)

2023-04-25 Thread Alexandr Nedvedicky
Hello,

This is the second diff. It introduces a transaction (pf_trans).
It's more or less a diff with dead code.

It's still worth noting these two chunks in this diff:

@@ -1142,10 +1172,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
return (EACCES);
}
 
-   if (flags & FWRITE)
-   rw_enter_write(&pfioctl_rw);
-   else
-   rw_enter_read(&pfioctl_rw);
+   rw_enter_write(&pfioctl_rw);
 
switch (cmd) {
 
@@ -3022,10 +3049,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
break;
}
 fail:
-   if (flags & FWRITE)
-   rw_exit_write(&pfioctl_rw);
-   else
-   rw_exit_read(&pfioctl_rw);
+   rw_exit_write(&pfioctl_rw);
 
`pfioctl_rw` serializes processes which perform ioctl(2) on pf(4).
I'd like to also use this lock to serialize access to the table of
transactions. To keep things simple for now I'd like to make every
process perform ioctl(2) on pf(4) exclusively. I plan to revisit this
later when I'll be pushing the pfioctl_rw operations further down to
individual ioctl operations.

The operations pf_open_trans(), pf_find_trans() and pf_rollback_trans()
introduced in this diff are unused. They will be brought to life in
the 3rd diff.

thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
index 7ea22050506..7e4c7d2a2ab 100644
--- a/sys/net/pf_ioctl.c
+++ b/sys/net/pf_ioctl.c
@@ -117,6 +117,11 @@ void pf_qid_unref(u_int16_t);
 int pf_states_clr(struct pfioc_state_kill *);
 int pf_states_get(struct pfioc_states *);
 
+struct pf_trans *pf_open_trans(pid_t);
+struct pf_trans *pf_find_trans(uint64_t);
+void pf_free_trans(struct pf_trans *);
+void pf_rollback_trans(struct pf_trans *);
+
 struct pf_rule  pf_default_rule, pf_default_rule_new;
 
 struct {
@@ -168,6 +173,8 @@ int  pf_rtlabel_add(struct pf_addr_wrap *);
 voidpf_rtlabel_remove(struct pf_addr_wrap *);
 voidpf_rtlabel_copyout(struct pf_addr_wrap *);
 
+uint64_t trans_ticket = 1;
+LIST_HEAD(, pf_trans)  pf_ioctl_trans = LIST_HEAD_INITIALIZER(pf_trans);
 
 void
 pfattach(int num)
@@ -293,6 +300,29 @@ pfopen(dev_t dev, int flags, int fmt, struct proc *p)
 int
 pfclose(dev_t dev, int flags, int fmt, struct proc *p)
 {
+   struct pf_trans *w, *s;
+   LIST_HEAD(, pf_trans)   tmp_list;
+
+   if (minor(dev) >= 1)
+   return (ENXIO);
+
+   if (flags & FWRITE) {
+   LIST_INIT(&tmp_list);
+   rw_enter_write(&pfioctl_rw);
+   LIST_FOREACH_SAFE(w, &pf_ioctl_trans, pft_entry, s) {
+   if (w->pft_pid == p->p_p->ps_pid) {
+   LIST_REMOVE(w, pft_entry);
+   LIST_INSERT_HEAD(&tmp_list, w, pft_entry);
+   }
+   }
+   rw_exit_write(&pfioctl_rw);
+
+   while ((w = LIST_FIRST(&tmp_list)) != NULL) {
+   LIST_REMOVE(w, pft_entry);
+   pf_free_trans(w);
+   }
+   }
+
return (0);
 }
 
@@ -1142,10 +1172,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
return (EACCES);
}
 
-   if (flags & FWRITE)
-   rw_enter_write(&pfioctl_rw);
-   else
-   rw_enter_read(&pfioctl_rw);
+   rw_enter_write(&pfioctl_rw);
 
switch (cmd) {
 
@@ -3022,10 +3049,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
break;
}
 fail:
-   if (flags & FWRITE)
-   rw_exit_write(&pfioctl_rw);
-   else
-   rw_exit_read(&pfioctl_rw);
+   rw_exit_write(&pfioctl_rw);
 
return (error);
 }
@@ -3244,3 +3268,52 @@ pf_sysctl(void *oldp, size_t *oldlenp, void *newp, 
size_t newlen)
 
return sysctl_rdstruct(oldp, oldlenp, newp, &pfs, sizeof(pfs));
 }
+
+struct pf_trans *
+pf_open_trans(pid_t pid)
+{
+   static uint64_t ticket = 1;
+   struct pf_trans *t;
+
+   rw_assert_wrlock(&pfioctl_rw);
+
+   t = malloc(sizeof(*t), M_TEMP, M_WAITOK);
+   memset(t, 0, sizeof(struct pf_trans));
+   t->pft_pid = pid;
+   t->pft_ticket = ticket++;
+
+   LIST_INSERT_HEAD(&pf_ioctl_trans, t, pft_entry);
+
+   return (t);
+}
+
+struct pf_trans *
+pf_find_trans(uint64_t ticket)
+{
+   struct pf_trans *t;
+
+   rw_assert_anylock(&pfioctl_rw);
+
+   LIST_FOREACH(t, &pf_ioctl_trans, pft_entry) {
+   if (t->pft_ticket == ticket)
+   break;
+   }
+
+   return (t);
+}
+
+void
+pf_free_trans(struct pf_trans *t)
+{
+   free(t, M_TEMP, sizeof(*t));
+}
+
+void
+pf_rollback_trans(struct pf_trans *t)
+{
+   if (t != NULL) {
+   rw_assert_wrlock(&pfioctl_rw);
+   

DIOCGETRULE is slow for large rulesets (1/3)

2023-04-25 Thread Alexandr Nedvedicky
Hello,

Below is a diff which renames the ruleset member `ticket` to `version`.
The reason for this is to keep things clean: the word `ticket` will be
used to identify a transaction, while the newly introduced `version`
identifies a change to a particular pf object (the ruleset).

The diff below is a simple find/replace. It changes `ticket` to `version`
in pf_ruleset as used by the kernel; I don't want to drag this change
into pfctl.

OK?

thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
index 1141069dcf6..7ea22050506 100644
--- a/sys/net/pf_ioctl.c
+++ b/sys/net/pf_ioctl.c
@@ -522,7 +522,7 @@ pf_qid_unref(u_int16_t qid)
 }
 
 int
-pf_begin_rules(u_int32_t *ticket, const char *anchor)
+pf_begin_rules(u_int32_t *version, const char *anchor)
 {
struct pf_ruleset   *rs;
struct pf_rule  *rule;
@@ -533,20 +533,20 @@ pf_begin_rules(u_int32_t *ticket, const char *anchor)
pf_rm_rule(rs->rules.inactive.ptr, rule);
rs->rules.inactive.rcount--;
}
-   *ticket = ++rs->rules.inactive.ticket;
+   *version = ++rs->rules.inactive.version;
rs->rules.inactive.open = 1;
return (0);
 }
 
 void
-pf_rollback_rules(u_int32_t ticket, char *anchor)
+pf_rollback_rules(u_int32_t version, char *anchor)
 {
struct pf_ruleset   *rs;
struct pf_rule  *rule;
 
rs = pf_find_ruleset(anchor);
if (rs == NULL || !rs->rules.inactive.open ||
-   rs->rules.inactive.ticket != ticket)
+   rs->rules.inactive.version != version)
return;
while ((rule = TAILQ_FIRST(rs->rules.inactive.ptr)) != NULL) {
pf_rm_rule(rs->rules.inactive.ptr, rule);
@@ -825,7 +825,7 @@ pf_hash_rule(MD5_CTX *ctx, struct pf_rule *rule)
 }
 
 int
-pf_commit_rules(u_int32_t ticket, char *anchor)
+pf_commit_rules(u_int32_t version, char *anchor)
 {
struct pf_ruleset   *rs;
struct pf_rule  *rule;
@@ -834,7 +834,7 @@ pf_commit_rules(u_int32_t ticket, char *anchor)
 
rs = pf_find_ruleset(anchor);
if (rs == NULL || !rs->rules.inactive.open ||
-   ticket != rs->rules.inactive.ticket)
+   version != rs->rules.inactive.version)
return (EBUSY);
 
if (rs == _main_ruleset)
@@ -849,7 +849,7 @@ pf_commit_rules(u_int32_t ticket, char *anchor)
rs->rules.inactive.ptr = old_rules;
rs->rules.inactive.rcount = old_rcount;
 
-   rs->rules.active.ticket = rs->rules.inactive.ticket;
+   rs->rules.active.version = rs->rules.inactive.version;
pf_calc_skip_steps(rs->rules.active.ptr);
 
 
@@ -1191,7 +1191,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
 
NET_LOCK();
PF_LOCK();
-   pq->ticket = pf_main_ruleset.rules.active.ticket;
+   pq->ticket = pf_main_ruleset.rules.active.version;
 
/* save state to not run over them all each time? */
qs = TAILQ_FIRST(pf_queues_active);
@@ -1212,7 +1212,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
 
NET_LOCK();
PF_LOCK();
-   if (pq->ticket != pf_main_ruleset.rules.active.ticket) {
+   if (pq->ticket != pf_main_ruleset.rules.active.version) {
error = EBUSY;
PF_UNLOCK();
NET_UNLOCK();
@@ -1243,7 +1243,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
 
NET_LOCK();
PF_LOCK();
-   if (pq->ticket != pf_main_ruleset.rules.active.ticket) {
+   if (pq->ticket != pf_main_ruleset.rules.active.version) {
error = EBUSY;
PF_UNLOCK();
NET_UNLOCK();
@@ -1290,7 +1290,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
 
NET_LOCK();
PF_LOCK();
-   if (q->ticket != pf_main_ruleset.rules.inactive.ticket) {
+   if (q->ticket != pf_main_ruleset.rules.inactive.version) {
error = EBUSY;
PF_UNLOCK();
NET_UNLOCK();
@@ -1386,7 +1386,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
pf_rule_free(rule);
goto fail;
}
-   if (pr->ticket != ruleset->rules.inactive.ticket) {
+   if (pr->ticket != ruleset->rules.inactive.version) {
error = EBUSY;
PF_UNLOCK();
NET_UNLOCK();
@@ -1464,7 +1464,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
pr->nr = tail->nr + 1;
else
pr->nr = 0;
-  

DIOCGETRULE is slow for large rulesets (complete diff)

2023-04-25 Thread Alexandr Nedvedicky
Hello,

Below is a complete diff which makes DIOCGETRULE a bit more sane.
This is the complete change I'd like to commit. It is also more
convenient for people who want to test the diff.

When running 'pfctl -sr' to show rules, pfctl performs several
ioctl(2) calls. The first call is DIOCGETRULES to find out
the number of rules in the ruleset. Then it performs 'n' DIOCGETRULE
commands to retrieve all rules from the ruleset one-by-one. Let's
look at the DIOCGETRULE code to see how bad things are for extremely
large rulesets:

1489 if (pr->ticket != ruleset->rules.active.ticket) {
1490 error = EBUSY;
1491 PF_UNLOCK();
1492 NET_UNLOCK();
1493 goto fail;
1494 }
1495 rule = TAILQ_FIRST(ruleset->rules.active.ptr);
1496 while ((rule != NULL) && (rule->nr != pr->nr))
1497 rule = TAILQ_NEXT(rule, entries);
1498 if (rule == NULL) {
1499 error = EBUSY;
1500 PF_UNLOCK();
1501 NET_UNLOCK();
1502 goto fail;
1503 }

To retrieve the n-th rule the DIOCGETRULE ioctl iterates over the ruleset
from the beginning. This is OK for conventional rulesets. For large
rulesets the delay to retrieve all rules becomes noticeable.
To measure it I've decided to create an extreme ruleset with 26200 rules.
It took ~10 seconds on current:

current:
pf# time pfctl -sr |wc -l
   26200
0m10.80s real 0m00.38s user 0m10.44s system

when I apply diff below I'm getting significantly better result:

fixed:
pf# time pfctl.test -sr |wc -l
   26200
0m00.49s real 0m00.15s user 0m00.31s system
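The speedup tracks the number of list nodes visited: restarting the walk from the head for every request is quadratic, while a kept cursor visits each node once. A toy cost model (not pf code) makes the difference concrete:

```c
#include <assert.h>

/*
 * Node visits needed to fetch all n rules when each DIOCGETRULE
 * rescans the list from the head (old behaviour): 1 + 2 + ... + n.
 */
unsigned long long
visits_rescan(unsigned long long n)
{
	return (n * (n + 1) / 2);
}

/* With a per-transaction cursor each node is visited exactly once. */
unsigned long long
visits_cursor(unsigned long long n)
{
	return (n);
}
```

For the 26200-rule ruleset above that is roughly 343 million node visits versus 26200.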

The proposed fix introduces transaction (pf_trans) to pf(4)
ioctl(2) subsystem. I use transactions in my tree to update
pf(4) configuration, but this part is not ready for review yet.

A process which opens /dev/pf may create one or more transactions.
A transaction is closed either when the process is done with its change
or when it closes /dev/pf. This should work, because /dev/pf
is clonable since last November [1].

The DIOCGETRULE operation uses the transaction to keep a reference
to the ruleset, the ruleset version and a pointer to the next rule to
retrieve, so we no longer need to iterate from the beginning to retrieve
the n-th rule.

When 'pfctl -sr' retrieves the ruleset it first issues the DIOCGETRULES
ioctl(2). It receives a ticket (transaction id) which must be passed
as a parameter to subsequent DIOCGETRULE requests. It continues
to read rules one-by-one using DIOCGETRULE until ENOENT is seen,
which indicates all rules have been retrieved.
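The retrieval loop on the pfctl side follows the usual errno protocol. A self-contained simulation (fake_getrule() is a hypothetical stand-in for ioctl(dev, DIOCGETRULE, &pr); only the loop shape matters):

```c
#include <assert.h>
#include <errno.h>

static int fake_nrules = 3, fake_next;

/* Stand-in for the DIOCGETRULE ioctl: 0 on success, -1/ENOENT at end. */
static int
fake_getrule(int *nr)
{
	if (fake_next >= fake_nrules) {
		errno = ENOENT;
		return (-1);
	}
	*nr = fake_next++;
	return (0);
}

/* Read rules until ENOENT; any other errno is a real error. */
int
fetch_all_rules(void)
{
	int nr, count = 0;

	errno = 0;
	while (fake_getrule(&nr) != -1) {
		count++;
		errno = 0;		/* clear before the next request */
	}
	if (errno != 0 && errno != ENOENT)
		return (-1);		/* caller would warn("DIOCGETRULE") */
	return (count);
}
```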

The diff below is the complete change. I'm going to send partial
diffs for review, asking for OKs.

thanks and
regards
sashan

[1] https://marc.info/?l=openbsd-cvs&m=166773945126422&w=2

diff --git a/sbin/pfctl/pfctl.c b/sbin/pfctl/pfctl.c
index 8cbd9d77b4f..3787e97a6b1 100644
--- a/sbin/pfctl/pfctl.c
+++ b/sbin/pfctl/pfctl.c
@@ -837,7 +837,7 @@ pfctl_show_rules(int dev, char *path, int opts, enum 
pfctl_show format,
 char *anchorname, int depth, int wildcard, long shownr)
 {
struct pfioc_rule pr;
-   u_int32_t nr, mnr, header = 0;
+   u_int32_t header = 0;
int len = strlen(path), ret = 0;
char *npath, *p;
 
@@ -893,24 +893,9 @@ pfctl_show_rules(int dev, char *path, int opts, enum 
pfctl_show format,
goto error;
}
 
-   if (shownr < 0) {
-   mnr = pr.nr;
-   nr = 0;
-   } else if (shownr < pr.nr) {
-   nr = shownr;
-   mnr = shownr + 1;
-   } else {
-   warnx("rule %ld not found", shownr);
-   ret = -1;
-   goto error;
-   }
-   for (; nr < mnr; ++nr) {
-   pr.nr = nr;
-   if (ioctl(dev, DIOCGETRULE, &pr) == -1) {
-   warn("DIOCGETRULE");
-   ret = -1;
-   goto error;
-   }
+   while (ioctl(dev, DIOCGETRULE, &pr) != -1) {
+   if ((shownr != -1) && (shownr != pr.nr))
+   continue;
 
/* anchor is the same for all rules in it */
if (pr.rule.anchor_wildcard == 0)
@@ -968,6 +953,13 @@ pfctl_show_rules(int dev, char *path, int opts, enum 
pfctl_show format,
case PFCTL_SHOW_NOTHING:
break;
}
+   errno = 0;
+   }
+
+   if ((errno != 0) && (errno != ENOENT)) {
+   warn("DIOCGETRULE");
+   ret = -1;
+   goto error;
}
 
/*
diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
index 1141069dcf6..50286dbcb96 100644
--- a/sys/net/pf_ioctl.c
+++ b/sys/net/pf_ioctl.c
@@ -117,6 +117,15 @@ void pf_qid_unref(u_int16_t);
 int pf_states_clr(struct pfioc_state_kill *);
 int 

Re: pfctl + bgpd for loop ugliness

2023-04-21 Thread Alexandr Nedvedicky
Hello,

On Tue, Apr 18, 2023 at 02:43:26PM +0200, Theo Buehler wrote:
> On Tue, Apr 18, 2023 at 02:06:46PM +0200, Claudio Jeker wrote:
> > This and the others are IIRC streight from pfctl. So if someone wants a
> > free commit :) 
> 
> How about this. pfctl and bgpd are the same, except that the bgpd one
> has a bsearch() nitems on top.  pfctl regress is happy.
> 
> Index: sbin/pfctl/pfctl_parser.c
> ===
> RCS file: /cvs/src/sbin/pfctl/pfctl_parser.c,v
> retrieving revision 1.347
> diff -u -p -r1.347 pfctl_parser.c
> --- sbin/pfctl/pfctl_parser.c 9 Nov 2022 23:00:00 -   1.347
> +++ sbin/pfctl/pfctl_parser.c 18 Apr 2023 12:37:19 -
> @@ -62,6 +62,10 @@
>  #include "pfctl_parser.h"
>  #include "pfctl.h"
>  
> +#ifndef nitems
> +#define nitems(_a)   (sizeof((_a)) / sizeof((_a)[0]))
> +#endif
> +

I like it.

OK sashan



Re: divert packet checksum

2023-04-04 Thread Alexandr Nedvedicky
Hello,

On Mon, Apr 03, 2023 at 11:39:54PM +0200, Alexander Bluhm wrote:
> Hi,
> 
> When sending IP packets to userland with divert-packet rules, the
> checksum may be wrong.  Locally generated packets diverted by pf
> out rules may have no checksum due to to hardware offloading.
> IDS/IPS systems may complain about that.
> 
> Calculate the checksum in that case.

makes sense to me.
> 
> ok?

OK sashan
> 
> bluhm



Re: pf(4) drops valid IGMP/MLD messages

2023-03-03 Thread Alexandr Nedvedicky
Hello,
On Fri, Mar 03, 2023 at 09:14:38PM +0100, Alexander Bluhm wrote:
> On Fri, Mar 03, 2023 at 08:54:51AM +0100, Luca Di Gregorio wrote:
> > Hi, just another bit of info about this issue.
> 
> Instead of implementing more and more details of RFC, we should
> discuss what the goal is.
> 
> We had a pf that dropped valid IGMP packets due to router-alert
> option.  So I added code to pass them, but still block other options.
> This is neccessary for working multicast.
> 
> Then sashan@ commited a diff that blocks packets with bad TTL or
> destination address.  This was too strict for some use cases as we
> see now.  But it may improve security.
> 
> What are the IGMP illegal packets that an attacker might use?  We
> should drop them.  This IGMP logic is deep down in pf that a user
> cannot override.  So we should be careful not to destroy valid use
> cases.  Replacing || with && somehow in the if condition looks
> reasonalbe to me.
> 

My commit was wrong. It makes pf(4) drop all IGMP packets which
either have a TTL other than 1 or are not sent to a multicast address.

This is what we currently have in pf_walk_header():

6849 /* IGMP packets have router alert options, allow them */
6850 if (pd->proto == IPPROTO_IGMP) {
6851 /* According to RFC 1112 ttl must be set to 1. */
6852 if ((h->ip_ttl != 1) || !IN_MULTICAST(h->ip_dst.s_addr)) {
6853 DPFPRINTF(LOG_NOTICE, "Invalid IGMP");
6854 REASON_SET(reason, PFRES_IPOPTIONS);
6855 return (PF_DROP);
6856 }
6857 CLR(pd->badopts, PF_OPT_ROUTER_ALERT);
6858 }

I think Luca is right: pf(4) is too paranoid, breaking IGMP.
This is what the 'Informal Protocol Description' section of RFC 1112 says:

   Multicast routers send Host Membership Query messages (hereinafter
   called Queries) to discover which host groups have members on their
   attached local networks.  Queries are addressed to the all-hosts
   group (address 224.0.0.1), and carry an IP time-to-live of 1.

In my commit I confused any multicast address with the 'all-hosts'
group (224.0.0.1).

So we either want to revert that commit or fix it. I think the condition at
line 6850 should read as follows:

6847 /* IGMP packets have router alert options, allow them */
6848 if (pd->proto == IPPROTO_IGMP) {
6849 /*
6850  * According to RFC 1112 ttl must be set to 1 in all IGMP
6851  * packets sent to 224.0.0.1
6852  */
6853 if ((h->ip_ttl != 1) &&
6854 (h->ip_dst.s_addr == INADDR_ALLHOSTS_GROUP)) {
6855 DPFPRINTF(LOG_NOTICE, "Invalid IGMP");
6856 REASON_SET(reason, PFRES_IPOPTIONS);
6857 return (PF_DROP);
6858 }
6859 CLR(pd->badopts, PF_OPT_ROUTER_ALERT);

This change should make pf(4) reasonably paranoid while keeping IGMP working.
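Lifting the proposed condition out as a predicate makes it easy to check against the RFC 1112 wording (a sketch only; the constant is spelled out here rather than taken from netinet/in.h, and byte order is ignored):

```c
#include <assert.h>

#define ALLHOSTS_GROUP	0xe0000001U	/* 224.0.0.1, host byte order here */

/*
 * Proposed pf_walk_header() check: drop an IGMP packet only when it is
 * addressed to 224.0.0.1 with a TTL other than 1; leave everything
 * else to the ruleset.
 */
int
igmp_drop(unsigned char ttl, unsigned int dst)
{
	return (ttl != 1 && dst == ALLHOSTS_GROUP);
}
```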

thanks and
regards
sashan



Re: pf(4) drops valid IGMP/MLD messages

2023-02-28 Thread Alexandr Nedvedicky
Hello Matthieu,



On Tue, Feb 28, 2023 at 02:05:50PM +0100, Matthieu Herrb wrote:
> > --- a/sys/net/pf.c
> > +++ b/sys/net/pf.c
> > @@ -6846,8 +6846,12 @@ pf_walk_header(struct pf_pdesc *pd, struct ip *h, 
> > u_short *reason)
> > pd->proto = h->ip_p;
> > /* IGMP packets have router alert options, allow them */
> > if (pd->proto == IPPROTO_IGMP) {
> > -   /* According to RFC 1112 ttl must be set to 1. */
> > -   if ((h->ip_ttl != 1) || !IN_MULTICAST(h->ip_dst.s_addr)) {
> > +   /*
> > +* According to RFC 1112 ttl must be set to 1 in all IGMP
> > +* packets sent do 224.0.0.1
> > +*/
> > +   if ((h->ip_ttl != 1) &&
> > +   (h->ip_dst.s_addr == INADDR_ALLHOSTS_GROUP)) {
> > DPFPRINTF(LOG_NOTICE, "Invalid IGMP");
> > REASON_SET(reason, PFRES_IPOPTIONS);
> > return (PF_DROP);
> > 
> 
> 
> Hi,
> 
> The expression look wrong to me again.
> I read in RFC1112 that correct packets should have:
> 
>   (h->ip_ttl == 1 && h->ip.ip_dst.s_addr == INADDR_ALLHOST_GROUP)

The expression above is true for Query/Report messages,
which are a subset of the IGMP protocol. See Luca's emails with
references to the RFCs which further extend the IGMP protocol.

Here in this particular place (pf_walk_header()) I think we really
want to simply discard all IGMP datagrams sent to 224.0.0.1 with a TTL
different from 1. Anything else should be passed further up, letting
pf(4) rules make a decision on those cases.

thanks and
regards
sashan



Re: pf(4) drops valid IGMP/MLD messages

2023-02-26 Thread Alexandr Nedvedicky
Hello,


> > 8<---8<---8<--8<
> > diff --git a/sys/net/pf.c b/sys/net/pf.c
> > index 8cb1326a160..c328109026c 100644
> > --- a/sys/net/pf.c
> > +++ b/sys/net/pf.c
> > @@ -6847,7 +6847,7 @@ pf_walk_header(struct pf_pdesc *pd, struct ip *h, 
> > u_short *reason)
> > /* IGMP packets have router alert options, allow them */
> > if (pd->proto == IPPROTO_IGMP) {
> > /* According to RFC 1112 ttl must be set to 1. */
> > -   if ((h->ip_ttl != 1) || !IN_MULTICAST(h->ip_dst.s_addr)) {
> > +   if ((h->ip_ttl != 1) && IN_MULTICAST(h->ip_dst.s_addr)) {
> > DPFPRINTF(LOG_NOTICE, "Invalid IGMP");
> > REASON_SET(reason, PFRES_IPOPTIONS);
> > return (PF_DROP);
> > @@ -7101,8 +7101,8 @@ pf_walk_header6(struct pf_pdesc *pd, struct ip6_hdr 
> > *h, u_short *reason)
> >  * missing then MLD message is invalid and
> >  * should be discarded.
> >  */
> > -   if ((h->ip6_hlim != 1) ||
> > -   !IN6_IS_ADDR_LINKLOCAL(&h->ip6_src)) {
> > +   if ((h->ip6_hlim != 1) &&
> > +   IN6_IS_ADDR_LINKLOCAL(&h->ip6_src)) {
> > DPFPRINTF(LOG_NOTICE, "Invalid MLD");
> > REASON_SET(reason, PFRES_IPOPTIONS);
> > return (PF_DROP);
> > 
> 
> Unless I'm missing more context, this hunk looks wrong to me. Valid
> MLD packets must have a ttl of 1 *and* come from a LL address. The
> initial logic seems ok to me.
> 

Yes, you are right. Your comment made me take a better look
at RFC 1112 [1]. The 'Informal Protocol Description' section
reads as follows:

   Multicast routers send Host Membership Query messages (hereinafter
   called Queries) to discover which host groups have members on their
   attached local networks.  Queries are addressed to the all-hosts
   group (address 224.0.0.1), and carry an IP time-to-live of 1.

I think I've confused the all-hosts group (224.0.0.1) with any multicast
address (any class-D address, 224.0.0.0/4). I think the diff below is
what we actually need to get IPv4 IGMP working again:

[1] https://www.ietf.org/rfc/rfc1112.txt

8<---8<---8<--8<
diff --git a/sys/net/pf.c b/sys/net/pf.c
index 8cb1326a160..c50173186da 100644
--- a/sys/net/pf.c
+++ b/sys/net/pf.c
@@ -6846,8 +6846,12 @@ pf_walk_header(struct pf_pdesc *pd, struct ip *h, 
u_short *reason)
pd->proto = h->ip_p;
/* IGMP packets have router alert options, allow them */
if (pd->proto == IPPROTO_IGMP) {
-   /* According to RFC 1112 ttl must be set to 1. */
-   if ((h->ip_ttl != 1) || !IN_MULTICAST(h->ip_dst.s_addr)) {
+   /*
+* According to RFC 1112 ttl must be set to 1 in all IGMP
+* packets sent to 224.0.0.1
+*/
+   if ((h->ip_ttl != 1) &&
+   (h->ip_dst.s_addr == INADDR_ALLHOSTS_GROUP)) {
DPFPRINTF(LOG_NOTICE, "Invalid IGMP");
REASON_SET(reason, PFRES_IPOPTIONS);
return (PF_DROP);



pf(4) drops valid IGMP/MLD messages

2023-02-25 Thread Alexandr Nedvedicky
Hello,

The issue has been discovered and analyzed by Luca di Gregorio
on bugs@ [1]. The thread [1] covers all details. People who
wish to read a brief summary can skip to Luca's email [2].

To wrap it up, the current handling of IGMP/MLD messages in pf(4)
is the exact opposite of what it should be. I failed to translate
English words from the RFC standards into correct C code. The patch
below consists of two one-liners which make multicast for IPv4/IPv6
work again with pf(4) enabled.

Let me quote Luca's summary for IPv6 here:

If the IP Destination Address is multicast, then the TTL
should be 1.

If the IP Destination Address is not multicast, then
no restrictions on TTL.

and Luca's summary for IPv4:

If the IP Destination Address is 224.0.0.0/4, then the TTL
should be 1.

If the IP Destination Address is not 224.0.0.0/4, then no
restrictions on TTL.
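Both summaries reduce to the same predicate (an illustrative sketch, with the multicast test abstracted to a flag):

```c
#include <assert.h>

/*
 * A packet is invalid exactly when the destination is multicast and
 * the TTL (IPv4) / hop limit (IPv6) is not 1; unicast destinations
 * carry no TTL restriction.
 */
int
mcast_ttl_invalid(int dst_is_multicast, unsigned char ttl)
{
	return (dst_is_multicast && ttl != 1);
}
```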

As I've said, all details and references to RFCs can be found
in Luca's emails on bugs@ [1].

The diff below to fix IGMP/MLD handling has been essentially proposed
by Luca [3]. OK to commit?

thanks and
regards
sashan

[1] https://marc.info/?t=16771405962&r=1&w=2

[2] https://marc.info/?l=openbsd-bugs&m=167727244015783&w=2

[3] https://marc.info/?l=openbsd-bugs&m=167722460220004&w=2

8<---8<---8<--8<
diff --git a/sys/net/pf.c b/sys/net/pf.c
index 8cb1326a160..c328109026c 100644
--- a/sys/net/pf.c
+++ b/sys/net/pf.c
@@ -6847,7 +6847,7 @@ pf_walk_header(struct pf_pdesc *pd, struct ip *h, u_short 
*reason)
/* IGMP packets have router alert options, allow them */
if (pd->proto == IPPROTO_IGMP) {
/* According to RFC 1112 ttl must be set to 1. */
-   if ((h->ip_ttl != 1) || !IN_MULTICAST(h->ip_dst.s_addr)) {
+   if ((h->ip_ttl != 1) && IN_MULTICAST(h->ip_dst.s_addr)) {
DPFPRINTF(LOG_NOTICE, "Invalid IGMP");
REASON_SET(reason, PFRES_IPOPTIONS);
return (PF_DROP);
@@ -7101,8 +7101,8 @@ pf_walk_header6(struct pf_pdesc *pd, struct ip6_hdr *h, 
u_short *reason)
 * missing then MLD message is invalid and
 * should be discarded.
 */
-   if ((h->ip6_hlim != 1) ||
-   !IN6_IS_ADDR_LINKLOCAL(&h->ip6_src)) {
+   if ((h->ip6_hlim != 1) &&
+   IN6_IS_ADDR_LINKLOCAL(&h->ip6_src)) {
DPFPRINTF(LOG_NOTICE, "Invalid MLD");
REASON_SET(reason, PFRES_IPOPTIONS);
return (PF_DROP);



Re: assert fail in pfsync_grab_snapshot()

2023-02-21 Thread Alexandr Nedvedicky
Hello Lyndon,

This assert has been removed in current (revision 1.310). The complete
diff reads as follows:

8<---8<---8<--8<
diff --git a/sys/net/if_pfsync.c b/sys/net/if_pfsync.c
index d279ede9cd6..64a2da195ab 100644
--- a/sys/net/if_pfsync.c
+++ b/sys/net/if_pfsync.c
@@ -1,4 +1,4 @@
-/* $OpenBSD: if_pfsync.c,v 1.309 2022/11/06 21:34:01 kn Exp $  */
+/* $OpenBSD: if_pfsync.c,v 1.310 2022/11/11 11:22:48 sashan Exp $  */
 
 /*
  * Copyright (c) 2002 Michael Shalayeff
@@ -1362,10 +1362,17 @@ pfsync_grab_snapshot(struct pfsync_snapshot *sn, struct 
pfsync_softc *sc)
TAILQ_INIT(&sn->sn_qs[q]);
 
while ((st = TAILQ_FIRST(&sc->sc_qs[q])) != NULL) {
-   KASSERT(st->snapped == 0);
TAILQ_REMOVE(&sc->sc_qs[q], st, sync_list);
-   TAILQ_INSERT_TAIL(&sn->sn_qs[q], st, sync_snap);
-   st->snapped = 1;
+   if (st->snapped == 0) {
+   TAILQ_INSERT_TAIL(&sn->sn_qs[q], st, sync_snap);
+   st->snapped = 1;
+   } else {
+   /*
+* item is on snapshot list already, so we can
+* skip it now.
+*/
+   pf_state_unref(st);
+   }
}
}
 
8<---8<---8<--8<

The commit changes the assert to a regular condition. Not sure if the
diff applies cleanly to 7.2.

regards
sashan

On Tue, Feb 21, 2023 at 11:58:51AM -0800, Lyndon Nerenberg (VE7TFX/VE6BBM) 
wrote:
> Perhaps related to the recent discussion about pf errors?  This
> happened this morning, on a fully patched 7.2 amd64. dmesg and other
> info available on request.
> 
> ddb{0}> bt
> db_enter() at db_enter+0x10
> panic(81f30a19) at panic+0xbf
> __assert(81fa0761,81fd3154,637,81f6b8f7) at 
> __assert+0x
> 25
> pfsync_grab_snapshot(8000211e0ee0,80b57000) at 
> pfsync_grab_snapshot
> +0x308
> pfsync_sendout() at pfsync_sendout+0x89
> pfsync_update_state(fd905d068020) at pfsync_update_state+0x15b
> pf_test(2,3,80b65000,8000211e1258) at pf_test+0x117a
> ip_output(fd806e7ce500,0,8000211e13e8,1,0,0,e42ab41b1c818e9c) at 
> ip_out
> put+0x6b7
> ip_forward(fd806e7ce500,80b53800,fd8d52297390,0) at 
> ip_forward+
> 0x2da
> ip_input_if(8000211e1528,8000211e1534,4,0,80b53800) at 
> ip_input
> _if+0x35c
> ipv4_input(80b53800,fd806e7ce500) at ipv4_input+0x39
> ether_input(80b53800,fd806e7ce500) at ether_input+0x3b1
> carp_input(80b67800,fd806e7ce500,5e000106) at carp_input+0x196
> ether_input(80b67800,fd806e7ce500) at ether_input+0x1d9
> vlan_input(80b39000,fd806e7ce500,8000211e175c) at 
> vlan_input+0x
> 23d
> ether_input(80b39000,fd806e7ce500) at ether_input+0x85
> if_input_process(802c5048,8000211e17f8) at if_input_process+0x6f
> ifiq_process(802c8600) at ifiq_process+0x69
> taskq_thread(80033200) at taskq_thread+0x100
> end trace frame: 0x0, count: -19
> ddb{0}> show panic
> *cpu0: kernel diagnostic assertion "st->snapped == 0" failed: file 
> "/usr/src/sy
> s/net/if_pfsync.c", line 1591
>  cpu3: kernel diagnostic assertion "st->snapped == 0" failed: file 
> "/usr/src/sy
> s/net/if_pfsync.c", line 1591
>  cpu1: kernel diagnostic assertion "st->snapped == 0" failed: file 
> "/usr/src/sy
> s/net/if_pfsync.c", line 1591
> ddb{0}> 
> 
> --lyndon
> 



Re: fix NULL pointer dereference in pfsync_bulk_update()

2023-02-14 Thread Alexandr Nedvedicky
Hello,

On Tue, Feb 14, 2023 at 01:23:16AM +0100, Alexander Bluhm wrote:
> On Mon, Feb 13, 2023 at 08:39:39AM +0100, Alexandr Nedvedicky wrote:
> > this bug has been found and reported by Hrvoje@ [1].
> > I took my chance and asked Hrvoje to test a small diff [2].
> > I would like to ask for OK to commit this fix which makes
> > Hrvoje's test box happy. Diff below is same to one found
> > at bugs@. The thing is that pfsync_bulk_update() function
> > must check first if there is anything to update, otherwise
> > we may day due to NULL pointer dereference.
> 
> Your check makes sense, OK bluhm@
> 
> But who is protecting write access to sc->sc_bulk_next ?
> 
> I think it is exclusive netlock.  This works as pfsync_input() is
> a protocol input function which is still not running in parallel.

Yes, NET_LOCK() is still an exclusive lock, so it should provide
sufficient protection.

> 
> rw_enter_read(&pf_state_list.pfs_rwl) does not protect sc->sc_bulk_next
> it is a read lock.  mtx_enter(&pf_state_list.pfs_mtx) is not taken
> for all writers to sc->sc_bulk_next.
> 
> Do you have plans to relax this code from exclusive to shared
> netlock?

dlg@ has new code for pfsync. I've seen his code back in December(?)
last year; support for bulk transfers was the missing piece.

I think the option we have here in pfsync(4) is to introduce a new
rw-lock private to if_pfsync and remove NET_LOCK() here completely.

sashan



fix NULL pointer dereference in pfsync_bulk_update()

2023-02-12 Thread Alexandr Nedvedicky
Hello,

this bug has been found and reported by Hrvoje@ [1].
I took my chance and asked Hrvoje to test a small diff [2].
I would like to ask for OK to commit this fix which makes
Hrvoje's test box happy. Diff below is the same as the one found
at bugs@. The thing is that the pfsync_bulk_update() function
must check first if there is anything to update, otherwise
we may die due to a NULL pointer dereference.

thanks and
regards
sashan

[1] https://marc.info/?l=openbsd-bugs&m=167578573111413&w=2

[2] https://marc.info/?l=openbsd-bugs&m=167584283809140&w=2

8<---8<---8<--8<
diff --git a/sys/net/if_pfsync.c b/sys/net/if_pfsync.c
index e2c86971336..1fa58f6fab9 100644
--- a/sys/net/if_pfsync.c
+++ b/sys/net/if_pfsync.c
@@ -2464,6 +2464,11 @@ pfsync_bulk_update(void *arg)
st = sc->sc_bulk_next;
sc->sc_bulk_next = NULL;
 
+   if (st == NULL) {
+   rw_exit_read(&pf_state_list.pfs_rwl);
+   goto out;
+   }
+
for (;;) {
if (st->sync_state == PFSYNC_S_NONE &&
st->timeout < PFTM_MAX &&



Re: bpf(4) "wait" timeouts

2023-02-12 Thread Alexandr Nedvedicky
Hello,

This diff looks good to me, though I still have some
doubts about the accuracy of the comment here:


>   return (kn->kn_data > 0);
> @@ -1510,6 +1599,15 @@ bpf_catchpacket(struct bpf_d *d, u_char 
>   ++d->bd_dcount;
>   return;
>   }
> +
> + /*
> +  * there's a small chance bpf_wait_cb is running
> +  * concurrently with this and two wakeups will be
> +  * generated.
> +  */
> + if (timeout_del(&d->bd_wait_tmo))
> + bpf_put(d);
> +
>   ROTATE_BUFFERS(d);
>   do_wakeup = 1;
>   curlen = 0;
> @@ -1530,12 +1628,27 @@ bpf_catchpacket(struct bpf_d *d, u_char 

I'm still failing to spot the race the comment is talking
about. The code in bpf_wait_cb() grabs the `d->bd_mtx` mutex, the
same mutex which is held by the bpf_catchpacket() caller.

However, my doubts about the comment should not prevent you from
committing code which solves the issue and looks good.

OK sashan



Re: pf max-src-{states,conn} without overload/flush useless?

2023-02-09 Thread Alexandr Nedvedicky
Hello,

On Wed, Feb 08, 2023 at 09:42:11PM -0600, joshua stein wrote:

> $ for i in `seq 5` ; do nc 192.168.1.240 22 &  done
> [2] 68892
> [3] 6303
> [4] 63554
> [5] 87833
> [6] 49997
> $ SSH-2.0-OpenSSH_9.1
> SSH-2.0-OpenSSH_9.1
> SSH-2.0-OpenSSH_9.1
> SSH-2.0-OpenSSH_9.1
> SSH-2.0-OpenSSH_9.1
> 
> vm:~$ doas pfctl -sr
> block return all
> pass out all flags S/SA
> pass in on egress inet6 proto tcp from any to ::1 port = 22 flags S/SA keep 
> state (source-track rule, max-src-conn 3)
> pass in on egress inet proto tcp from any to 127.0.0.1 port = 22 flags S/SA 
> keep state (source-track rule, max-src-conn 3)
> pass in on egress inet proto tcp from any to 192.168.1.240 port = 22 flags 
> S/SA keep state (source-track rule, max-src-conn 3)
> 
> This is with:
> 
> OpenBSD 7.2-current (GENERIC.MP) #2014: Tue Feb  7 16:24:04 MST 2023
> dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP


I gave it a try after doing a sysupgrade to:

OpenBSD 7.2-current (GENERIC.MP) #1025: Wed Feb  8 19:16:09 MST 2023

it still works for me as expected:
disk$ for i in `seq 5` ; do nc 192.168.2.175 22 & done
[1] 51566
[2] 78983
[3] 77864
[4] 37474
[5] 98599
disk$ SSH-2.0-OpenSSH_9.2
SSH-2.0-OpenSSH_9.2
SSH-2.0-OpenSSH_9.2

my connection arrives over the iwn0 interface, which is in the egress group,
so our environments are almost identical.



> 
> > > diff --git sys/net/pf.c sys/net/pf.c
> > > index 8cb1326a160..89703feab12 100644
> > > --- sys/net/pf.c
> > > +++ sys/net/pf.c
> > > @@ -481,12 +481,10 @@ pf_src_connlimit(struct pf_state **stp)
> > >   if ((sn = pf_get_src_node((*stp), PF_SN_NONE)) == NULL)
> > >   return (0);
> > >  
> > > - sn->conn++;
> > > - (*stp)->src.tcp_est = 1;
> > >   pf_add_threshold(&sn->conn_rate);
> > >  
> > >   if ((*stp)->rule.ptr->max_src_conn &&
> > > - (*stp)->rule.ptr->max_src_conn < sn->conn) {
> > > + sn->conn >= (*stp)->rule.ptr->max_src_conn) {
> > >   pf_status.lcounters[LCNT_SRCCONN]++;
> > >   bad++;
> > >   }
> > > @@ -497,8 +495,11 @@ pf_src_connlimit(struct pf_state **stp)
> > >   bad++;
> > >   }
> > >  
> > > - if (!bad)
> > > + if (!bad) {
> > > + sn->conn++;
> > > + (*stp)->src.tcp_est = 1;
> > >   return (0);
> > > + }
> > >  
> > >   if ((*stp)->rule.ptr->overload_tbl) {
> > >   struct pfr_addr p;
> > 
> > it seems to me the change to pf_src_connlimit() does
> > not alter behavior. I think change to pf_src_connlimit()
> > can be dropped.
> 
> But don't we not want to increment the source node's connection 
> count since we're not going to accept the connection (in the !bad 
> case)?  I'm not sure what kind of bookkeeping that would screw up.
> 

what we currently do is we always bump the connection count
for the source node we found. then we go on to check the limit
(*stp)->rule.ptr->max_src_conn < sn->conn
if the limit is exceeded we mark the state as closed and expired (timeout
PFTM_PURGE). We also report that to the caller, which should close the
connection.

your change stops counting connections as soon as the limit is
reached. So now I see there is a change in behavior; I missed
that yesterday. I'm not able to tell if we want to go that way or not.


> > currently we do the conn limit check in step (4). Your change moves this
> > to the earlier step (3) (given I understand things right here).
> > It's awfully early here; I need to sleep on this.
> 
> Yes, that was my understanding too.  We wait until the remote has 
> done enough work to be a valid connection but then block it before 
> sending the final ack.
> 
> > can you give it a try with your slightly modified diff? just drop the
> > changes to pf_src_connlimit() and keep those in pf_tcp_track_full(), which
> > I believe is the only relevant part.
> 
> Yes, it still works, and only allows me 3 connections with the final 
> 2 timing out as expected:
> 
> $ for i in `seq 5` ; do nc 192.168.1.240 22 &  done
> [2] 10193
> [3] 30197
> [4] 72235
> [5] 69900
> [6] 99044
> $ SSH-2.0-OpenSSH_9.1
> SSH-2.0-OpenSSH_9.1
> SSH-2.0-OpenSSH_9.1
> [5]  - exit 1 nc 192.168.1.240 22
> $
> [6]  + exit 1 nc 192.168.1.240 22
> 

the only explanation for why it does not work for you is latency. The packets
which match a state run as readers, so if we call pf_set_protostate() before we
actually check the limit, then that might be the cause here. I admit it's
rather speculation.

can you give the diff below a try?

thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sys/net/pf.c b/sys/net/pf.c
index 8cb1326a160..f81b0c793ce 100644
--- a/sys/net/pf.c
+++ b/sys/net/pf.c
@@ -4919,14 +4919,14 @@ pf_tcp_track_full(struct pf_pdesc *pd, struct pf_state 
**stp, u_short *reason,
pf_set_protostate(*stp, psrc, TCPS_CLOSING);
if (th->th_flags & TH_ACK) {
if (dst->state == 

Re: pf max-src-{states,conn} without overload/flush useless?

2023-02-08 Thread Alexandr Nedvedicky
Hello,

I did test similar rules on my system
OpenBSD 7.2-current (GENERIC.MP) #978: Sun Jan 22 11:41:04 MST 2023

these are my rules:

set skip on lo

block return# block stateless traffic
pass out log# establish keep-state
pass in on iwn0 proto tcp from 192.168.2.0/24 to self port 22 \
keep state (source-track rule, max-src-conn 3)

this is my test terminal on remote host:
router$ for i in `seq 5` ; do nc 192.168.2.175 22 &  done 
[1] 32472
[2] 97453
[3] 7192
[4] 50386
[5] 57517
router$ SSH-2.0-OpenSSH_9.1
SSH-2.0-OpenSSH_9.1
SSH-2.0-OpenSSH_9.1

there are three SSH version strings, which indicates
I got 3 working connections out of 5 attempts.

I'm not sure why it seems to work for me (ssh) while
it is broken for you (http). I'll give it a try with
a webserver.

On Wed, Feb 08, 2023 at 04:34:55PM -0600, joshua stein wrote:



> diff --git sys/net/pf.c sys/net/pf.c
> index 8cb1326a160..89703feab12 100644
> --- sys/net/pf.c
> +++ sys/net/pf.c
> @@ -481,12 +481,10 @@ pf_src_connlimit(struct pf_state **stp)
>   if ((sn = pf_get_src_node((*stp), PF_SN_NONE)) == NULL)
>   return (0);
>  
> - sn->conn++;
> - (*stp)->src.tcp_est = 1;
>   pf_add_threshold(&sn->conn_rate);
>  
>   if ((*stp)->rule.ptr->max_src_conn &&
> - (*stp)->rule.ptr->max_src_conn < sn->conn) {
> + sn->conn >= (*stp)->rule.ptr->max_src_conn) {
>   pf_status.lcounters[LCNT_SRCCONN]++;
>   bad++;
>   }
> @@ -497,8 +495,11 @@ pf_src_connlimit(struct pf_state **stp)
>   bad++;
>   }
>  
> - if (!bad)
> + if (!bad) {
> + sn->conn++;
> + (*stp)->src.tcp_est = 1;
>   return (0);
> + }
>  
>   if ((*stp)->rule.ptr->overload_tbl) {
>   struct pfr_addr p;

it seems to me the change to pf_src_connlimit() does
not alter behavior. I think change to pf_src_connlimit()
can be dropped.

> @@ -4919,14 +4920,14 @@ pf_tcp_track_full(struct pf_pdesc *pd, struct 
> pf_state **stp, u_short *reason,
>   pf_set_protostate(*stp, psrc, TCPS_CLOSING);
>   if (th->th_flags & TH_ACK) {
>   if (dst->state == TCPS_SYN_SENT) {
> - pf_set_protostate(*stp, pdst,
> - TCPS_ESTABLISHED);
> - if (src->state == TCPS_ESTABLISHED &&
> + if (src->state == TCPS_SYN_SENT &&
>   !SLIST_EMPTY(&(*stp)->src_nodes) &&
>   pf_src_connlimit(stp)) {
>   REASON_SET(reason, PFRES_SRCLIMIT);
>   return (PF_DROP);
>   }
> + pf_set_protostate(*stp, pdst,
> + TCPS_ESTABLISHED);
>   } else if (dst->state == TCPS_CLOSING)
>   pf_set_protostate(*stp, pdst,
>   TCPS_FIN_WAIT_2);
> 

If I understand things right, then in current pf we check the conn limit
when we see the server's acknowledgment of the client's 3rd ACK. Not sure
if I sound clear here, so let me draw a little diagram:

SYN > sent by client moves state to

TCPS_SYN_SENT : TCPS_LISTEN/TCPS_CLOSED (1)

SYN | ACK < sent by server moves state to

TCPS_SYN_SENT : TCPS_SYN_SENT (2)

ACK > 3rd ack sent by client move state to:

TCPS_ESTABLISHED : TCPS_SYN_SENT (3)

ACK <  server acknowledges client's 3rd ack moves state to

TCPS_ESTABLISHED : TCPS_ESTABLISHED (4)

currently we do the conn limit check in step (4). Your change moves this
to the earlier step (3) (given I understand things right here).
It's awfully early here; I need to sleep on this.

can you give it a try with your slightly modified diff? just drop the
changes to pf_src_connlimit() and keep those in pf_tcp_track_full(), which
I believe is the only relevant part.

thanks and
regards
sashan



Re: pf.conf bug

2023-02-06 Thread Alexandr Nedvedicky
Hello,
[ cc'ing also tech@ ]

On Mon, Feb 06, 2023 at 06:44:38PM +0300, r...@bh0.amt.ru wrote:
> >Synopsis:pf.conf bug
> >Category:system
> >Environment:
>   System  : OpenBSD 7.2
>   Details : OpenBSD 7.2 (GENERIC.MP) #6: Sat Jan 21 01:03:04 MST 2023
>
> r...@syspatch-72-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   The pfctl utility incorrectly configures the PF device from
>   a pf.conf file. The output below shows that instead of loading
>   the rule
>   "block drop in quick inet6 proto ipv6-icmp all icmp6-type 255"
>   the pfctl loads the rule
>   "block drop in quick inet6 proto ipv6-icmp all".
> >How-To-Repeat:
>   bgps# cat ./pf.conf.test
>   block in quick inet6 proto icmp6 all icmp6-type {0, 127, 255}
>   pass quick inet6 proto icmp6 all no state
>   pass all
>   bgps# pfctl -f ./pf.conf.test
>   bgps# pfctl -s rules
>   block drop in quick inet6 proto ipv6-icmp all icmp6-type 0
>   block drop in quick inet6 proto ipv6-icmp all icmp6-type 127
>   block drop in quick inet6 proto ipv6-icmp all
>   pass quick inet6 proto ipv6-icmp all no state
>   pass all flags S/SA

the patch below fixes the issue. We need to change the type for
icmp*type and icmp*code from u_int8_t to u_int16_t to make the
code correct. The thing is that the legit values are <0, 256>
at the moment. The '0' has a special meaning, matching 'any'
icmp type/code. The snippet below comes from parse.y and shows
how the logic works:

3198 icmptype: STRING{
3199 const struct icmptypeent*p;
3200
3201 if ((p = geticmptypebyname($1, AF_INET)) == NULL) {
3202 yyerror("unknown icmp-type %s", $1);
3203 free($1);
3204 YYERROR;
3205 }
3206 $$ = p->type + 1;
3207 free($1);
3208 }
3209 | NUMBER {
3210 if ($1 < 0 || $1 > 255) {
3211 yyerror("illegal icmp-type %lld", $1);
3212 YYERROR;
3213 }
3214 $$ = $1 + 1;
3215 }
3216 ;
3217
3218 icmp6type   : STRING{

if we want to allow the firewall administrator to specify a match
on icmp-type 255, then extending the type from u_int8_t to u_int16_t
is the right change.

another option is to change the logic here to allow matching icmp-type
in the range <0, 254> only.

either way is fine for me, however my preference is leaning
towards the type change.

note the diff also drops the pad[2] member to compensate for the change
from u_int8_t to u_int16_t for the type and code members. I'm not sure
about this bit, hence I'd like to bring it up here.

thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sbin/pfctl/parse.y b/sbin/pfctl/parse.y
index 2c5a49772ff..898bded8c24 100644
--- a/sbin/pfctl/parse.y
+++ b/sbin/pfctl/parse.y
@@ -134,8 +134,8 @@ struct node_gid {
 };
 
 struct node_icmp {
-   u_int8_t code;
-   u_int8_t type;
+   u_int16_t code;  /* aux. value 256 is legit */
+   u_int16_t type;  /* aux. value 256 is legit */
u_int8_t proto;
struct node_icmp*next;
struct node_icmp*tail;
diff --git a/sys/net/pfvar.h b/sys/net/pfvar.h
index 1453bc35c04..1efb1b5221c 100644
--- a/sys/net/pfvar.h
+++ b/sys/net/pfvar.h
@@ -572,8 +572,8 @@ struct pf_rule {
u_int8_t keep_state;
sa_family_t  af;
u_int8_t proto;
-   u_int8_t type;
-   u_int8_t code;
+   u_int16_t type;  /* aux. value 256 is legit */
+   u_int16_t code;  /* aux. value 256 is legit */
u_int8_t flags;
u_int8_t flagset;
u_int8_t min_ttl;
@@ -592,7 +592,6 @@ struct pf_rule {
u_int8_t set_prio[2];
sa_family_t  naf;
u_int8_t rcvifnot;
-   u_int8_t pad[2];
 
struct {
struct pf_addr  addr;



patch for 7.2 to fix pfsync panics in pf_state_export()

2023-01-09 Thread Alexandr Nedvedicky
Hello,

if you don't use pfsync on 7.2 stop reading now.

There are two reports already [1] which have the same cause:
NULL pointer dereference in pf_state_export().

dlg@ has fixed this bug in CURRENT back in December [2]. Unfortunately
the patch for 7.2 needs a manual merge, because CURRENT slightly differs.

Below you can find a patch for 7.2. There is positive feedback from Tamas [3].
I'd like to ask more people who run pfsync on 7.2 to give it a try and
come back with a report. Please also keep an eye on memory leaks:

vmstat -m |egrep -e '^Name|^pfst'

the InUse counter should be oscillating around some mean value, which depends
on your network load.

thanks for your help
regards
sashan


[1] https://marc.info/?l=openbsd-bugs&m=167195892512328&w=2
    https://marc.info/?l=openbsd-bugs&m=167318980023225&w=2

[2] https://marc.info/?l=openbsd-cvs&m=167115592605571&w=2

[3] https://marc.info/?l=openbsd-bugs&m=167319072223544&w=2

8<8<---8<-8<

Index: sys/net/pf.c
===
RCS file: /cvs/src/sys/net/pf.c,v
retrieving revision 1.1140.2.1
diff -u -p -u -r1.1140.2.1 pf.c
--- sys/net/pf.c    24 Nov 2022 22:51:23 -0000      1.1140.2.1
+++ sys/net/pf.c    9 Jan 2023 21:55:58 -0000
@@ -185,6 +185,8 @@ int  pf_translate_icmp_af(struct pf_pd
 void    pf_send_icmp(struct mbuf *, u_int8_t, u_int8_t, int,
     sa_family_t, struct pf_rule *, u_int);
 void    pf_detach_state(struct pf_state *);
+struct pf_state_key *pf_state_key_attach(struct pf_state_key *,
+    struct pf_state *, int);
 void    pf_state_key_detach(struct pf_state *, int);
 u_int32_t   pf_tcp_iss(struct pf_pdesc *);
 void    pf_rule_to_actions(struct pf_rule *,
@@ -723,17 +725,26 @@ pf_state_compare_id(struct pf_state *a, 
return (0);
 }
 
-int
+/*
+ * on failure, pf_state_key_attach() releases the pf_state_key
+ * reference and returns NULL.
+ */
+struct pf_state_key *
 pf_state_key_attach(struct pf_state_key *sk, struct pf_state *s, int idx)
 {
struct pf_state_item*si;
struct pf_state_key *cur;
struct pf_state *olds = NULL;
 
+   PF_ASSERT_LOCKED();
+
KASSERT(s->key[idx] == NULL);
-   if ((cur = RB_INSERT(pf_state_tree, &pf_statetbl, sk)) != NULL) {
+   sk->removed = 0;
+   cur = RB_INSERT(pf_state_tree, &pf_statetbl, sk);
+   if (cur != NULL) {
+   sk->removed = 1;
/* key exists. check for same kif, if none, add to key */
-   TAILQ_FOREACH(si, &cur->states, entry)
+   TAILQ_FOREACH(si, >states, entry) {
if (si->s->kif == s->kif &&
((si->s->key[PF_SK_WIRE]->af == sk->af &&
 si->s->direction == s->direction) ||
@@ -769,44 +780,53 @@ pf_state_key_attach(struct pf_state_key 
/* remove late or sks can go away */
olds = si->s;
} else {
-   pool_put(&pf_state_key_pl, sk);
-   return (-1);/* collision! */
+   pf_state_key_unref(sk);
+   return (NULL);  /* collision! */
}
}
-   pool_put(&pf_state_key_pl, sk);
-   s->key[idx] = cur;
-   } else
-   s->key[idx] = sk;
+   }
+
+   /* reuse the existing state key */
+   pf_state_key_unref(sk);
+   sk = cur;
+   }
 
if ((si = pool_get(&pf_state_item_pl, PR_NOWAIT)) == NULL) {
-   pf_state_key_detach(s, idx);
-   return (-1);
+   if (TAILQ_EMPTY(&sk->states)) {
+   KASSERT(cur == NULL);
+   RB_REMOVE(pf_state_tree, &pf_statetbl, sk);
+   sk->removed = 1;
+   pf_state_key_unref(sk);
+   }
+
+   return (NULL);
}
-   si->s = s;
+
+   s->key[idx] = pf_state_key_ref(sk); /* give a ref to state */
+   si->s = pf_state_ref(s);
 
/* list is sorted, if-bound states before floating */
if (s->kif == pfi_all)
-   TAILQ_INSERT_TAIL(&s->key[idx]->states, si, entry);
+   TAILQ_INSERT_TAIL(&sk->states, si, entry);
else
-   TAILQ_INSERT_HEAD(&s->key[idx]->states, si, entry);
+   TAILQ_INSERT_HEAD(&sk->states, si, entry);
 
if (olds)
pf_remove_state(olds);
 
-   return (0);
+   /* caller owns the pf_state ref, which owns a pf_state_key ref now */
+   return (sk);
 }
 
 void
 pf_detach_state(struct pf_state *s)
 {
-   if (s->key[PF_SK_WIRE] == 

Re: more consistently use "st" as the name for struct pf_state variables

2023-01-05 Thread Alexandr Nedvedicky
On Thu, Jan 05, 2023 at 08:32:44PM +1000, David Gwynne wrote:
> 
> 
> > On 5 Jan 2023, at 18:56, Alexandr Nedvedicky  wrote:
> > 
> > Hello,
> > 
> > On Wed, Jan 04, 2023 at 09:36:38PM +1000, David Gwynne wrote:
> >> and "stp" for pf_state ** variables.
> >> 
> >I agree with established naming conventions.
> > 
> >I'm also fine with keeping some exceptions such as `a` and `b`
> >in pf_state_compare_id(), local variables `tail`, `head`
> >in pf_states_{clr, get}() and pf_purge_expired_states().
> >I'm also fine with leaving static variable `cur` unchanged.
> > 
> > is there any reason we still keep `pf_state **sm` argument
> > in pf_test_rule()? the same in pf_create_state(). Is it intended?
> 
> there were a bunch of other arguments that ended with m. happy to change it
> to **stp though. we can always do another sweep for other types.
> 

I would change those occurrences to stp. My point is that we don't
expect pf_test_rule() to return a matching state.

With those tweaks the diff is OK.

thanks and
regards
sashan



Re: more consistently use "st" as the name for struct pf_state variables

2023-01-05 Thread Alexandr Nedvedicky
Hello,

On Wed, Jan 04, 2023 at 09:36:38PM +1000, David Gwynne wrote:
> and "stp" for pf_state ** variables.
> 
I agree with established naming conventions.

I'm also fine with keeping some exceptions such as `a` and `b`
in pf_state_compare_id(), local variables `tail`, `head`
in pf_states_{clr, get}() and pf_purge_expired_states().
I'm also fine with leaving static variable `cur` unchanged.

is there any reason we still keep `pf_state **sm` argument
in pf_test_rule()? the same in pf_create_state(). Is it intended?

otherwise diff reads OK to me.

thanks and
regards
sashan



Re: move the pf_state_tree_id type around

2023-01-04 Thread Alexandr Nedvedicky
Hello,

I agree with the change as-is, though I have some suggestions
for a few finishing touches. See comments inline.

On Wed, Jan 04, 2023 at 01:15:54PM +1000, David Gwynne wrote:
> move the pf_state_tree_id type from pfvar.h to pfvar_priv.h.
> 
> this is like the pf_state_tree change from yesterday. the
> pf_state_tree_id type is private to the kernel, just like the
> pf_state_tree type.
> 
> moving it from RB to RBT saves another 12k in pf.o on amd64.
> 
> i also changed some hand rolled for loops over to RBT_FOREACH_SAFE.
> they look right?

All those look good to me. The points below are just
suggestions which I don't insist on.

> 
> ok?
> 
> Index: pf.c
> ===
> RCS file: /cvs/src/sys/net/pf.c,v
> retrieving revision 1.1166
> diff -u -p -r1.1166 pf.c
> --- pf.c  4 Jan 2023 02:00:49 -   1.1166
> +++ pf.c  4 Jan 2023 03:08:04 -

> @@ -1054,7 +1053,7 @@ pf_state_insert(struct pfi_kif *kif, str
>   s->id = htobe64(pf_status.stateid++);
>   s->creatorid = pf_status.hostid;
>   }
> - if (RB_INSERT(pf_state_tree_id, &tree_id, s) != NULL) {
> + if (RBT_INSERT(pf_state_tree_id, &tree_id, s) != NULL) {
>   if (pf_status.debug >= LOG_NOTICE) {
>   log(LOG_NOTICE, "pf: state insert failed: "
>   "id: %016llx creatorid: %08x",
> @@ -1085,7 +1084,7 @@ pf_find_state_byid(struct pf_state_cmp *
>  {
>   pf_status.fcounters[FCNT_STATE_SEARCH]++;
>  
> - return (RB_FIND(pf_state_tree_id, &tree_id, (struct pf_state *)key));
> + return (RBT_FIND(pf_state_tree_id, &tree_id, (struct pf_state *)key));
>  }

would it make sense to rename the argument `s` to `st`? Just to lean
towards consistency in pf(4); `st` usually denotes a state.


> Index: pf_ioctl.c
> ===
> RCS file: /cvs/src/sys/net/pf_ioctl.c,v
> retrieving revision 1.394
> diff -u -p -r1.394 pf_ioctl.c
> --- pf_ioctl.c    4 Jan 2023 02:00:49 -0000   1.394
> +++ pf_ioctl.c    4 Jan 2023 03:08:05 -0000
> @@ -1796,10 +1796,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t a
>   NET_LOCK();
>   PF_LOCK();
>   PF_STATE_ENTER_WRITE();
> - for (s = RB_MIN(pf_state_tree_id, &tree_id); s;
> - s = nexts) {
> - nexts = RB_NEXT(pf_state_tree_id, &tree_id, s);
> -
> + RBT_FOREACH_SAFE(s, pf_state_tree_id, &tree_id, nexts) {
>   if (s->direction == PF_OUT) {
>   sk = s->key[PF_SK_STACK];
>   srcaddr = >addr[1];

again: same change as earlier, just rename `s` to `st` while
already there.

> @@ -2828,7 +2825,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t a
>   NET_LOCK();
>   PF_LOCK();
>   PF_STATE_ENTER_WRITE();
> - RB_FOREACH(state, pf_state_tree_id, &tree_id)
> + RBT_FOREACH(state, pf_state_tree_id, &tree_id)
>   pf_src_tree_remove_state(state);
>   PF_STATE_EXIT_WRITE();
>   RB_FOREACH(n, pf_src_tree, &tree_src_tracking)
> @@ -2861,7 +2858,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t a
>   if (sn->states != 0) {
>   PF_ASSERT_LOCKED();
>   PF_STATE_ENTER_WRITE();
> - RB_FOREACH(s, pf_state_tree_id,
> + RBT_FOREACH(s, pf_state_tree_id,
>  &tree_id)
>   pf_state_rm_src_node(s, sn);
>   PF_STATE_EXIT_WRITE();

in this branch too, rename `state` to `st`.

> Index: pf_lb.c
> ===
> RCS file: /cvs/src/sys/net/pf_lb.c,v
> retrieving revision 1.72
> diff -u -p -r1.72 pf_lb.c
> --- pf_lb.c   31 Aug 2022 11:29:12 -  1.72
> +++ pf_lb.c   4 Jan 2023 03:08:05 -
> @@ -311,10 +311,8 @@ pf_map_addr_sticky(sa_family_t af, struc
>   }
>   if (sns[type]->states != 0) {
>   /* XXX expensive */
> - RB_FOREACH(s, pf_state_tree_id,
> -    &tree_id)
> - pf_state_rm_src_node(s,
> - sns[type]);
> + RBT_FOREACH(s, pf_state_tree_id, &tree_id)
> + pf_state_rm_src_node(s, sns[type]);
>   }

rename `s` to `st`


>   sns[type]->expire = 1;
>   pf_remove_src_node(sns[type]);
> Index: pfvar.h
> ===
> RCS file: /cvs/src/sys/net/pfvar.h,v
> retrieving revision 1.526
> diff -u -p -r1.526 pfvar.h
> --- pfvar.h   4 Jan 2023 02:00:49 -   1.526
> +++ pfvar.h   4 Jan 2023 03:08:05 

Re: move the pf_state_tree type around

2023-01-02 Thread Alexandr Nedvedicky
Hello,

On Tue, Jan 03, 2023 at 03:31:42PM +1000, David Gwynne wrote:
> the pf_state_tree type represents the rb tree that pf_state_key structs
> go into. currently pf_state_key is declared in pfvar_priv.h (because
> it's a kernel private data structure) but pf_state_tree was left in
> pfvar.h. this moves it to pfvar_priv.h, because it is also kernel
> private.
> 
> while here, this moves it from the RB tree macros to RBT which shrinks
> pf.o by about 13k on amd64.

The diff reads OK to me. Do I assume right that the 'tree_id' tree will
also be moved from RB_*() to RBT_*() in another commit?

thanks and OK sashan



Re: rename pf_state_key and pf_state_item members

2022-12-19 Thread Alexandr Nedvedicky
Hello,

On Tue, Dec 20, 2022 at 08:10:30AM +1000, David Gwynne wrote:

> > > -  si->s->direction != s->direction))) {
> > > + TAILQ_FOREACH(si, >sk_states, si_entry) {
> > > + struct pf_state *tst = si->si_st;
> > 
> > appreciate the consistency in your diff: it uses 'tst = si->si_st;'
> > however going for 'sis' instead of 'tst' would remind us the data here
> > comes from a state item. This is just a nit; I don't feel strongly
> > about it.
> 
> that makes a lot of sense to me. how about 'sist' instead? 's' for state
> feels weird to me after years of 's = splfol()', 'st' seems more
> explicit.

'sist' is perfect.

> 
> > diff reads OK to me.
> 
> how about this one?

Still reads OK and I like it.

OK sashan



Re: rename pf_state_key and pf_state_item members

2022-12-19 Thread Alexandr Nedvedicky
Hello,

I don't object; the diff improves things. Just a few bikeshedding nits
further below.
On Mon, Dec 19, 2022 at 04:48:57PM +1000, David Gwynne wrote:

> 
> i hope i didn't mess up the (sk_)states list traversals.

could not spot anything wrong there.


> 
> ok?
> 
> Index: pf.c
> ===
> RCS file: /cvs/src/sys/net/pf.c,v
> retrieving revision 1.1157
> diff -u -p -r1.1157 pf.c
> --- pf.c  16 Dec 2022 02:05:44 -  1.1157
> +++ pf.c  19 Dec 2022 06:39:45 -
> @@ -315,7 +315,7 @@ struct pf_state_tree_id tree_id;
>  struct pf_state_list pf_state_list = 
> PF_STATE_LIST_INITIALIZER(pf_state_list);
>  
>  RB_GENERATE(pf_src_tree, pf_src_node, entry, pf_src_compare);
> -RB_GENERATE(pf_state_tree, pf_state_key, entry, pf_state_compare_key);
> +RB_GENERATE(pf_state_tree, pf_state_key, sk_entry, pf_state_compare_key);
>  RB_GENERATE(pf_state_tree_id, pf_state,
>  entry_id, pf_state_compare_id);

I think the pf_state_tree prototype should be moved to net/pfvar_priv.h;
perhaps the others should be moved as well, because those trees are private
to pf(4). This can be done in a follow-up commit.

>  
> @@ -736,24 +736,25 @@ pf_state_key_attach(struct pf_state_key 
>   PF_ASSERT_LOCKED();
>  
>   KASSERT(s->key[idx] == NULL);
> - sk->removed = 0;
> + sk->sk_removed = 0;
> + cur = RB_INSERT(pf_state_tree, &pf_statetbl, sk);
>   if (cur != NULL) {
> - sk->removed = 1;
> + sk->sk_removed = 1;
>   /* key exists. check for same kif, if none, add to key */
> - TAILQ_FOREACH(si, &cur->states, entry) {
> - if (si->s->kif == s->kif &&
> - ((si->s->key[PF_SK_WIRE]->af == sk->af &&
> -  si->s->direction == s->direction) ||
> - (si->s->key[PF_SK_WIRE]->af !=
> -  si->s->key[PF_SK_STACK]->af &&
> -  sk->af == si->s->key[PF_SK_STACK]->af &&
> -  si->s->direction != s->direction))) {
> + TAILQ_FOREACH(si, &cur->sk_states, si_entry) {
> + struct pf_state *tst = si->si_st;

appreciate the consistency in your diff: it uses 'tst = si->si_st;'
however going for 'sis' instead of 'tst' would remind us the data here
comes from a state item. This is just a nit; I don't feel strongly
about it.

diff reads OK to me.

thanks and
regards
sashan



Re: move pf_state_key and pf_state_item structs from pfvar.h to pfvar_priv.h

2022-12-15 Thread Alexandr Nedvedicky
Hello,

On Fri, Dec 16, 2022 at 01:53:42PM +1000, David Gwynne wrote:
> both struct pf_state_key and pf_state_item are kernel private data
> structures, and do not need to be visible to userland.
> 
> i would like to move them to pfvar_priv.h to make it more obvious that
> they are and should remain kernel private data structures, which in turn
> will make it less scary to tweak them in the future.
> 
> this has survivied a make build.
> 
> ok?

no objection, diff reads OK to me.

thanks and
regards
sashan



Re: help pfsync by extending pf_state_key lifetimes on pf_states

2022-12-05 Thread Alexandr Nedvedicky
Hello,

> 
> pf_test_rule uses the skw and sks pointers after pf_state_insert sets
> them via pf_create_state. i would happily change pf_test rule so it
> reads the pf_state_key pointers out of pf_state rather than carry them
> around on the stack like that, but i figured the diff was big enough as
> it was.

I'd forgotten about pf_test_rule(). Perhaps just setting skw/sks
to NULL on failure (with the reference gone) should be sufficient
at this point.
> 
> > the recent diff is OK by me.
> 
> thanks. i'll wait until i can be around to look after it before i commit
> it.

thanks and
regards
sashan



Re: help pfsync by extending pf_state_key lifetimes on pf_states

2022-12-05 Thread Alexandr Nedvedicky
Hello,
On Mon, Dec 05, 2022 at 11:46:07AM +1000, David Gwynne wrote:
> 
> yes, you're right. the diff below includes the simple fix to that.
> 
> this demonstrates how subtle the reference counting around the trees
> is though. maybe it would be more obvious to take another refcnt
> before giving the pf_state_key to the tree(s), and then give the
> pf_state_key_attach callers reference to the pf_state struct. at
> the moment we give the callers reference to the tree while holding
> the PF_LOCK, and borrow it to give another one to the state while
> the lock is still held.

Either way is fine in my opinion. 


> @@ -766,44 +777,53 @@ pf_state_key_attach(struct pf_state_key 
>   /* remove late or sks can go away */
>   olds = si->s;
>   } else {
> - pool_put(&pf_state_key_pl, sk);
> - return (-1);/* collision! */
> + pf_state_key_unref(sk);
> + return (NULL);  /* collision! */
>   }
>   }
> - pool_put(&pf_state_key_pl, sk);
> - s->key[idx] = cur;
> - } else
> - s->key[idx] = sk;
> + }
> +
> + /* reuse the existing state key */
> + pf_state_key_unref(sk);
> + sk = cur;
> + }
>  
>   if ((si = pool_get(&pf_state_item_pl, PR_NOWAIT)) == NULL) {
> - pf_state_key_detach(s, idx);
> - return (-1);
> + if (TAILQ_EMPTY(&sk->states)) {
> + KASSERT(cur == NULL);
> + RB_REMOVE(pf_state_tree, &pf_statetbl, sk);
> + sk->removed = 1;
> + pf_state_key_unref(sk);
> + }
> +
> + return (NULL);
>   }

I like your fix above.

I also wonder if pf_state_insert() should always set *skwp and *sksp to
NULL, just to indicate to the caller that reference ownership got
transferred and there is nothing left to worry about. This can be done
right at the beginning of pf_state_insert(), after we assign to the local
variables skw/sks. Just a thought/nit.

the recent diff is OK by me.

sashan



Re: help pfsync by extending pf_state_key lifetimes on pf_states

2022-12-04 Thread Alexandr Nedvedicky
Hello,


On Sat, Dec 03, 2022 at 09:53:45AM +1000, David Gwynne wrote:
> we (mostly sashan@ and me) have a problem where pfsync can be holding a
> reference to a pf_state that pf has decided to purge, and then pfsync
> crashes because it tries to read the pf_state_key parts of the state,
> but they been removed by the purge process.
> 
> the background to this is that pf_state structures don't contain ip
> addresses, protocols, ports, etc. that information is stored in a
> pf_state_key struct, which is used to wire a state into the state table.
> when pfsync wants to export information about a state, particularly the
> addresses on it, it needs the pf_state_key struct to read from.
> 
> the pf purge code current assumes that when a state is removed from the
> state table, the pf_state_key structs that are used to wire it in can
> also be removed and freed. this has obvious bad consequences for pfsync,
> which could still be holding a reference to the pf_state struct with the
> intention of reading from it.
> 
> this diff tweaks the lifetime of pf_state_structs so they at least match
> the lifetime of pf_states.
> 
> just so i'm clear, the main idea is that once pf_state_insert succeeds,
> a pf_state struct will always have pf_state_key structs hanging off it.
> it's only after the pf_state struct itself is being torn down that its
> references to pf_state_keys are released. if you're holding a
> pf_state reference, you can use that to access the pf_state_key
> structs hanging off it.
> 
> this is largely accomplished by taking more advantage of the pf_state
> and pf_state_key refcnts. pf_states get properly accounted references to
> the pf_state_keys, and the state table gets its own reference counts to
> the pf_state_keys too. when a state is removed from the state table, the
> state table reference counts drop. however, it's not until all pf_state
> references drop that pf_state will give up its references to the
> pf_states.
> 
> this is a very new diff, but i've been kicking it around with pfsync a
> bit and the pf_state_export crashes i used to get are gone now.
> 
> if you're going to test the things to look for are if NAT still works,
> and if the pfstate and pfstkey pool numbers still look right.

All of the above makes sense to me. I'm fine with the approach.
I think I could spot just one single glitch here in pf_state_key_attach():

> 
> Index: pf.c
> ===
> RCS file: /cvs/src/sys/net/pf.c,v
> retrieving revision 1.1156
> diff -u -p -r1.1156 pf.c
> --- pf.c  25 Nov 2022 20:27:53 -  1.1156
> +++ pf.c  2 Dec 2022 23:20:36 -

> @@ -766,44 +777,52 @@ pf_state_key_attach(struct pf_state_key 
>   /* remove late or sks can go away */
>   olds = si->s;
>   } else {
> - pool_put(_state_key_pl, sk);
> - return (-1);/* collision! */
> + pf_state_key_unref(sk);
> + return (NULL);  /* collision! */
>   }
>   }
> - pool_put(_state_key_pl, sk);
> - s->key[idx] = cur;
> - } else
> - s->key[idx] = sk;
> + }
> +
> + /* reuse the existing state key */
> + pf_state_key_unref(sk);
> + sk = cur;
> + }
>  
>   if ((si = pool_get(_state_item_pl, PR_NOWAIT)) == NULL) {
> - pf_state_key_detach(s, idx);
> - return (-1);
> + if (TAILQ_EMPTY(>states)) {
> + RB_REMOVE(pf_state_tree, _statetbl, sk);
> + sk->removed = 1;
> + }
> +
> + pf_state_key_unref(sk);

I think pf_state_key_unref() above needs to happen if and only if
sk != cur.
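
To make the reviewed invariant concrete, here is a minimal userland sketch
(hypothetical names, a plain integer standing in for the kernel refcount) of
an error path that only drops the caller's reference when the key was not
reused:

```c
#include <stddef.h>

/* Minimal stand-in for pf_state_key's reference count. */
struct key {
	int refcnt;
};

static void
key_unref(struct key *k)
{
	k->refcnt--;
}

/*
 * Sketch of the attach error path under review: the caller-supplied key
 * `sk` may have been replaced by an existing key `cur` ("reuse the
 * existing state key" in the diff).  In that case sk == cur and the
 * caller's spare reference was already dropped earlier, so dropping it
 * again here would underflow the refcount.
 */
static void
attach_error_path(struct key *sk, struct key *cur)
{
	if (sk != cur)
		key_unref(sk);
}
```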

otherwise the diff looks good to me.

thanks and
regards
sashan



Re: pfsync panic after pf_purge backout

2022-11-28 Thread Alexandr Nedvedicky
Hello,


> 
> 
> Hi,
> 
> here's panic with WITNESS and this diff on tech@
> https://www.mail-archive.com/tech@openbsd.org/msg72582.html
> 
> I will stop now because I'm not sure what I'm doing and which diffs I'm
> testing...
> 
> 
> r620-1# uvm_fault(0x8248ea28, 0x17, 0, 2) -> e
> kernel: page fault trap, code=0
> Stopped at  pfsync_q_del+0x96:  movq%rdx,0x8(%rax)
> TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> *300703  35643  0 0x14000  0x2001K systq
>  237790  10061  0 0x14000 0x42000  softclock
> pfsync_q_del(fd8323dc3900) at pfsync_q_del+0x96
> pfsync_delete_state(fd8323dc3900) at pfsync_delete_state+0x118
> pf_remove_state(fd8323dc3900) at pf_remove_state+0x14e
> pf_purge_expired_states(c3501) at pf_purge_expired_states+0x1b3
> pf_purge(823ae080) at pf_purge+0x28
> taskq_thread(822cbe30) at taskq_thread+0x11a
> end trace frame: 0x0, count: 9
> https://www.openbsd.org/ddb.html describes the minimum info required in
> bug reports.  Insufficient info makes it difficult to find and fix bugs.
> ddb{1}>
> 
> 

The diff below should avoid the panic above (and similar panics in
pfsync_q_del()). It also prints some 'error' details to the system message
buffer (a.k.a. dmesg).

We panic because we attempt to remove a state from a pfsync queue which is
already empty. pfsync_q_del() must be acting on some stale information
kept in the `st` argument (the state).

The information we print into the dmesg buffer should help us understand
what's going on. At the moment I cannot explain how it comes that there
is a state which claims its presence on a state queue while the queue in
question is empty.

I'd like to ask you to give the diff below a try and repeat your test.
Let it run for some time and collect 'dmesg' output for me after the usual
time-to-panic elapses during a test run.

thanks a lot
regards
sashan

8<---8<---8<--8<
diff --git a/sys/net/if_pfsync.c b/sys/net/if_pfsync.c
index f69790ee98d..d4be84a1f57 100644
--- a/sys/net/if_pfsync.c
+++ b/sys/net/if_pfsync.c
@@ -2254,6 +2254,74 @@ pfsync_q_ins(struct pf_state *st, int q)
} while (0);
 }
 
+const char *
+pfsync_qname(int q)
+{
+   switch (q) {
+   case PFSYNC_S_IACK:
+   return ("PFSYNC_S_IACK");
+   
+   case PFSYNC_S_UPD_C:
+   return ("PFSYNC_S_UPD_C");
+
+   case PFSYNC_S_DEL:
+   return ("PFSYNC_S_DEL");
+
+   case PFSYNC_S_INS:
+   return ("PFSYNC_S_INS");
+
+   case PFSYNC_S_UPD:
+   return ("PFSYNC_S_UPD");
+
+   case PFSYNC_S_COUNT:
+   return ("PFSYNC_S_COUNT");
+
+   case PFSYNC_S_DEFER:
+   return ("PFSYNC_S_DEFER");
+
+   case PFSYNC_S_NONE:
+   return ("PFSYNC_S_NONE");
+
+   default:
+   return ("???");
+   }
+}
+
+const char *
+pfsync_timeout_name(unsigned int timeout)
+{
+   const char *timeout_name[] = {
+   "PFTM_TCP_FIRST_PACKET",
+   "PFTM_TCP_OPENING",
+   "PFTM_TCP_ESTABLISHED",
+   "PFTM_TCP_CLOSING",
+   "PFTM_TCP_FIN_WAIT",
+   "PFTM_TCP_CLOSED",
+   "PFTM_UDP_FIRST_PACKET",
+   "PFTM_UDP_SINGLE",
+   "PFTM_UDP_MULTIPLE",
+   "PFTM_ICMP_FIRST_PACKET",
+   "PFTM_ICMP_ERROR_REPLY",
+   "PFTM_OTHER_FIRST_PACKET",
+   "PFTM_OTHER_SINGLE",
+   "PFTM_OTHER_MULTIPLE",
+   "PFTM_FRAG",
+   "PFTM_INTERVAL",
+   "PFTM_ADAPTIVE_START",
+   "PFTM_ADAPTIVE_END",
+   "PFTM_SRC_NODE",
+   "PFTM_TS_DIFF",
+   "PFTM_MAX",
+   "PFTM_PURGE",
+   "PFTM_UNLINKED"
+   };
+
+   if (timeout > PFTM_UNLINKED)
+   return ("???");
+   else
+   return (timeout_name[timeout]);
+}
+
 void
 pfsync_q_del(struct pf_state *st)
 {
@@ -2273,6 +2341,19 @@ pfsync_q_del(struct pf_state *st)
mtx_leave(>sc_st_mtx);
return;
}
+
+   if (TAILQ_EMPTY(>sc_qs[q])) {
+   mtx_leave(>sc_st_mtx);
+   DPFPRINTF(LOG_ERR,
+   "%s: stale queue (%s) in state (id %llu)"
+   "nosync: %s unlinked: %s timeout: %s", __func__,
+   pfsync_qname(q), st->id,
+   (st->state_flags & PFSTATE_NOSYNC) ? "yes" : "no",
+   (st->timeout == PFTM_UNLINKED) ? "yes" : "no",
+   pfsync_timeout_name(st->timeout));
+   return;
+   }
+
atomic_sub_long(>sc_len, pfsync_qs[q].len);
TAILQ_REMOVE(>sc_qs[q], st, sync_list);
if (TAILQ_EMPTY(>sc_qs[q]))



Re: pfsync panic after pf_purge backout

2022-11-26 Thread Alexandr Nedvedicky
Hello,

On Sat, Nov 26, 2022 at 08:33:28PM +0100, Hrvoje Popovski wrote:

> I just need to say that with all pf, pfsync and with pf_purge diffs
> after hackaton + this diff on tech@
> https://www.mail-archive.com/tech@openbsd.org/msg72582.html
> my production firewall seems stable and it wasn't without that diff

this diff still waits for an OK. It makes pfsync use the
state mutex to safely dereference keys.

> 
> I'm not sure if we have same diffs but even Josmar Pierri on bugs@
> https://www.mail-archive.com/bugs@openbsd.org/msg18994.html
> who had panics quite regularly with that diff on tech@ seems to have
> stable firewall now.
> 
> 
> 
> r620-1# uvm_fault(0x82374288, 0x17, 0, 2) -> e
> kernel: page fault trap, code=0
> Stopped at  pfsync_q_del+0x96:  movq%rdx,0x8(%rax)
> TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> *192892  19920  0 0x14000  0x2005K softnet
> pfsync_q_del(fd82e8a4ce20) at pfsync_q_del+0x96
> pf_remove_state(fd82e8a4ce20) at pf_remove_state+0x14b
> pfsync_in_del_c(fd8006d843b8,c,79,0) at pfsync_in_del_c+0x6f
> pfsync_input(800022d60ad8,800022d60ae4,f0,2) at pfsync_input+0x33c
> ip_deliver(800022d60ad8,800022d60ae4,f0,2) at ip_deliver+0x113
> ipintr() at ipintr+0x69
> if_netisr(0) at if_netisr+0xea
> taskq_thread(8003) at taskq_thread+0x100
> end trace frame: 0x0, count: 7
> https://www.openbsd.org/ddb.html describes the minimum info required in
> bug reports.  Insufficient info makes it difficult to find and fix bugs.
> ddb{5}>
> 

Those panics are causing me headaches. This was most likely uncovered
by the diff which adds a mutex. The mutex makes pfsync stable enough
that you can trigger previously unknown bugs.

thanks and
regards
sashan



Re: pfsync slpassert on boot and panic

2022-11-25 Thread Alexandr Nedvedicky
Hello Hrvoje,


On Fri, Nov 25, 2022 at 09:57:15AM +0100, Hrvoje Popovski wrote:
> Hi,
> 
> I think that this is similar problem as what David Hill send on tech@
> with subject "splassert on boot"
> 
> I've checkout tree few minutes ago and in there should be
> mvs@ "Remove netlock assertion within PF_LOCK()" and
> dlg@ "get rid of NET_LOCK in the pf purge work" diffs.
> 
> on boot I'm getting this splassert
> 
> splassert: pfsync_delete_state: want 2 have 256
> Starting stack trace...
> pfsync_delete_state(fd83a66644d8) at pfsync_delete_state+0x58
> pf_remove_state(fd83a66644d8) at pf_remove_state+0x14b
> pf_purge_expired_states(1fdb,40) at pf_purge_expired_states+0x202
> pf_purge_states(0) at pf_purge_states+0x1c
> taskq_thread(822f69c8) at taskq_thread+0x11a
> end trace frame: 0x0, count: 252
> End of stack trace.

I've sent a diff yesterday to David Hill [1]. It looks like
I forgot to add a cc to tech@.


> splassert: pfsync_delete_state: want 2 have 0
> Starting stack trace...
> pfsync_delete_state(fd83a6676628) at pfsync_delete_state+0x58
> pf_remove_state(fd83a6676628) at pf_remove_state+0x14b
> pf_purge_expired_states(1f9c,40) at pf_purge_expired_states+0x202
> pf_purge_states(0) at pf_purge_states+0x1c
> taskq_thread(822f69c8) at taskq_thread+0x11a
> end trace frame: 0x0, count: 252
> End of stack trace.
> 
> 
> and if i destroy pfsync interface and then sh /etc/netstart box panic
> 
> uvm_fault(0x823d3250, 0x810, 0, 1) -> e
> kernel: page fault trap, code=0
> Stopped at  pfsync_q_ins+0x1a:  movq0x810(%r13),%rsi
> TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> * 68977  95532  0 0x14000  0x2003K systqmp
> pfsync_q_ins(fd83a6676628,2) at pfsync_q_ins+0x1a
> pf_remove_state(fd83a6676628) at pf_remove_state+0x14b
> pf_purge_expired_states(1f9c,40) at pf_purge_expired_states+0x202
> pf_purge_states(0) at pf_purge_states+0x1c
> taskq_thread(822f69c8) at taskq_thread+0x11a
> end trace frame: 0x0, count: 10
> https://www.openbsd.org/ddb.html describes the minimum info required in
> bug reports.  Insufficient info makes it difficult to find and fix bugs.
> ddb{3}>
> 

Looks like we need to synchronize pfsync destroy with the timer thread.

thanks for great testing.

regards
sashan

8<---8<---8<--8<
diff --git a/sys/net/if_pfsync.c b/sys/net/if_pfsync.c
index f69790ee98d..24963a546de 100644
--- a/sys/net/if_pfsync.c
+++ b/sys/net/if_pfsync.c
@@ -1865,8 +1865,6 @@ pfsync_undefer(struct pfsync_deferral *pd, int drop)
 {
struct pfsync_softc *sc = pfsyncif;
 
-   NET_ASSERT_LOCKED();
-
if (sc == NULL)
return;
 
@@ -2128,8 +2126,6 @@ pfsync_delete_state(struct pf_state *st)
 {
struct pfsync_softc *sc = pfsyncif;
 
-   NET_ASSERT_LOCKED();
-
if (sc == NULL || !ISSET(sc->sc_if.if_flags, IFF_RUNNING))
return;
 
@@ -2188,8 +2184,6 @@ pfsync_clear_states(u_int32_t creatorid, const char 
*ifname)
struct pfsync_clr clr;
} __packed r;
 
-   NET_ASSERT_LOCKED();
-
if (sc == NULL || !ISSET(sc->sc_if.if_flags, IFF_RUNNING))
return;
 



Re: splassert on boot

2022-11-23 Thread Alexandr Nedvedicky
Hello,

On Thu, Nov 24, 2022 at 08:51:51AM +1000, David Gwynne wrote:
> im ok with this, but you need sashan@ to ok it too. he's been working up to 
> this anyway.
> 
> dlg
> 
> > On 24 Nov 2022, at 06:18, Vitaliy Makkoveev  wrote:
> > 
> > On Wed, Nov 23, 2022 at 02:59:05PM -0500, David Hill wrote:
> >> Hello -
> >> 
> >> I am seeing splasserts on boot (before kern.splassert=2 can be set) with
> >> -current.
> >> 
> >> 
> >> 
> >> spdmem0 at iic0 addr 0x50: 8GB DDR3 SDRAM PC3-12800 SO-DIMM
> >> isa0 at pcib0
> >> isadma0 at isa0
> >> vga0 at isa0 port 0x3b0/48 iomem 0xa/131072
> >> wsdisplay at vga0 not configured
> >> pcppi0 at isa0 port 0x61
> >> spkr0 at pcppi0
> >> vmm0 at mainbus0: VMX/EPT (using slow L1TF mitigation)
> >> splassert: pfi_attach_ifgroup: want 2 have 0
> >> splassert: pfi_group_addmember: want 2 have 0
> >> splassert: pfi_attach_ifgroup: want 2 have 0
> >> splassert: pfi_attach_ifgroup: want 2 have 0
> >> splassert: pfi_group_addmember: want 2 have 0
> >> 
> >> 
> >> - David
> >> 
> > 
> > The netlock assertion within PF_LOCK() looks wrong. The netlock should
> > be taken first, but only if both locks taken.
> > 
> > Index: sys/net/pfvar_priv.h
> > ===
> > RCS file: /cvs/src/sys/net/pfvar_priv.h,v
> > retrieving revision 1.21
> > diff -u -p -r1.21 pfvar_priv.h
> > --- sys/net/pfvar_priv.h11 Nov 2022 17:12:30 -  1.21
> > +++ sys/net/pfvar_priv.h23 Nov 2022 20:14:13 -
> > @@ -278,7 +278,6 @@ extern struct rwlockpf_lock;
> > extern struct rwlockpf_state_lock;
> > 
> > #define PF_LOCK()   do {\
> > -   NET_ASSERT_LOCKED();\
> > rw_enter_write(_lock);   \
> > } while (0)
> > 
> 

This diff is OK, please go ahead and commit it.

Sorry, I have not hit this assert on my boxes when testing this diff.

OK sashan




Re: interface hooks to pf(4) should be using PF_LOCK()/PF_UNLOCK()

2022-11-22 Thread Alexandr Nedvedicky
Hello,

This change is required to unhook pf(4) from NET_LOCK(),
therefore I'd like to get it in.


On Mon, Nov 07, 2022 at 04:51:59AM +1000, David Gwynne wrote:
> 
> 
> > On 7 Nov 2022, at 4:12 am, Alexandr Nedvedicky  wrote:
> > 
> > Hello,
> > 
> > diff below is the first step to make pfioctl() _without_ NET_LOCK().
> > Currently pf_if.c seems to be the only blocker which prevents us
> > from removing all NET_LOCK()/NET_UNLOCK() calls we have in pf(4).
> > 
> > diff below passed very basic smoke test. OK to commit?
> > 
> > 
> > thanks and
> > regards
> > sashan
> > 
> > 8<---8<---8<--8<
> > diff --git a/sys/net/pf_if.c b/sys/net/pf_if.c
> > index e23c14e6769..e86d85aa2c4 100644
> > --- a/sys/net/pf_if.c
> > +++ b/sys/net/pf_if.c
> > @@ -53,10 +53,17 @@
> > 
> > #include 
> > 
> > +#include 
> > +#include 
> > +#include 
> 
> do we need this chunk?
> 

The chunk above is required. It got pulled in as a dependency of
pfvar_priv.h. We need pfvar_priv.h because it provides the definition
of PF_LOCK(). With PF_LOCK() we also get the definition of pf_pdesc,
which requires the header files above.

thanks and
regards
sashan



make state key dereference safe for pfsync(4)

2022-11-16 Thread Alexandr Nedvedicky
Hello,

With the state key mutex in the tree [1], I'd like to add yet another diff.
During h2k22 David and I split the original change [2] into two chunks.

OK to commit diff below?

thanks and
regards
sashan

[1] https://marc.info/?l=openbsd-cvs=166817856414079=2

[2] https://marc.info/?l=openbsd-bugs=166006758231954=2
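
A hedged sketch of the control-flow change below: the write() callbacks gain
an int return value, and pfsync_sendout() only advances the buffer offset
when a writer reports success, so a writer that finds the state's keys
already detached can decline to emit anything. All names here are
hypothetical stand-ins, not the actual pfsync code:

```c
#include <stddef.h>
#include <string.h>

/* Stand-in state: `key` is NULL once the state keys are detached. */
struct state {
	const char *key;
};

/* Writer returns 0 on success, -1 when the state cannot be exported. */
static int
write_state(struct state *st, char *buf, size_t len)
{
	if (st->key == NULL)
		return (-1);	/* keys detached: emit nothing */
	strncpy(buf, st->key, len);
	return (0);
}

/* Mirrors the pfsync_sendout() change: advance the offset only on success. */
static size_t
send_queue(struct state **q, size_t n, char *buf, size_t reclen)
{
	size_t offset = 0, i;

	for (i = 0; i < n; i++)
		if (write_state(q[i], buf + offset, reclen) == 0)
			offset += reclen;
	return (offset);
}
```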

8<---8<---8<--8<
diff --git a/sys/net/if_pfsync.c b/sys/net/if_pfsync.c
index f69790ee98d..e1a006a8faf 100644
--- a/sys/net/if_pfsync.c
+++ b/sys/net/if_pfsync.c
@@ -157,16 +157,16 @@ const struct {
 };
 
 struct pfsync_q {
-   void(*write)(struct pf_state *, void *);
+   int (*write)(struct pf_state *, void *);
size_t  len;
u_int8_taction;
 };
 
 /* we have one of these for every PFSYNC_S_ */
-void   pfsync_out_state(struct pf_state *, void *);
-void   pfsync_out_iack(struct pf_state *, void *);
-void   pfsync_out_upd_c(struct pf_state *, void *);
-void   pfsync_out_del(struct pf_state *, void *);
+intpfsync_out_state(struct pf_state *, void *);
+intpfsync_out_iack(struct pf_state *, void *);
+intpfsync_out_upd_c(struct pf_state *, void *);
+intpfsync_out_del(struct pf_state *, void *);
 
 struct pfsync_q pfsync_qs[] = {
{ pfsync_out_iack,  sizeof(struct pfsync_ins_ack), PFSYNC_ACT_INS_ACK },
@@ -1301,24 +1301,26 @@ pfsyncioctl(struct ifnet *ifp, u_long cmd, caddr_t data)
return (0);
 }
 
-void
+int
 pfsync_out_state(struct pf_state *st, void *buf)
 {
struct pfsync_state *sp = buf;
 
pf_state_export(sp, st);
+   return (0);
 }
 
-void
+int
 pfsync_out_iack(struct pf_state *st, void *buf)
 {
struct pfsync_ins_ack *iack = buf;
 
iack->id = st->id;
iack->creatorid = st->creatorid;
+   return (0);
 }
 
-void
+int
 pfsync_out_upd_c(struct pf_state *st, void *buf)
 {
struct pfsync_upd_c *up = buf;
@@ -1329,9 +1331,10 @@ pfsync_out_upd_c(struct pf_state *st, void *buf)
pf_state_peer_hton(>dst, >dst);
up->creatorid = st->creatorid;
up->timeout = st->timeout;
+   return (0);
 }
 
-void
+int
 pfsync_out_del(struct pf_state *st, void *buf)
 {
struct pfsync_del_c *dp = buf;
@@ -1340,6 +1343,7 @@ pfsync_out_del(struct pf_state *st, void *buf)
dp->creatorid = st->creatorid;
 
SET(st->state_flags, PFSTATE_NOSYNC);
+   return (0);
 }
 
 void
@@ -1671,8 +1675,8 @@ pfsync_sendout(void)
KASSERT(st->snapped == 1);
st->sync_state = PFSYNC_S_NONE;
st->snapped = 0;
-   pfsync_qs[q].write(st, m->m_data + offset);
-   offset += pfsync_qs[q].len;
+   if (pfsync_qs[q].write(st, m->m_data + offset) == 0)
+   offset += pfsync_qs[q].len;
 
pf_state_unref(st);
count++;
diff --git a/sys/net/if_pfsync.h b/sys/net/if_pfsync.h
index ff26ac3669e..cc1d78d47a3 100644
--- a/sys/net/if_pfsync.h
+++ b/sys/net/if_pfsync.h
@@ -326,7 +326,7 @@ int pfsync_sysctl(int *, u_int,  void *, 
size_t *,
 #definePFSYNC_SI_CKSUM 0x02
 #definePFSYNC_SI_ACK   0x04
 intpfsync_state_import(struct pfsync_state *, int);
-void   pfsync_state_export(struct pfsync_state *,
+intpfsync_state_export(struct pfsync_state *,
struct pf_state *);
 
 void   pfsync_insert_state(struct pf_state *);
diff --git a/sys/net/pf.c b/sys/net/pf.c
index 4203911ea60..3653e1359cf 100644
--- a/sys/net/pf.c
+++ b/sys/net/pf.c
@@ -181,7 +181,8 @@ int  pf_translate_icmp_af(struct pf_pdesc*, 
int, void *);
 voidpf_send_icmp(struct mbuf *, u_int8_t, u_int8_t, int,
sa_family_t, struct pf_rule *, u_int);
 voidpf_detach_state(struct pf_state *);
-voidpf_state_key_detach(struct pf_state *, int);
+voidpf_state_key_detach(struct pf_state *,
+   struct pf_state_key *);
 u_int32_t   pf_tcp_iss(struct pf_pdesc *);
 voidpf_rule_to_actions(struct pf_rule *,
struct pf_rule_actions *);
@@ -256,6 +257,9 @@ void 
pf_state_key_unlink_inpcb(struct pf_state_key *);
 voidpf_inpcb_unlink_state_key(struct inpcb *);
 voidpf_pktenqueue_delayed(void *);
 int32_t pf_state_expires(const struct pf_state *, 
uint8_t);
+voidpf_state_keys_take(struct pf_state *,
+   struct pf_state_key **);
+voidpf_state_keys_rele(struct pf_state_key **);
 
 #if NPFLOG > 0
 voidpf_log_matches(struct pf_pdesc *, struct pf_rule *,
@@ -772,7 +776,8 @@ 

relax KASSERT() to if () in pfsync_grab_snapshot()

2022-11-11 Thread Alexandr Nedvedicky
Hello,

The diff below changes a KASSERT() to an if (). We have to prevent
packets from inserting a state into the snapshot queue multiple times.
Hrvoje@ can trigger a situation where state updates to the pfsync
peer are more frequent than we are able to send out.
OK to go for this simple fix/workaround ?

thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sys/net/if_pfsync.c b/sys/net/if_pfsync.c
index d279ede9cd6..fdff3d7a509 100644
--- a/sys/net/if_pfsync.c
+++ b/sys/net/if_pfsync.c
@@ -1362,10 +1362,17 @@ pfsync_grab_snapshot(struct pfsync_snapshot *sn, struct 
pfsync_softc *sc)
TAILQ_INIT(>sn_qs[q]);
 
while ((st = TAILQ_FIRST(>sc_qs[q])) != NULL) {
-   KASSERT(st->snapped == 0);
TAILQ_REMOVE(>sc_qs[q], st, sync_list);
-   TAILQ_INSERT_TAIL(>sn_qs[q], st, sync_snap);
-   st->snapped = 1;
+   if (st->snapped == 0) {
+   TAILQ_INSERT_TAIL(>sn_qs[q], st, sync_snap);
+   st->snapped = 1;
+   } else {
+   /*
+* item is on snapshot list already, just give
+* up this attempt to update it.
+*/
+   pf_state_unref(st);
+   }
}
}
 



Re: adding a mutex to pf_state

2022-11-10 Thread Alexandr Nedvedicky
Hello,

David (dlg@) asked me to shrink the scope of the change.  The new diff
introduces a mutex to pf_state and adjusts pf_detach_state() to utilize the
newly introduced mutex.  I've also added annotations to the pf_state members.
The members marked with '?' need our attention. We need to verify whether they
are protected by the global state lock or covered by the newly introduced
mutex.

The new diff also adds mtx_init() to pf_state_import(). This was missing in
my earlier diff. It was found and reported by Hrvoje@.

thanks and
regards
sashan


8<---8<---8<--8<
diff --git a/sys/net/pf.c b/sys/net/pf.c
index 2c13c99baac..c6022867dd4 100644
--- a/sys/net/pf.c
+++ b/sys/net/pf.c
@@ -185,7 +185,8 @@ int  pf_translate_icmp_af(struct pf_pdesc*, 
int, void *);
 voidpf_send_icmp(struct mbuf *, u_int8_t, u_int8_t, int,
sa_family_t, struct pf_rule *, u_int);
 voidpf_detach_state(struct pf_state *);
-voidpf_state_key_detach(struct pf_state *, int);
+voidpf_state_key_detach(struct pf_state *,
+   struct pf_state_key *);
 u_int32_t   pf_tcp_iss(struct pf_pdesc *);
 voidpf_rule_to_actions(struct pf_rule *,
struct pf_rule_actions *);
@@ -779,7 +780,8 @@ pf_state_key_attach(struct pf_state_key *sk, struct 
pf_state *s, int idx)
s->key[idx] = sk;
 
if ((si = pool_get(_state_item_pl, PR_NOWAIT)) == NULL) {
-   pf_state_key_detach(s, idx);
+   pf_state_key_detach(s, s->key[idx]);
+   s->key[idx] = NULL;
return (-1);
}
si->s = s;
@@ -799,42 +801,50 @@ pf_state_key_attach(struct pf_state_key *sk, struct 
pf_state *s, int idx)
 void
 pf_detach_state(struct pf_state *s)
 {
-   if (s->key[PF_SK_WIRE] == s->key[PF_SK_STACK])
-   s->key[PF_SK_WIRE] = NULL;
+   struct pf_state_key *key[2];
+
+   mtx_enter(>mtx);
+   key[PF_SK_WIRE] = s->key[PF_SK_WIRE];
+   key[PF_SK_STACK] = s->key[PF_SK_STACK];
+   s->key[PF_SK_WIRE] = NULL;
+   s->key[PF_SK_STACK] = NULL;
+   mtx_leave(>mtx);
+
+   if (key[PF_SK_WIRE] == key[PF_SK_STACK])
+   key[PF_SK_WIRE] = NULL;
 
-   if (s->key[PF_SK_STACK] != NULL)
-   pf_state_key_detach(s, PF_SK_STACK);
+   if (key[PF_SK_STACK] != NULL)
+   pf_state_key_detach(s, key[PF_SK_STACK]);
 
-   if (s->key[PF_SK_WIRE] != NULL)
-   pf_state_key_detach(s, PF_SK_WIRE);
+   if (key[PF_SK_WIRE] != NULL)
+   pf_state_key_detach(s, key[PF_SK_WIRE]);
 }
 
 void
-pf_state_key_detach(struct pf_state *s, int idx)
+pf_state_key_detach(struct pf_state *s, struct pf_state_key *key)
 {
struct pf_state_item*si;
-   struct pf_state_key *sk;
 
-   if (s->key[idx] == NULL)
+   PF_STATE_ASSERT_LOCKED();
+
+   if (key == NULL)
return;
 
-   si = TAILQ_FIRST(>key[idx]->states);
+   si = TAILQ_FIRST(>states);
while (si && si->s != s)
si = TAILQ_NEXT(si, entry);
 
if (si) {
-   TAILQ_REMOVE(>key[idx]->states, si, entry);
+   TAILQ_REMOVE(>states, si, entry);
pool_put(_state_item_pl, si);
}
 
-   sk = s->key[idx];
-   s->key[idx] = NULL;
-   if (TAILQ_EMPTY(>states)) {
-   RB_REMOVE(pf_state_tree, _statetbl, sk);
-   sk->removed = 1;
-   pf_state_key_unlink_reverse(sk);
-   pf_state_key_unlink_inpcb(sk);
-   pf_state_key_unref(sk);
+   if (TAILQ_EMPTY(>states)) {
+   RB_REMOVE(pf_state_tree, _statetbl, key);
+   key->removed = 1;
+   pf_state_key_unlink_reverse(key);
+   pf_state_key_unlink_inpcb(key);
+   pf_state_key_unref(key);
}
 }
 
@@ -997,7 +1007,9 @@ pf_state_insert(struct pfi_kif *kif, struct pf_state_key 
**skw,
}
*skw = s->key[PF_SK_WIRE];
if (pf_state_key_attach(*sks, s, PF_SK_STACK)) {
-   pf_state_key_detach(s, PF_SK_WIRE);
+   pf_state_key_detach(s, s->key[PF_SK_WIRE]);
+   s->key[PF_SK_WIRE] = NULL;
+   *skw = NULL;
PF_STATE_EXIT_WRITE();
return (-1);
}
@@ -1429,6 +1441,7 @@ pf_state_import(const struct pfsync_state *sp, int flags)
st->sync_state = PFSYNC_S_NONE;
 
refcnt_init(>refcnt);
+   mtx_init(>mtx, IPL_NET);
 
/* XXX when we have anchors, use STATE_INC_COUNTERS */
r->states_cur++;
@@ -4348,6 +4361,7 @@ pf_create_state(struct pf_pdesc *pd, struct pf_rule *r, 
struct pf_rule *a,
 * pf_state_inserts() grabs reference for pfsync!
 */

adding a mutex to pf_state

2022-11-09 Thread Alexandr Nedvedicky
hello,

The diff below adds a mutex to pf_state. It fixes a NULL pointer dereference
panic reported by Hrvoje some time ago [1].  Besides adding a mutex to the
state, the diff addresses a race between pfsync and the state purge thread.
What happened in this particular case was that a state expired and its state
keys got detached while it was waiting to be processed by pfsync. Once pfsync
got to it, it found the state keys detached and tripped on a NULL pointer
dereference. This is the race the change below fixes.

I'm not too worried about contention on the newly introduced mutex:
it is not a global mutex but a per-state (per-object) mutex. I don't
expect to see two CPUs updating the same state very often.
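
The pf_detach_state() change in the diff below follows a common per-object
locking pattern: snapshot and clear the pointers inside the object's own
critical section, then do the detach work outside of it. A userland sketch of
that pattern (a trivial stub standing in for the kernel mutex, hypothetical
names):

```c
#include <stddef.h>

/*
 * Trivial stand-in for the kernel mutex API; the real code uses
 * mtx_enter()/mtx_leave() on the per-state mutex added by this diff.
 */
struct mtx { int locked; };
static void mtx_enter(struct mtx *m) { m->locked = 1; }
static void mtx_leave(struct mtx *m) { m->locked = 0; }

struct key { int detached; };

struct state {
	struct mtx mtx;		/* per-state (per-object) mutex */
	struct key *key[2];	/* [0] = wire side, [1] = stack side */
};

static void
key_detach(struct key *k)
{
	k->detached = 1;
}

/*
 * Mirrors pf_detach_state() in the diff: snapshot and clear the key
 * pointers under the mutex, then detach outside the critical section.
 */
static void
detach_state(struct state *s)
{
	struct key *key[2];

	mtx_enter(&s->mtx);
	key[0] = s->key[0];
	key[1] = s->key[1];
	s->key[0] = NULL;
	s->key[1] = NULL;
	mtx_leave(&s->mtx);

	if (key[0] == key[1])	/* both sides share one key: detach once */
		key[0] = NULL;

	if (key[1] != NULL)
		key_detach(key[1]);
	if (key[0] != NULL)
		key_detach(key[0]);
}
```

A racing reader that takes the same mutex either sees both pointers intact or
both NULL, never a half-detached state.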

thanks and
regards
sashan

[1] https://marc.info/?l=openbsd-bugs=166006758231954=2

8<---8<---8<--8<
diff --git a/sys/net/if_pfsync.c b/sys/net/if_pfsync.c
index d279ede9cd6..5f92ae6ec45 100644
--- a/sys/net/if_pfsync.c
+++ b/sys/net/if_pfsync.c
@@ -157,16 +157,16 @@ const struct {
 };
 
 struct pfsync_q {
-   void(*write)(struct pf_state *, void *);
+   int (*write)(struct pf_state *, void *);
size_t  len;
u_int8_taction;
 };
 
 /* we have one of these for every PFSYNC_S_ */
-void   pfsync_out_state(struct pf_state *, void *);
-void   pfsync_out_iack(struct pf_state *, void *);
-void   pfsync_out_upd_c(struct pf_state *, void *);
-void   pfsync_out_del(struct pf_state *, void *);
+intpfsync_out_state(struct pf_state *, void *);
+intpfsync_out_iack(struct pf_state *, void *);
+intpfsync_out_upd_c(struct pf_state *, void *);
+intpfsync_out_del(struct pf_state *, void *);
 
 struct pfsync_q pfsync_qs[] = {
{ pfsync_out_iack,  sizeof(struct pfsync_ins_ack), PFSYNC_ACT_INS_ACK },
@@ -1301,24 +1301,26 @@ pfsyncioctl(struct ifnet *ifp, u_long cmd, caddr_t data)
return (0);
 }
 
-void
+int
 pfsync_out_state(struct pf_state *st, void *buf)
 {
struct pfsync_state *sp = buf;
 
pf_state_export(sp, st);
+   return (0);
 }
 
-void
+int
 pfsync_out_iack(struct pf_state *st, void *buf)
 {
struct pfsync_ins_ack *iack = buf;
 
iack->id = st->id;
iack->creatorid = st->creatorid;
+   return (0);
 }
 
-void
+int
 pfsync_out_upd_c(struct pf_state *st, void *buf)
 {
struct pfsync_upd_c *up = buf;
@@ -1329,9 +1331,10 @@ pfsync_out_upd_c(struct pf_state *st, void *buf)
pf_state_peer_hton(>dst, >dst);
up->creatorid = st->creatorid;
up->timeout = st->timeout;
+   return (0);
 }
 
-void
+int
 pfsync_out_del(struct pf_state *st, void *buf)
 {
struct pfsync_del_c *dp = buf;
@@ -1340,6 +1343,7 @@ pfsync_out_del(struct pf_state *st, void *buf)
dp->creatorid = st->creatorid;
 
SET(st->state_flags, PFSTATE_NOSYNC);
+   return (0);
 }
 
 void
@@ -1664,8 +1668,8 @@ pfsync_sendout(void)
KASSERT(st->snapped == 1);
st->sync_state = PFSYNC_S_NONE;
st->snapped = 0;
-   pfsync_qs[q].write(st, m->m_data + offset);
-   offset += pfsync_qs[q].len;
+   if (pfsync_qs[q].write(st, m->m_data + offset) == 0)
+   offset += pfsync_qs[q].len;
 
pf_state_unref(st);
count++;
diff --git a/sys/net/if_pfsync.h b/sys/net/if_pfsync.h
index 5447b829d74..30c1df0de9c 100644
--- a/sys/net/if_pfsync.h
+++ b/sys/net/if_pfsync.h
@@ -326,7 +326,7 @@ int pfsync_sysctl(int *, u_int,  void *, 
size_t *,
 #definePFSYNC_SI_CKSUM 0x02
 #definePFSYNC_SI_ACK   0x04
 intpfsync_state_import(struct pfsync_state *, int);
-void   pfsync_state_export(struct pfsync_state *,
+intpfsync_state_export(struct pfsync_state *,
struct pf_state *);
 
 void   pfsync_insert_state(struct pf_state *);
diff --git a/sys/net/pf.c b/sys/net/pf.c
index c42f76dbc67..1083ee95b9a 100644
--- a/sys/net/pf.c
+++ b/sys/net/pf.c
@@ -185,7 +185,8 @@ int  pf_translate_icmp_af(struct pf_pdesc*, 
int, void *);
 voidpf_send_icmp(struct mbuf *, u_int8_t, u_int8_t, int,
sa_family_t, struct pf_rule *, u_int);
 voidpf_detach_state(struct pf_state *);
-voidpf_state_key_detach(struct pf_state *, int);
+voidpf_state_key_detach(struct pf_state *,
+   struct pf_state_key *);
 u_int32_t   pf_tcp_iss(struct pf_pdesc *);
 voidpf_rule_to_actions(struct pf_rule *,
struct pf_rule_actions *);
@@ -260,6 +261,9 @@ void 
pf_state_key_unlink_inpcb(struct pf_state_key *);
 void

Re: tweak pfsync status output in ifconfig

2022-11-08 Thread Alexandr Nedvedicky
On Wed, Nov 09, 2022 at 12:09:50AM +1000, David Gwynne wrote:
> i'm hacking on pfsync(4) at the moment, and i wasted way too much time
> wondering how i broke the pfsync ioctls after i didn't the pfsync_status
> output. turns out if you don't have a sync interface set, it skips
> output.
> 
> i think it's useful to show that the sync interface is not set, so i
> came up with this.
> 
> an unconfigured pfsync interface looks like this with my diff:
> 
> pfsync0: flags=0<> mtu 1500
>   index 5 priority 0 llprio 3
>   pfsync: syncdev none maxupd 128 defer off
>   groups: carp pfsync

looks useful to me. I certainly don't object.

OK sashan



Re: simplify handling of 'once' rules in pf(4)

2022-11-07 Thread Alexandr Nedvedicky
Hello,

Updated diff. It incorporates suggestions from Klemens:

The pf.conf(5) section which covers once rules reads as follows:

 once   Creates a one shot rule. The first matching packet marks the rule
        as expired.  The expired rule is never evaluated again.  pfctl(8)
        does not report expired rules unless run in verbose mode ('-vv').
        In verbose mode pfctl(8) appends '# expired' to mark a once rule
        which has already been hit by a packet.

This is how pfctl reports expired once rules now:

pf# pfctl -a test/foo/bar -vv -sr
@0 pass out proto tcp from any to any port = 80 flags S/SA once # expired
  [ Evaluations: 25    Packets: 5489    Bytes: 4936817    States: 0 ]
  [ Inserted: uid 0 pid 2768 State Creations: 1 ]
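
The marking itself is done with a compare-and-swap so that exactly one packet
transitions a once rule to expired, even under concurrency. A userland sketch
of the pattern with C11 atomics (hypothetical names; the kernel diff uses
atomic_cas_uint):

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RULE_EXPIRED	0x1u

struct rule {
	_Atomic unsigned int flags;
};

/*
 * Returns true for the single caller that transitions the rule to
 * expired (that caller would also record the expiry time); racing
 * callers and later packets see the rule as already expired.
 */
static bool
rule_expire_once(struct rule *r)
{
	unsigned int old = atomic_load(&r->flags);

	while ((old & RULE_EXPIRED) == 0) {
		/* on CAS failure, `old` is reloaded and the loop re-checks */
		if (atomic_compare_exchange_weak(&r->flags, &old,
		    old | RULE_EXPIRED))
			return true;
	}
	return false;
}
```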

updated diff is below.

thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sbin/pfctl/pfctl_parser.c b/sbin/pfctl/pfctl_parser.c
index 6f39ad72384..fbe19747fc6 100644
--- a/sbin/pfctl/pfctl_parser.c
+++ b/sbin/pfctl/pfctl_parser.c
@@ -1148,6 +1148,9 @@ print_rule(struct pf_rule *r, const char *anchor_call, 
int opts)
printf(" ");
print_pool(>route, 0, 0, r->af, PF_POOL_ROUTE, verbose);
}
+
+   if (r->rule_flag & PFRULE_EXPIRED)
+   printf(" # expired");
 }
 
 void
diff --git a/share/man/man5/pf.conf.5 b/share/man/man5/pf.conf.5
index 3e5a17acb95..b98b9f2fd9c 100644
--- a/share/man/man5/pf.conf.5
+++ b/share/man/man5/pf.conf.5
@@ -661,10 +661,14 @@ When the rate is exceeded, all ICMP is blocked until the 
rate falls below
 100 per 10 seconds again.
 .Pp
 .It Cm once
-Creates a one shot rule that will remove itself from an active ruleset after
-the first match.
-In case this is the only rule in the anchor, the anchor will be destroyed
-automatically after the rule is matched.
+Creates a one shot rule. The first matching packet marks the rule as expired.
+The expired rule is never evaluated again.
+.Xr pfctl 8
+does not report expired rules unless run in verbose mode ('-vv'). In verbose
+mode
+.Xr pfctl 8
+appends '# expired' to mark a once rule which has already been hit by a
+packet.
 .Pp
 .It Cm probability Ar number Ns %
 A probability attribute can be attached to a rule,
diff --git a/sys/net/pf.c b/sys/net/pf.c
index 2c6124e74f2..6295b4eb9d7 100644
--- a/sys/net/pf.c
+++ b/sys/net/pf.c
@@ -313,9 +313,6 @@ RB_GENERATE(pf_state_tree, pf_state_key, entry, 
pf_state_compare_key);
 RB_GENERATE(pf_state_tree_id, pf_state,
 entry_id, pf_state_compare_id);
 
-SLIST_HEAD(pf_rule_gcl, pf_rule)   pf_rule_gcl =
-   SLIST_HEAD_INITIALIZER(pf_rule_gcl);
-
 __inline int
 pf_addr_compare(struct pf_addr *a, struct pf_addr *b, sa_family_t af)
 {
@@ -1478,23 +1475,6 @@ pf_state_import(const struct pfsync_state *sp, int flags)
 
 /* END state table stuff */
 
-void
-pf_purge_expired_rules(void)
-{
-   struct pf_rule  *r;
-
-   PF_ASSERT_LOCKED();
-
-   if (SLIST_EMPTY(_rule_gcl))
-   return;
-
-   while ((r = SLIST_FIRST(_rule_gcl)) != NULL) {
-   SLIST_REMOVE(_rule_gcl, r, pf_rule, gcle);
-   KASSERT(r->rule_flag & PFRULE_EXPIRED);
-   pf_purge_rule(r);
-   }
-}
-
 voidpf_purge_states(void *);
 struct task pf_purge_states_task =
 TASK_INITIALIZER(pf_purge_states, NULL);
@@ -1588,7 +1568,6 @@ pf_purge(void *null)
PF_LOCK();
 
pf_purge_expired_src_nodes();
-   pf_purge_expired_rules();
 
PF_UNLOCK();
 
@@ -3916,8 +3895,11 @@ pf_match_rule(struct pf_test_ctx *ctx, struct pf_ruleset 
*ruleset)
struct pf_rule  *save_a;
struct pf_ruleset   *save_aruleset;
 
+retry:
r = TAILQ_FIRST(ruleset->rules.active.ptr);
while (r != NULL) {
+   PF_TEST_ATTRIB(r->rule_flag & PFRULE_EXPIRED,
+   TAILQ_NEXT(r, entries));
r->evaluations++;
PF_TEST_ATTRIB(
(pfi_kif_match(r->kif, ctx->pd->kif) == r->ifnot),
@@ -4042,6 +4024,19 @@ pf_match_rule(struct pf_test_ctx *ctx, struct pf_ruleset 
*ruleset)
if (r->tag)
ctx->tag = r->tag;
if (r->anchor == NULL) {
+
+   if (r->rule_flag & PFRULE_ONCE) {
+   u_int32_t   rule_flag;
+
+   rule_flag = r->rule_flag;
+   if (((rule_flag & PFRULE_EXPIRED) == 0) &&
+   atomic_cas_uint(>rule_flag, rule_flag,
+   rule_flag | PFRULE_EXPIRED) == rule_flag)
+   r->exptime = gettime();
+   else
+   goto retry;
+   }
+
if (r->action == PF_MATCH) {
if 

Re: simplify handling of 'once' rules in pf(4)

2022-11-07 Thread Alexandr Nedvedicky
Hello,

Resending the same diff, just updated to -current
(as pointed out by dlg@).

thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sys/net/pf.c b/sys/net/pf.c
index 2c6124e74f2..6295b4eb9d7 100644
--- a/sys/net/pf.c
+++ b/sys/net/pf.c
@@ -313,9 +313,6 @@ RB_GENERATE(pf_state_tree, pf_state_key, entry, 
pf_state_compare_key);
 RB_GENERATE(pf_state_tree_id, pf_state,
 entry_id, pf_state_compare_id);
 
-SLIST_HEAD(pf_rule_gcl, pf_rule)   pf_rule_gcl =
-   SLIST_HEAD_INITIALIZER(pf_rule_gcl);
-
 __inline int
 pf_addr_compare(struct pf_addr *a, struct pf_addr *b, sa_family_t af)
 {
@@ -1478,23 +1475,6 @@ pf_state_import(const struct pfsync_state *sp, int flags)
 
 /* END state table stuff */
 
-void
-pf_purge_expired_rules(void)
-{
-   struct pf_rule  *r;
-
-   PF_ASSERT_LOCKED();
-
-   if (SLIST_EMPTY(&pf_rule_gcl))
-   return;
-
-   while ((r = SLIST_FIRST(&pf_rule_gcl)) != NULL) {
-   SLIST_REMOVE(&pf_rule_gcl, r, pf_rule, gcle);
-   KASSERT(r->rule_flag & PFRULE_EXPIRED);
-   pf_purge_rule(r);
-   }
-}
-
void   pf_purge_states(void *);
 struct task pf_purge_states_task =
 TASK_INITIALIZER(pf_purge_states, NULL);
@@ -1588,7 +1568,6 @@ pf_purge(void *null)
PF_LOCK();
 
pf_purge_expired_src_nodes();
-   pf_purge_expired_rules();
 
PF_UNLOCK();
 
@@ -3916,8 +3895,11 @@ pf_match_rule(struct pf_test_ctx *ctx, struct pf_ruleset 
*ruleset)
struct pf_rule  *save_a;
struct pf_ruleset   *save_aruleset;
 
+retry:
r = TAILQ_FIRST(ruleset->rules.active.ptr);
while (r != NULL) {
+   PF_TEST_ATTRIB(r->rule_flag & PFRULE_EXPIRED,
+   TAILQ_NEXT(r, entries));
r->evaluations++;
PF_TEST_ATTRIB(
(pfi_kif_match(r->kif, ctx->pd->kif) == r->ifnot),
@@ -4042,6 +4024,19 @@ pf_match_rule(struct pf_test_ctx *ctx, struct pf_ruleset 
*ruleset)
if (r->tag)
ctx->tag = r->tag;
if (r->anchor == NULL) {
+
+   if (r->rule_flag & PFRULE_ONCE) {
+   u_int32_t   rule_flag;
+
+   rule_flag = r->rule_flag;
+   if (((rule_flag & PFRULE_EXPIRED) == 0) &&
+   atomic_cas_uint(&r->rule_flag, rule_flag,
+   rule_flag | PFRULE_EXPIRED) == rule_flag)
+   r->exptime = gettime();
+   else
+   goto retry;
+   }
+
if (r->action == PF_MATCH) {
if ((ctx->ri = pool_get(&pf_rule_item_pl,
PR_NOWAIT)) == NULL) {
@@ -4253,13 +4248,6 @@ pf_test_rule(struct pf_pdesc *pd, struct pf_rule **rm, 
struct pf_state **sm,
if (r->action == PF_DROP)
goto cleanup;
 
-   /*
-* If an expired "once" rule has not been purged, drop any new matching
-* packets.
-*/
-   if (r->rule_flag & PFRULE_EXPIRED)
-   goto cleanup;
-
pf_tag_packet(pd->m, ctx.tag, ctx.act.rtableid);
if (ctx.act.rtableid >= 0 &&
rtable_l2(ctx.act.rtableid) != pd->rdomain)
@@ -4330,22 +4318,6 @@ pf_test_rule(struct pf_pdesc *pd, struct pf_rule **rm, 
struct pf_state **sm,
m_copyback(pd->m, pd->off, pd->hdrlen, &pd->hdr, M_NOWAIT);
}
 
-   if (r->rule_flag & PFRULE_ONCE) {
-   u_int32_t   rule_flag;
-
-   /*
-* Use atomic_cas() to determine a clear winner, which will
-* insert an expired rule to gcl.
-*/
-   rule_flag = r->rule_flag;
-   if (((rule_flag & PFRULE_EXPIRED) == 0) &&
-   atomic_cas_uint(&r->rule_flag, rule_flag,
-   rule_flag | PFRULE_EXPIRED) == rule_flag) {
-   r->exptime = gettime();
-   SLIST_INSERT_HEAD(&pf_rule_gcl, r, gcle);
-   }
-   }
-
 #if NPFSYNC > 0
if (*sm != NULL && !ISSET((*sm)->state_flags, PFSTATE_NOSYNC) &&
pd->dir == PF_OUT && pfsync_up()) {
diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
index b5d3261f8d1..6744fc022fb 100644
--- a/sys/net/pf_ioctl.c
+++ b/sys/net/pf_ioctl.c
@@ -348,27 +348,6 @@ pf_rm_rule(struct pf_rulequeue *rulequeue, struct pf_rule 
*rule)
pool_put(&pf_rule_pl, rule);
 }
 
-void
-pf_purge_rule(struct pf_rule *rule)
-{
-   u_int32_tnr = 0;
-   struct pf_ruleset   *ruleset;
-
-   KASSERT((rule != NULL) && (rule->ruleset != NULL));
-   ruleset = rule->ruleset;
-
-   pf_rm_rule(ruleset->rules.active.ptr, rule);
-   ruleset->rules.active.rcount--;
-   

simplify handling of 'once' rules in pf(4)

2022-11-07 Thread Alexandr Nedvedicky
Hello,

diff below simplifies handling of 'once' rules in pf(4).
Currently a matching packet marks a 'once' rule as expired
and puts it on a garbage collection list, where the rule
waits to be removed from its ruleset by a timer.

diff below simplifies that: a matching packet marks the
once rule as expired and leaves that rule in the ruleset.
The 'expired' flag prevents other packets from matching
such a rule. The expired rule will be kept in the ruleset until
the ruleset itself is either destroyed or updated by a DIOCXCOMMIT
operation.
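The "clear winner" compare-and-swap idiom the diff relies on (exactly one packet may flip PFRULE_EXPIRED) can be sketched stand-alone. This is an illustrative user-space model with made-up names, not the kernel code:

```c
#include <stdatomic.h>

#define RULE_EXPIRED 0x1u	/* stand-in for PFRULE_EXPIRED */

/*
 * Try to mark a rule expired.  Exactly one of any number of
 * concurrent callers gets a return value of 1 (the "winner"
 * that would set r->exptime); everyone else gets 0.
 */
static int
rule_expire_once(atomic_uint *flags)
{
	unsigned int old = atomic_load(flags);

	while ((old & RULE_EXPIRED) == 0) {
		/* on failure, 'old' is reloaded and we retry */
		if (atomic_compare_exchange_weak(flags, &old,
		    old | RULE_EXPIRED))
			return (1);
	}
	return (0);
}
```

In pf_match_rule() the loser does not loop like this; it restarts rule matching with `goto retry`, at which point the PFRULE_EXPIRED test at the top of the loop skips the rule.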

OK ?

thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sys/net/pf.c b/sys/net/pf.c
index 4c70b08571e..1c06e2f6eeb 100644
--- a/sys/net/pf.c
+++ b/sys/net/pf.c
@@ -317,9 +317,6 @@ RB_GENERATE(pf_state_tree, pf_state_key, entry, 
pf_state_compare_key);
 RB_GENERATE(pf_state_tree_id, pf_state,
 entry_id, pf_state_compare_id);
 
-SLIST_HEAD(pf_rule_gcl, pf_rule)   pf_rule_gcl =
-   SLIST_HEAD_INITIALIZER(pf_rule_gcl);
-
 __inline int
 pf_addr_compare(struct pf_addr *a, struct pf_addr *b, sa_family_t af)
 {
@@ -1263,23 +1260,6 @@ pf_state_export(struct pfsync_state *sp, struct pf_state 
*st)
 
 /* END state table stuff */
 
-void
-pf_purge_expired_rules(void)
-{
-   struct pf_rule  *r;
-
-   PF_ASSERT_LOCKED();
-
-   if (SLIST_EMPTY(&pf_rule_gcl))
-   return;
-
-   while ((r = SLIST_FIRST(&pf_rule_gcl)) != NULL) {
-   SLIST_REMOVE(&pf_rule_gcl, r, pf_rule, gcle);
-   KASSERT(r->rule_flag & PFRULE_EXPIRED);
-   pf_purge_rule(r);
-   }
-}
-
 void
 pf_purge_timeout(void *unused)
 {
@@ -1307,10 +1287,8 @@ pf_purge(void *xnloops)
 
PF_LOCK();
/* purge other expired types every PFTM_INTERVAL seconds */
-   if (++(*nloops) >= pf_default_rule.timeout[PFTM_INTERVAL]) {
+   if (++(*nloops) >= pf_default_rule.timeout[PFTM_INTERVAL])
pf_purge_expired_src_nodes();
-   pf_purge_expired_rules();
-   }
PF_UNLOCK();
 
/*
@@ -3625,8 +3603,11 @@ pf_match_rule(struct pf_test_ctx *ctx, struct pf_ruleset 
*ruleset)
struct pf_rule  *save_a;
struct pf_ruleset   *save_aruleset;
 
+retry:
r = TAILQ_FIRST(ruleset->rules.active.ptr);
while (r != NULL) {
+   PF_TEST_ATTRIB(r->rule_flag & PFRULE_EXPIRED,
+   TAILQ_NEXT(r, entries));
r->evaluations++;
PF_TEST_ATTRIB(
(pfi_kif_match(r->kif, ctx->pd->kif) == r->ifnot),
@@ -3751,6 +3732,19 @@ pf_match_rule(struct pf_test_ctx *ctx, struct pf_ruleset 
*ruleset)
if (r->tag)
ctx->tag = r->tag;
if (r->anchor == NULL) {
+
+   if (r->rule_flag & PFRULE_ONCE) {
+   u_int32_t   rule_flag;
+
+   rule_flag = r->rule_flag;
+   if (((rule_flag & PFRULE_EXPIRED) == 0) &&
+   atomic_cas_uint(&r->rule_flag, rule_flag,
+   rule_flag | PFRULE_EXPIRED) == rule_flag)
+   r->exptime = gettime();
+   else
+   goto retry;
+   }
+
if (r->action == PF_MATCH) {
if ((ctx->ri = pool_get(&pf_rule_item_pl,
PR_NOWAIT)) == NULL) {
@@ -3962,13 +3956,6 @@ pf_test_rule(struct pf_pdesc *pd, struct pf_rule **rm, 
struct pf_state **sm,
if (r->action == PF_DROP)
goto cleanup;
 
-   /*
-* If an expired "once" rule has not been purged, drop any new matching
-* packets.
-*/
-   if (r->rule_flag & PFRULE_EXPIRED)
-   goto cleanup;
-
pf_tag_packet(pd->m, ctx.tag, ctx.act.rtableid);
if (ctx.act.rtableid >= 0 &&
rtable_l2(ctx.act.rtableid) != pd->rdomain)
@@ -4039,22 +4026,6 @@ pf_test_rule(struct pf_pdesc *pd, struct pf_rule **rm, 
struct pf_state **sm,
m_copyback(pd->m, pd->off, pd->hdrlen, &pd->hdr, M_NOWAIT);
}
 
-   if (r->rule_flag & PFRULE_ONCE) {
-   u_int32_t   rule_flag;
-
-   /*
-* Use atomic_cas() to determine a clear winner, which will
-* insert an expired rule to gcl.
-*/
-   rule_flag = r->rule_flag;
-   if (((rule_flag & PFRULE_EXPIRED) == 0) &&
-   atomic_cas_uint(&r->rule_flag, rule_flag,
-   rule_flag | PFRULE_EXPIRED) == rule_flag) {
-   r->exptime = gettime();
-   SLIST_INSERT_HEAD(&pf_rule_gcl, r, gcle);
-   }
-   }
-
 #if NPFSYNC > 0
if (*sm != NULL && !ISSET((*sm)->state_flags, PFSTATE_NOSYNC) &&
pd->dir == PF_OUT && pfsync_up()) {
diff --git 

interface hooks to pf(4) should be using PF_LOCK()/PF_UNLOCK()

2022-11-06 Thread Alexandr Nedvedicky
Hello,

diff below is the first step to make pfioctl() _without_ NET_LOCK().
Currently pf_if.c seems to be the only blocker which prevents us
from removing all NET_LOCK()/NET_UNLOCK() calls we have in pf(4).

diff below passed very basic smoke test. OK to commit?


thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sys/net/pf_if.c b/sys/net/pf_if.c
index e23c14e6769..e86d85aa2c4 100644
--- a/sys/net/pf_if.c
+++ b/sys/net/pf_if.c
@@ -53,10 +53,17 @@
 
 #include 
 
+#include 
+#include 
+#include 
+
 #ifdef INET6
 #include 
+#include 
 #endif /* INET6 */
 
+#include 
+
 #define isupper(c) ((c) >= 'A' && (c) <= 'Z')
 #define islower(c) ((c) >= 'a' && (c) <= 'z')
 #define isalpha(c) (isupper(c)||islower(c))
@@ -296,6 +303,7 @@ pfi_attach_ifnet(struct ifnet *ifp)
struct pfi_kif  *kif;
struct task *t;
 
+   PF_LOCK();
pfi_initialize();
pfi_update++;
if ((kif = pfi_kif_get(ifp->if_xname, NULL)) == NULL)
@@ -310,6 +318,7 @@ pfi_attach_ifnet(struct ifnet *ifp)
kif->pfik_ah_cookie = t;
 
pfi_kif_update(kif);
+   PF_UNLOCK();
 }
 
 void
@@ -321,6 +330,7 @@ pfi_detach_ifnet(struct ifnet *ifp)
if ((kif = (struct pfi_kif *)ifp->if_pf_kif) == NULL)
return;
 
+   PF_LOCK();
pfi_update++;
t = kif->pfik_ah_cookie;
kif->pfik_ah_cookie = NULL;
@@ -332,6 +342,7 @@ pfi_detach_ifnet(struct ifnet *ifp)
kif->pfik_ifp = NULL;
ifp->if_pf_kif = NULL;
pfi_kif_unref(kif, PFI_KIF_REF_NONE);
+   PF_UNLOCK();
 }
 
 void
@@ -339,6 +350,7 @@ pfi_attach_ifgroup(struct ifg_group *ifg)
 {
struct pfi_kif  *kif;
 
+   PF_LOCK();
pfi_initialize();
pfi_update++;
if ((kif = pfi_kif_get(ifg->ifg_group, NULL)) == NULL)
@@ -346,6 +358,7 @@ pfi_attach_ifgroup(struct ifg_group *ifg)
 
kif->pfik_group = ifg;
ifg->ifg_pf_kif = (caddr_t)kif;
+   PF_UNLOCK();
 }
 
 void
@@ -356,11 +369,13 @@ pfi_detach_ifgroup(struct ifg_group *ifg)
if ((kif = (struct pfi_kif *)ifg->ifg_pf_kif) == NULL)
return;
 
+   PF_LOCK();
pfi_update++;
 
kif->pfik_group = NULL;
ifg->ifg_pf_kif = NULL;
pfi_kif_unref(kif, PFI_KIF_REF_NONE);
+   PF_UNLOCK();
 }
 
 void
@@ -378,15 +393,19 @@ pfi_group_change(const char *group)
 void
 pfi_group_delmember(const char *group)
 {
+   PF_LOCK();
pfi_group_change(group);
pfi_xcommit();
+   PF_UNLOCK();
 }
 
 void
 pfi_group_addmember(const char *group)
 {
+   PF_LOCK();
pfi_group_change(group);
pfi_xcommit();
+   PF_UNLOCK();
 }
 
 int
@@ -686,8 +705,10 @@ pfi_kifaddr_update(void *v)
 
NET_ASSERT_LOCKED();
 
+   PF_LOCK();
pfi_update++;
pfi_kif_update(kif);
+   PF_UNLOCK();
 }
 
 int



Re: MAKEDEV: there's only one pf(4) device

2022-11-06 Thread Alexandr Nedvedicky
On Sun, Nov 06, 2022 at 04:48:11PM +, Klemens Nanni wrote:
> On Sun, Nov 06, 2022 at 05:43:15PM +0100, Alexandr Nedvedicky wrote:
> > On Sun, Nov 06, 2022 at 01:06:02PM +, Klemens Nanni wrote:
> > > Just like bpf(4)... except bpf actually also creates bpf0 still.
> > > 
> > >   $ ls -1 /dev/{b,}pf*
> > >   /dev/bpf
> > >   /dev/bpf0
> > >   /dev/pf
> > > 
> > > OK?
> > 
> > looks ok to me, I think no one expects to create /dev/pf0, /dev/pf1 
> > ...
> > Would it also make sense to update MAKEDEV(8) too? It currently
> > reads as:
> 
> Scripts and manuals are auto-generated for all architectures, so once
> the MAKEDEV.common template is committed, one has to regenerate and
> commit etc/ and usr/share/man/man8/.

I did not know that. Then it should be committed.

OK sashan



Re: MAKEDEV: there's only one pf(4) device

2022-11-06 Thread Alexandr Nedvedicky
On Sun, Nov 06, 2022 at 01:06:02PM +, Klemens Nanni wrote:
> Just like bpf(4)... except bpf actually also creates bpf0 still.
> 
>   $ ls -1 /dev/{b,}pf*
>   /dev/bpf
>   /dev/bpf0
>   /dev/pf
> 
> OK?

looks ok to me, I think no one expects to create /dev/pf0, /dev/pf1 ...
Would it also make sense to update MAKEDEV(8) too? It currently
reads as:

 Special purpose devices
 apm Power Management Interface, see apm(4).
 audio*  Audio devices, see audio(4).
 bio ioctl tunnel pseudo-device, see bio(4).
 ...
 pctr*   PC Performance Tuning Register access device, see pctr(4).
 pf* Packet Filter, see pf(4).
 pppx*   PPP Multiplexer, see pppx(4).


thanks and
regards
sashan



Re: unlock pf_purge

2022-11-06 Thread Alexandr Nedvedicky
Hello,


> 
> claudio had some questions about numbers used by the diff. some of the
> maths here is set up so that every pf state purge run is guaranteed to
> do a MINIMUM amount of work. by default pf is configured with a purge
> interval of 10 seconds, but the pf state purge processing runs every
> second and tries to process 1/interval (ie 1/10th) of the states. if we
> have 4 states, then 4/10 is 0, so no states would be purged. by having a
> minimum of 64 states processed every second, we guarantee this small
> number of states gets processed.
> 
> i would like to commit this diff now (at the start of h2k22) so i can
> keep an eye on it. ive been running it for a while without issues.
> 
> the obvious next step will be to remove the NET_LOCKs.

I think this can go in as-is. I'm fine with the change.
found just a small nit, but it's more a matter of personal taste.
leaving it up to David to go for it or not. Details are below.

OK sashan

> 
> Index: pf.c
> ===
> RCS file: /cvs/src/sys/net/pf.c,v
> retrieving revision 1.1141
> diff -u -p -r1.1141 pf.c
> --- pf.c  10 Oct 2022 16:43:12 -  1.1141
> +++ pf.c  6 Nov 2022 13:08:15 -


> @@ -1559,28 +1619,38 @@ pf_purge_expired_states(u_int32_t maxche
>  
>   now = getuptime();
>  
> - do {
> + for (scanned = 0; scanned < limit; scanned++) {
>   uint8_t stimeout = cur->timeout;
> + unsigned int limited = 0;
>  
>   if ((stimeout == PFTM_UNLINKED) ||
>   (pf_state_expires(cur, stimeout) <= now)) {
>   st = pf_state_ref(cur);
> > + SLIST_INSERT_HEAD(&gcl, st, gc_list);
> +
> + if (++collected >= collect)
> + limited = 1;
>   }

I think we can get away without introducing a 'limited' local
variable.

>  
>   /* don't iterate past the end of our view of the list */
>   if (cur == tail) {
> + scanned = limit;
>   cur = NULL;
>   break;
>   }
>  
>   cur = TAILQ_NEXT(cur, entry_list);
> - } while (maxcheck--);
> +
> + /* don't spend too much time here. */
> + if (ISSET(READ_ONCE(curcpu()->ci_schedstate.spc_schedflags),
> +  SPCF_SHOULDYIELD) || limited)
> + break;
> + }
>  

and of course the condition above needs to be changed to:
 SPCF_SHOULDYIELD) || (collected >= collect))

As I've said, it's just a nit.
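For reference, the minimum-work arithmetic described in the quoted text (process 1/interval of the states per run, but never fewer than 64) is easy to model. The names below are illustrative, not the kernel's:

```c
/* Minimum number of states examined per purge run (per the mail). */
#define PURGE_MIN_COLLECT 64

/* How many states a single one-second purge pass should try to collect. */
static unsigned int
purge_collect_budget(unsigned int nstates, unsigned int interval)
{
	unsigned int n = nstates / interval;

	/* integer division can round down to 0; enforce the floor */
	return (n < PURGE_MIN_COLLECT ? PURGE_MIN_COLLECT : n);
}
```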



Re: make /dev/pf a clonable device

2022-11-06 Thread Alexandr Nedvedicky
On Sun, Nov 06, 2022 at 10:42:49PM +1000, David Gwynne wrote:
> this is a small chunk to help sashan@ out with some of the pf ioctl work
> he is doing.
> 
> he is looking at allocating config over multiple ioctls, and would like
> to be able to throw it away in situations like if the userland program
> creating the state goes away. with the current vnode and device special
> semantics, only the last close will call pfclose, which is a nice place
> to do cleanup. if a long running process has /dev/pf open, then he'll
> never be able to clean up.
> 
> cloning also turns the dev_t into a nice identifier to use to
> associate these allocations with, which makes the cleanup more robust.
> using something like the pid or curproc allows for userland to confuse
> pf too easily.
> 
> ok?

yes, please.

OK sashan



Re: pf.conf / scrub resulting in invalid checksum

2022-10-10 Thread Alexandr Nedvedicky
Hello,

On Mon, Oct 10, 2022 at 06:52:00AM +0200, Bjorn Ketelaars wrote:

> 
> (reply also send to tech@)
> 
> In 2011 henning@ removed fiddling with the ip checksum of normalised
> packets in sys/net/pf_norm.c (r1.131). Rationale was that the checksum
> is always recalculated in all output paths anyway. In 2016 procter@
> reintroduced checksum modification to preserve end-to-end checksums
> (r1.189 of sys/net/pf_norm.c). Although I'm not sure, it seems as if
> somewhere in that timeslot checksum recalculation of normalised packets
> was broken.
> 
> Issue got caught as net/mcast-proxy strictly adheres to RFC2236, which
> states that "When receiving packets, the checksum MUST be verified
> before processing a packet". After scrubbing a packet the checksum
> becomes invalid thus failing verification by net/mcast-proxy.
> 
> I found two workarounds:
> 1.) rip out checksum verification from net/mcast-proxy;
> 2.) don't scrub packets with, e.g., id-random and/or no-df set.
> 
> However, proposed solution is to fix this in pf. Diff below fixes the
> issue at hand.
> 
> Comments/OK?

diff reads good to me. change makes sense in my opinion.

OK sashan
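For context, pf_cksum_fixup() performs an incremental ones-complement checksum update in the style of RFC 1624: fold the old 16-bit value out and the new one in, rather than recomputing the whole header checksum. A stand-alone sketch of the idiom (not the kernel function itself; real use works on network-byte-order values):

```c
#include <stdint.h>

/*
 * Update a ones-complement checksum after a 16-bit field changed
 * from 'was' to 'now'.  The 32-bit intermediate (with unsigned
 * wraparound) absorbs both the carry and the borrow; one fold
 * reduces it back to 16 bits.
 */
static void
cksum_fixup(uint16_t *cksum, uint16_t was, uint16_t now)
{
	uint32_t x = (uint32_t)*cksum + was - now;

	x += x >> 16;		/* end-around carry/borrow */
	*cksum = (uint16_t)x;
}
```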


> 
> 
> Index: sys/net/pf.c
> ===
> RCS file: /cvs/src/sys/net/pf.c,v
> retrieving revision 1.1140
> diff -u -p -r1.1140 pf.c
> --- sys/net/pf.c  3 Sep 2022 19:22:19 -   1.1140
> +++ sys/net/pf.c  10 Oct 2022 03:22:06 -
> @@ -164,7 +164,7 @@ void   pf_add_threshold(struct 
> pf_thres
>  int   pf_check_threshold(struct pf_threshold *);
>  int   pf_check_tcp_cksum(struct mbuf *, int, int,
>   sa_family_t);
> -static __inline void  pf_cksum_fixup(u_int16_t *, u_int16_t, u_int16_t,
> +__inline void pf_cksum_fixup(u_int16_t *, u_int16_t, 
> u_int16_t,
>   u_int8_t);
>  void  pf_cksum_fixup_a(u_int16_t *, const struct pf_addr *,
>   const struct pf_addr *, sa_family_t, u_int8_t);
> @@ -1937,7 +1937,7 @@ pf_addr_wrap_neq(struct pf_addr_wrap *aw
>   * Note: this serves also as a reduction step for at most one add (as the
>   * trailing mod 2^16 prevents further reductions by destroying carries).
>   */
> -static __inline void
> +__inline void
>  pf_cksum_fixup(u_int16_t *cksum, u_int16_t was, u_int16_t now,
>  u_int8_t proto)
>  {
> Index: sys/net/pf_norm.c
> ===
> RCS file: /cvs/src/sys/net/pf_norm.c,v
> retrieving revision 1.224
> diff -u -p -r1.224 pf_norm.c
> --- sys/net/pf_norm.c 22 Aug 2022 20:35:39 -  1.224
> +++ sys/net/pf_norm.c 10 Oct 2022 03:22:06 -
> @@ -1646,14 +1646,21 @@ pf_scrub(struct mbuf *m, u_int16_t flags
>  #ifdef INET6
>   struct ip6_hdr  *h6 = mtod(m, struct ip6_hdr *);
>  #endif   /* INET6 */
> + u_int16_told;
>  
>   /* Clear IP_DF if no-df was requested */
> - if (flags & PFSTATE_NODF && af == AF_INET && h->ip_off & htons(IP_DF))
> + if (flags & PFSTATE_NODF && af == AF_INET && h->ip_off & htons(IP_DF)) {
> + old = h->ip_off;
>   h->ip_off &= htons(~IP_DF);
> + pf_cksum_fixup(&h->ip_sum, old, h->ip_off, 0);
> + }
>  
>   /* Enforce a minimum ttl, may cause endless packet loops */
> - if (min_ttl && af == AF_INET && h->ip_ttl < min_ttl)
> + if (min_ttl && af == AF_INET && h->ip_ttl < min_ttl) {
> + old = h->ip_ttl;
>   h->ip_ttl = min_ttl;
> + pf_cksum_fixup(&h->ip_sum, old, h->ip_off, 0);
> + }
>  #ifdef INET6
>   if (min_ttl && af == AF_INET6 && h6->ip6_hlim < min_ttl)
>   h6->ip6_hlim = min_ttl;
> @@ -1661,8 +1668,11 @@ pf_scrub(struct mbuf *m, u_int16_t flags
>  
>   /* Enforce tos */
>   if (flags & PFSTATE_SETTOS) {
> - if (af == AF_INET)
> + if (af == AF_INET) {
> + old = *(u_int16_t *)h;
>   h->ip_tos = tos | (h->ip_tos & IPTOS_ECN_MASK);
> + pf_cksum_fixup(>ip_sum, old, *(u_int16_t *)h, 0);
> + }
>  #ifdef INET6
>   if (af == AF_INET6) {
>   /* drugs are unable to explain such idiocy */
> @@ -1674,6 +1684,9 @@ pf_scrub(struct mbuf *m, u_int16_t flags
>  
>   /* random-id, but not for fragments */
>   if (flags & PFSTATE_RANDOMID && af == AF_INET &&
> - !(h->ip_off & ~htons(IP_DF)))
> + !(h->ip_off & ~htons(IP_DF))) {
> + old = h->ip_id;
>   h->ip_id = htons(ip_randomid());
> + pf_cksum_fixup(&h->ip_sum, old, h->ip_id, 0);
> + }
>  }
> Index: sys/net/pfvar.h
> ===
> RCS file: /cvs/src/sys/net/pfvar.h,v
> retrieving revision 1.510
> diff -u -p -r1.510 pfvar.h
> --- sys/net/pfvar.h   3 Sep 2022 14:57:54 -   

yet another follow-up to pfr_add_tables()

2022-10-04 Thread Alexandr Nedvedicky
Hello,

diff below fixes a tiny glitch I've introduced with
commit [1] back in May. Fortunately the impact is hardly
noticeable (I think).

The current code reads as follows:
1495 pfr_add_tables(struct pfr_table *tbl, int size, int *nadd, int flags)
1496 {

1507 for (i = 0; i < size; i++) {
1508 YIELD(flags & PFR_FLAG_USERIOCTL);
1509 if (COPYIN(tbl+i, &key.pfrkt_t, sizeof(key.pfrkt_t), 
flags))
1510 senderr(EFAULT);
1511 if (pfr_validate_table(&key.pfrkt_t, PFR_TFLAG_USRMASK,
1512 flags & PFR_FLAG_USERIOCTL))
1513 senderr(EINVAL);
1514 key.pfrkt_flags |= PFR_TFLAG_ACTIVE;
1515 p = pfr_create_ktable(&key.pfrkt_t, tzero, 0,
1516 (flags & PFR_FLAG_USERIOCTL? PR_WAITOK : PR_NOWAIT));
1517 if (p == NULL)
1518 senderr(ENOMEM);
1519 
1520 /*
1521  * Note: we also pre-allocate a root table here. We keep it
1522  * at ->pfrkt_root, which we must not forget about.
1523  */
1524 key.pfrkt_flags = 0;

1551 }
1552 
1553 /*
1554  * auxq contains freshly allocated tables with no dups.
1555  * also note there are no rulesets attached, because
1556  * the attach operation requires PF_LOCK().
1557  */
1558 NET_LOCK();
1559 PF_LOCK();
1560 SLIST_FOREACH_SAFE(n, &auxq, pfrkt_workq, w) {
1561 p = RB_FIND(pfr_ktablehead, &pfr_ktables, n);
1562 if (p == NULL) {
1563 SLIST_REMOVE(&auxq, n, pfr_ktable, pfrkt_workq);
1564 SLIST_INSERT_HEAD(, n, pfrkt_workq);
1565 xadd++;
1566 } else if (!(flags & PFR_FLAG_DUMMY) &&
1567 !(p->pfrkt_flags & PFR_TFLAG_ACTIVE)) {
1568 p->pfrkt_nflags = (p->pfrkt_flags &
1569 ~PFR_TFLAG_USRMASK) | key.pfrkt_flags;
1570 SLIST_INSERT_HEAD(, p, pfrkt_workq);
1571 }
1572 }

at line 1569 local variable 'key.pfrkt_flags' is always 0, because
it's set to 0 at line 1524. Thus the code has no effect. We should
be using 'n->pfrkt_flags' instead at line 1569.

It took me a while to figure out when the else branch at line 1568 gets
executed. If I understand the code right, the branch handles a kind of
edge case which deals with the situation where the table already exists but
the transaction which inserts/creates the table is not yet committed.
For example consider situation of pfctl which creates a table 'test'
via pfr_ina_define() (on behalf of transaction when it loads pf.conf).
The pfctl process gets terminated for whatever reason before the transaction
is committed. In this case we might be left with an inactive table hanging
in the pf(4) driver. Such a table is never reported by 'pfctl -sT' but is
still present in the pfr_ktables tree.
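Schematically, the fix at line 1569 is a flag merge: keep the kernel-owned bits of the already-present table and take the user-settable bits from the freshly supplied entry 'n' instead of from the zeroed 'key'. A sketch with made-up mask values (not the real PFR_TFLAG_* constants):

```c
#define TFLAG_USRMASK 0x00ffu	/* user-settable bits (illustrative) */

/*
 * Merge flags for a table that already exists in the tree:
 * kernel-owned bits come from the existing table, user bits
 * from the newly supplied one.
 */
static unsigned int
merge_tflags(unsigned int existing, unsigned int fresh)
{
	return ((existing & ~TFLAG_USRMASK) | (fresh & TFLAG_USRMASK));
}
```

With the bug, the second argument was effectively always 0, so the user-settable bits (including the bit marking the table active for the user) were lost.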

There is pf_tbltrans.c test program which emulates a theoretical failure
of pfctl. It creates a table via transaction (like parse.y does) but
it never commits the transaction. This is what happens when we run
commands below on current pf(4):
netlock# ./pf_tbltrans test
netlock# pfctl -t test -T add 192.168.1.0/24
pfctl: Table does not exist
netlock# pfctl -FT
0 tables deleted.
netlock# pfctl -t test -T add 192.168.1.0/24
pfctl: Table does not exist
Here 'pfctl -t test -T add 192.168.1.0/24' fails because it is not able
to add an address to the inactive (not committed) table 'test'. As soon as
we replay the same scenario on a system with the diff applied, the
story looks as follows:
netlock# ./pf_tbltrans test
netlock# pfctl -t test -T add 192.168.1.0/24
1/1 addresses added.
The reason why it works is that the fixed line 1569 kicks in and
the active flag gets set, which makes the table active ('visible') for
all other operations.

OK to commit?

thanks and
regards
sashan

[1] https://marc.info/?l=openbsd-cvs=16522243003=2

8<---8<-8<---8<-
diff --git a/sys/net/pf_table.c b/sys/net/pf_table.c
index f537aac2387..f6d5c355b1f 100644
--- a/sys/net/pf_table.c
+++ b/sys/net/pf_table.c
@@ -1566,7 +1566,7 @@ pfr_add_tables(struct pfr_table *tbl, int size, int 
*nadd, int flags)
} else if (!(flags & PFR_FLAG_DUMMY) &&
!(p->pfrkt_flags & PFR_TFLAG_ACTIVE)) {
p->pfrkt_nflags = (p->pfrkt_flags &
-   ~PFR_TFLAG_USRMASK) | key.pfrkt_flags;
+   ~PFR_TFLAG_USRMASK) | n->pfrkt_flags;
SLIST_INSERT_HEAD(, p, pfrkt_workq);
}
}
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

int
main(int argc, const char *argv[])
{
int pf;
 

Re: make kernel build without INET6 again (pf_lb.c)

2022-08-30 Thread Alexandr Nedvedicky
sure, looks good to me.

OK sashan

On Tue, Aug 30, 2022 at 09:45:17PM +0200, Sebastian Benoit wrote:
> ok?
> 
> diff --git sys/net/pf_lb.c sys/net/pf_lb.c
> index 588115cbff7..905af42e463 100644
> --- sys/net/pf_lb.c
> +++ sys/net/pf_lb.c
> @@ -519,13 +519,18 @@ pf_map_addr(sa_family_t af, struct pf_rule *r, struct 
> pf_addr *saddr,
>* fall back to POOL_NONE if there is a single host
>* address in pool.
>*/
> - if ((af == AF_INET &&
> - rmask->addr32[0] == INADDR_BROADCAST) ||
> - (af == AF_INET6 &&
> - IN6_ARE_ADDR_EQUAL(&rmask->v6, &in6mask128))) {
> + if (af == AF_INET &&
> + rmask->addr32[0] == INADDR_BROADCAST) {
>   pf_addrcpy(naddr, raddr, af);
>   break;
>   }
> +#ifdef INET6
> + if (af == AF_INET6 &&
> + IN6_ARE_ADDR_EQUAL(&rmask->v6, &in6mask128)) {
> + pf_addrcpy(naddr, raddr, af);
> + break;
> + }
> +#endif
>   } else if (pf_match_addr(0, raddr, rmask, &rpool->counter, af))
>   return (1);
>  
> 



Re: pf fragment lock

2022-08-22 Thread Alexandr Nedvedicky
Hello,

On Mon, Aug 22, 2022 at 08:45:29PM +0200, Alexander Bluhm wrote:
> Hi,
> 
> Hrvoje managed to crash the kernel in pf fragment reassembly.
> 
> > r620-1# pfctl -e
> > pf enabled
> > r620-1# pfctl -f /etc/pf.conf
> > uvm_fault(0x824b9278, 0xb7, 0, 2) -> e
> > kernel: page fault trap, code=0
> > Stopped at  pf_free_fragment+0x77:  movq%rax,0xb8(%rcx)
> > TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> >  350110  79921  00x13  02K tcpbench
> > *301306  98239  0 0x14000  0x2004  softnet
> >   55791   2358  0 0x14000  0x2003  softnet
> >  176238  13130  0 0x14000  0x2001  softnet
> >   66977  54316  0 0x14000  0x2005  systq
> >  165986  42679  0 0x14000 0x42000  softclock
> > pf_free_fragment(fd83a5022010) at pf_free_fragment+0x77
> > pf_create_fragment(800022d717ce) at pf_create_fragment+0xc8
> > pf_reassemble6(800022d71708,800022d71648,30,0,1,800022d717ce) at
> > pf_reassemble6+0x51
> > pf_normalize_ip6(800022d716c8,800022d717ce) at pf_normalize_ip6+0x8a
> > pf_test(18,1,80095048,800022d71978) at pf_test+0x30d
> > ip6_input_if(800022d71978,800022d71984,29,0,80095048) at
> > ip6_input_if+0x1ae
> > ipv6_input(80095048,fd80a3f1dc00) at ipv6_input+0x39
> > ether_input(80095048,fd80a3f1dc00) at ether_input+0x3b1
> > if_input_process(80095048,800022d71a68) at if_input_process+0x6f
> > ifiq_process(80099900) at ifiq_process+0x69
> > taskq_thread(80032200) at taskq_thread+0x11a
> > end trace frame: 0x0, count: 4
> > https://www.openbsd.org/ddb.html describes the minimum info required in
> > bug reports.  Insufficient info makes it difficult to find and fix bugs.
> > ddb{4}>
> 
> It crashes here in pf_free_fragment()
>266  TAILQ_REMOVE(&pf_fragqueue, frag, frag_next);
> 
> Putting a pf frag lock into pf_create_fragment() around
> pf_flush_fragments() does not look sufficient.  The pf_nfrents++
> also needs protection.  So I moved the lock around pf_reassemble().
> 
> ok?
> 

fix looks good to me.
thanks for taking care of it.

OK sashan



Re: store pf rules in a tree

2022-08-12 Thread Alexandr Nedvedicky
Hello Stefan,

On Wed, Aug 10, 2022 at 02:38:16PM +, Stefan Butz wrote:
> Hi everyone,
> 
> this mail includes a patch to store pf rules in a red-black tree.
> Currently they are stored in a linked list.
> My system configured with 16000 rules takes about 10 minutes
> to print them out using `pfctl -sr`.
> This patch decreases the time to 4 seconds.
> I was not able to measure a time increase for rule insertion.
> Inserting a lot of labels is still very slow.
> This has to be attacked separately.
> 
> 

below is a diff which should make DIOCGETRULE fast enough. It introduces
a transaction (struct pf_trans).  it comes from my 'work-in-progress' tree.
I'd like to introduce transactions to move all memory allocations done on
behalf of ioctl(2) in pf(4) outside of PF_LOCK() scope.  Seeing your diff
I've realized we can use pf_trans also to speed up DIOCGETRULE.


diff below is based on my work-in-progress tree/branch. There might be some
rough edges, but it compiles and seems to work. if people will like the idea
I'll polish the change and commit it. It will also help me to make upcoming
diffs smaller.

At this point I'd like to know if idea presented in diff below makes
kind of sense.

there is `struct pf_trans` which keeps context for operations
which involve multiple ioctl(2) calls (such as ADDRULE, GETRULE,...)

ruleset.ticket is renamed to ruleset.version because the ticket is now
kept in the newly introduced pf_trans.

the workflow for a process which wants to read/modify a ruleset, in case
of DIOCGETRULES/DIOCGETRULE operations, looks as follows:

transaction (pf_trans) is created by DIOCGETRULES

transaction gets reference to anchor/ruleset

we also remember current version of ruleset in transaction

we also remember a pointer to the first rule in the ruleset/anchor

we return transaction ticket to pfctl(8) process

then pfctl(8) issues DIOCGETRULE, it passes transaction ticket to ioctl(2)

we use ticket to look up a transaction created by DIOCGETRULES

we check if the version kept in the transaction matches the version of the
ruleset.  If there is a mismatch the ruleset got modified and we bail out.

if rule stored in transaction is NULL we return ENOENT
to indicate ruleset is either empty or there are no more rules

if the version matches then we may safely proceed; the rule pointer
must be valid because a matching version number guarantees nothing
was modified.

we copy rule to buffer

and we store the next rule in the transaction.

all transactions created by a process are currently destroyed in pfclose()
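The invariant the steps above rely on — a cached rule pointer stays safe to dereference as long as the ruleset version is unchanged — can be modeled in a few lines. Types and names here are simplified stand-ins for pf_trans/pf_ruleset, not the real structures:

```c
#include <stddef.h>

struct rule { struct rule *next; };
struct ruleset { unsigned int version; struct rule *rules; };
struct trans { unsigned int version; struct rule *cursor; };

/* DIOCGETRULES: snapshot the version and the first rule. */
static void
trans_open(struct trans *t, const struct ruleset *rs)
{
	t->version = rs->version;
	t->cursor = rs->rules;
}

/*
 * DIOCGETRULE: hand out the next rule, or NULL when the ruleset
 * changed under the transaction (*busy set) or when there are no
 * more rules.
 */
static struct rule *
trans_next(struct trans *t, const struct ruleset *rs, int *busy)
{
	struct rule *r;

	*busy = (t->version != rs->version);
	if (*busy || (r = t->cursor) == NULL)
		return (NULL);
	t->cursor = r->next;	/* remember the next rule */
	return (r);
}
```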

thanks for thoughts/feedback.

regards
sashan

8<---8<---8<--8<
diff --git a/sbin/pfctl/pfctl.c b/sbin/pfctl/pfctl.c
index 9f14b955ca7..cf9def71c02 100644
--- a/sbin/pfctl/pfctl.c
+++ b/sbin/pfctl/pfctl.c
@@ -833,7 +833,7 @@ pfctl_show_rules(int dev, char *path, int opts, enum 
pfctl_show format,
 char *anchorname, int depth, int wildcard, long shownr)
 {
struct pfioc_rule pr;
-   u_int32_t nr, mnr, header = 0;
+   u_int32_t header = 0;
int len = strlen(path), ret = 0;
char *npath, *p;
 
@@ -884,24 +884,9 @@ pfctl_show_rules(int dev, char *path, int opts, enum 
pfctl_show format,
goto error;
}
 
-   if (shownr < 0) {
-   mnr = pr.nr;
-   nr = 0;
-   } else if (shownr < pr.nr) {
-   nr = shownr;
-   mnr = shownr + 1;
-   } else {
-   warnx("rule %ld not found", shownr);
-   ret = -1;
-   goto error;
-   }
-   for (; nr < mnr; ++nr) {
-   pr.nr = nr;
-   if (ioctl(dev, DIOCGETRULE, &pr) == -1) {
-   warn("DIOCGETRULE");
-   ret = -1;
-   goto error;
-   }
+   while (ioctl(dev, DIOCGETRULE, &pr) != -1) {
+   if ((shownr != -1) && (shownr != pr.nr))
+   continue;
 
/* anchor is the same for all rules in it */
if (pr.rule.anchor_wildcard == 0)
@@ -956,6 +941,13 @@ pfctl_show_rules(int dev, char *path, int opts, enum 
pfctl_show format,
case PFCTL_SHOW_NOTHING:
break;
}
+   errno = 0;
+   }
+
+   if ((errno != 0) && (errno != ENOENT)) {
+   warn("DIOCGETRULE");
+   ret = -1;
+   goto error;
}
 
/*
diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
index b540afc6fe2..6ef89d162b3 100644
--- a/sys/net/pf_ioctl.c
+++ b/sys/net/pf_ioctl.c
@@ -116,6 +116,11 @@ voidpf_qid_unref(u_int16_t);
 int pf_states_clr(struct pfioc_state_kill *);
 int pf_states_get(struct pfioc_states *);
 
+struct pf_trans

Re: ip6 routing header 0 offset

2022-08-12 Thread Alexandr Nedvedicky
Hello,

On Thu, Aug 11, 2022 at 09:42:54PM +0200, Alexander Bluhm wrote:
> Hi,
> 
> The IPv6 routing header type 0 check should modify *offp only in
> case of an error, so that the generated icmp6 packet has the correct
> pointer.  After successful return, *offp should not be modified.

makes sense to me.
> 
> ok?

OK sashan

> 
> bluhm
> 
> Index: netinet6/ip6_input.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/netinet6/ip6_input.c,v
> retrieving revision 1.250
> diff -u -p -r1.250 ip6_input.c
> --- netinet6/ip6_input.c  6 Aug 2022 15:57:59 -   1.250
> +++ netinet6/ip6_input.c  11 Aug 2022 19:36:22 -
> @@ -695,21 +695,23 @@ ip6_check_rh0hdr(struct mbuf *m, int *of
>   do {
>   switch (proto) {
>   case IPPROTO_ROUTING:
> - *offp = off;
>   if (rh_cnt++) {
>   /* more than one rh header present */
> + *offp = off;
>   return (1);
>   }
>  
>   if (off + sizeof(rthdr) > lim) {
>   /* packet to short to make sense */
> + *offp = off;
>   return (1);
>   }
>  
> >   m_copydata(m, off, sizeof(rthdr), &rthdr);
>  
>   if (rthdr.ip6r_type == IPV6_RTHDR_TYPE_0) {
> - *offp += offsetof(struct ip6_rthdr, ip6r_type);
> + *offp = off +
> + offsetof(struct ip6_rthdr, ip6r_type);
>   return (1);
>   }
>  
> 
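The convention the diff above restores — assign `*offp` only on the error paths, leave it untouched on success — can be sketched outside the kernel. This toy checker (hypothetical names, not the netinet6 code) follows the same contract:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of the error-reporting convention from the diff above
 * (hypothetical struct, not the real ip6 routing header): the checker
 * writes *offp only on the paths where it returns 1 (error), so the
 * caller can point icmp6_error() at the offending byte.  On success
 * *offp keeps whatever value the caller had in it.
 */
struct rthdr_sketch {
	unsigned char nxt;
	unsigned char type;	/* 0 would be the forbidden RH0 */
};

static int
check_rh0_sketch(const struct rthdr_sketch *rh, size_t off, size_t lim,
    size_t *offp)
{
	if (off + sizeof(*rh) > lim) {		/* packet too short */
		*offp = off;
		return (1);
	}
	if (rh->type == 0) {			/* routing header type 0 */
		*offp = off + offsetof(struct rthdr_sketch, type);
		return (1);
	}
	return (0);				/* success: *offp unchanged */
}
```

The key property is visible in the last line: a successful check never clobbers the caller's offset, which is exactly what bluhm's diff fixes.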



Re: ipv4 and ipv6 fragment refactoring

2022-08-12 Thread Alexandr Nedvedicky
Hello,

On Thu, Aug 11, 2022 at 11:59:55AM +0200, Alexander Bluhm wrote:
> Hi,
> 
> ip_fragment() and ip6_fragment() do nearly the same thing, but they
> are implemented differently.
> 
> The idea of this diff is to move things around.  Then only differences
> from the IP standards but not in coding style remain.  This allows
> to compare both functions easily.

don't mind at all if things will be more consistent.
> 
> In IPv4 assume that m_pkthdr is correct and take the length from
> there, like in IPv6.
> 

would it make sense to put an assert there, at least for a while,
just to discover any surprises? something like this:

> -ip_fragment(struct mbuf *m, struct mbuf_list *fml, struct ifnet *ifp,
> +ip_fragment(struct mbuf *m0, struct mbuf_list *fml, struct ifnet *ifp,
>  u_long mtu)
>  {
> - struct ip *ip, *mhip;
> - struct mbuf *m0;
> - int len, hlen, off;
> - int mhlen, firstlen;
> + struct mbuf *m;
> + struct ip *ip;
> + int firstlen, hlen, tlen, len, off;
>   int error;
>  
>   ml_init(fml);
> - ml_enqueue(fml, m);
> + ml_enqueue(fml, m0);
>  
> - ip = mtod(m, struct ip *);
> + ip = mtod(m0, struct ip *);
>   hlen = ip->ip_hl << 2;
> + tlen = m0->m_pkthdr.len;
KASSERT(tlen == ntohs(ip->ip_len));


diff looks good to me (with or without assert).

OK sashan



Re: store pf rules in a tree

2022-08-10 Thread Alexandr Nedvedicky
Hello,


On Wed, Aug 10, 2022 at 02:38:16PM +, Stefan Butz wrote:
> Hi everyone,
> 
> this mail includes a patch to store pf rules in a red-black tree.
> Currently they are stored in a linked list.
> My system configured with 16000 rules takes about 10 minutes
> to print them out using `pfctl -sr`.
> This patch decreases the time to 4 seconds.
> I was not able to measure a time increase for rule insertion.
> Inserting a lot of labels is still very slow.
> This has to be attacked separately.

took me a while to figure out why it can take sooo long
to retrieve 16k rules from the kernel. this is the answer:

1480 case DIOCGETRULE: {
1496 if (pr->ticket != ruleset->rules.active.ticket) {
1497 error = EBUSY;
1498 PF_UNLOCK();
1499 NET_UNLOCK();
1500 goto fail;
1501 }
1502 rule = TAILQ_FIRST(ruleset->rules.active.ptr);
1503 while ((rule != NULL) && (rule->nr != pr->nr))
1504 rule = TAILQ_NEXT(rule, entries);
1505 if (rule == NULL) {

so yes, in order to retrieve the next rule in list we must
always start with the first (HEAD) rule and iterate to rule n+1.

so I understand the motivation to solve it using an rb-tree.
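The cost that loop implies can be modeled in a few lines (plain pointers here, not the pf TAILQ — a toy, not the kernel code):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of the DIOCGETRULE loop quoted above: every lookup
 * restarts at the head and walks to rule `nr`, so dumping all n
 * rules costs 0 + 1 + ... + (n-1) list hops, i.e. O(n^2) total.
 * With 16000 rules that is ~128 million hops for one `pfctl -sr`.
 */
struct rule_sketch {
	unsigned		 nr;
	struct rule_sketch	*next;
};

static unsigned long hops;	/* total list traversal steps taken */

static struct rule_sketch *
get_rule_sketch(struct rule_sketch *head, unsigned nr)
{
	struct rule_sketch *r = head;

	while (r != NULL && r->nr != nr) {
		r = r->next;
		hops++;
	}
	return (r);
}
```

Linking 1000 rules and fetching each one by number performs 0+1+...+999 = 499500 hops, which is the quadratic blow-up the rb-tree (or a retrieval cursor) is meant to avoid.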

I also (like todd) have some doubts about the code in DIOCCHANGERULE.
the current code is able to insert at the desired location.
I don't understand how your new code does that:


1675 
1676 if (pcr->action == PF_CHANGE_ADD_HEAD)
1677 oldrule = RB_MIN(pf_ruletree,
1678 ruleset->rules.active.ptr);
1679 else if (pcr->action == PF_CHANGE_ADD_TAIL)
1680 oldrule = RB_MAX(pf_ruletree,
1681 ruleset->rules.active.ptr);
1682 else {
1683 oldrule = RB_MIN(pf_ruletree,
1684 ruleset->rules.active.ptr);
1685 while ((oldrule != NULL) && (oldrule->nr != pcr->nr))
1686 oldrule = RB_NEXT(pf_ruletree,
1687 ruleset->rules.active.ptr, oldrule);
1688 if (oldrule == NULL) {
1689 if (newrule != NULL)
1690 pf_rm_rule(NULL, newrule);
1691 error = EINVAL;
1692 PF_UNLOCK();
1693 NET_UNLOCK();
1694 goto fail;
1695 }
1696 }
1697 
1698 if (pcr->action == PF_CHANGE_REMOVE) {
1699 pf_rm_rule(ruleset->rules.active.ptr, oldrule);
1700 ruleset->rules.active.rcount--;
1701 } else {
1702 RB_INSERT(pf_ruletree,
1703 ruleset->rules.active.ptr, newrule);
1704 ruleset->rules.active.rcount++;
1705 newrule->ruleset = ruleset;
1706 }
1707 

how does RB_INSERT() at line 1702 insert the rule at the
desired location? Consider the current rules look like:

[ 1, 2, 3, 4 ]

i'm going to insert rule x, which should follow existing rule nr 3

[ 1, 2, 3, x, 4]

current code just re-indexes/renumbers the rules after insertion

[ 1, 2, 3, 4, 5 ], where x gets assigned 4.

I don't see how that happens with RB_INSERT().
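The splice-then-renumber behaviour of the list-based code can be sketched like this (a toy model with hypothetical names, not the pf ioctl code):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of what the list-based DIOCCHANGERULE path does: splice
 * the new rule in behind its neighbour, then renumber the whole
 * ruleset so `nr` stays dense.  [1,2,3,4] + "x after 3" becomes
 * [1,2,3,4,5] with x holding nr 4.  A bare RB_INSERT() keyed on nr
 * cannot express this: every successor's key would have to change,
 * which means removing and re-inserting those nodes.
 */
struct rule_sketch {
	unsigned		 nr;
	struct rule_sketch	*next;
};

static void
insert_after_nr(struct rule_sketch **headp, unsigned after_nr,
    struct rule_sketch *x)
{
	struct rule_sketch **pp = headp;
	unsigned nr = 0;

	while (*pp != NULL && (*pp)->nr != after_nr)
		pp = &(*pp)->next;
	if (*pp != NULL)
		pp = &(*pp)->next;	/* splice in behind the match */
	x->next = *pp;
	*pp = x;

	for (x = *headp; x != NULL; x = x->next)	/* renumber */
		x->nr = ++nr;
}
```

The renumbering pass is the part an nr-keyed tree would have to replicate by re-keying (remove + re-insert) every rule after the insertion point.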

I think what we really need to fix is the iteration to the n-th retrieved
rule we currently have in DIOCGETRULE. I'm working on a slightly larger
change which updates pf(4) transactions. I'll try to cherry-pick
some changes from my in-progress tree and factor out a smaller diff
which will solve that DIOCGETRULE itch for you. please stay tuned.

thanks and
regards
sashan
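For the userland side, the pfctl fragment at the very top of this page shows the shape such a fix can take: loop the getter until it fails and let errno distinguish end-of-list from a real error. A minimal stand-alone sketch of that pattern, with `get_next()` standing in for `ioctl(dev, DIOCGETRULE, &pr)`:

```c
#include <assert.h>
#include <errno.h>

/*
 * Sketch of the errno-terminated getter loop (toy stand-ins, not the
 * real pfctl code): get_next() mimics an ioctl that returns -1 with
 * errno = ENOENT once it runs past the last rule, and -1 with some
 * other errno on a real failure.
 */
static int
get_next(int *cursor, int nrules)
{
	if (*cursor >= nrules) {
		errno = ENOENT;		/* past the last rule */
		return (-1);
	}
	(*cursor)++;
	return (0);
}

static int
dump_rules(int nrules)
{
	int cursor = 0, seen = 0;

	errno = 0;
	while (get_next(&cursor, nrules) != -1) {
		seen++;			/* print the rule here */
		errno = 0;		/* don't inherit a stale errno */
	}
	if (errno != 0 && errno != ENOENT)
		return (-1);		/* real failure, not end-of-list */
	return (seen);
}
```

Resetting errno inside the loop matters: without it, a stale value from an earlier call could be mistaken for a failure after the final ENOENT.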



Re: [External] : inpcb lookup ref counting

2022-08-08 Thread Alexandr Nedvedicky
Hello,

diff looks good to me as far as I can tell.

OK sashan@



Re: fix NAT round-robin and random

2022-08-02 Thread Alexandr Nedvedicky
Hello,

On Mon, Aug 01, 2022 at 06:37:58PM +0200, Hrvoje Popovski wrote:
> On 20.7.2022. 22:27, Alexandr Nedvedicky wrote:
> > Hello,
> > 
> > below is a final version of patch for NAT issue discussed at bugs@ [1].
> > Patch below is updated according to feedback I got from Chris, claudio@
> > and hrvoje@.
> > 
> > The summary of changes is as follows:
> > 
> > - prevent infinite loop when packet hits NAT rule as follows:
> > pass out on em0 from 172.16.0.0/16 to any nat-to { 49/27 }
> > the issue has been introduced by my earlier commit [2]. The earlier
> > change makes pf(4) to interpret 49/27 as single IP address 
> > (POOL_NONE)
> > this is wrong, because pool 49/27 actually contains 32 addresses.
> > 
> > - while investigating the issue I've realized 'random' pool should
> >   rather be using arc4_uniform() with upper limit derived from mask.
> >   also the random number should be turned to netorder.
> > 
> > - also while I was debugging my change I've noticed we should be using
> >   pf_poolmask() to obtain address as a combination of pool address
> >   and result of generator (round-robin and random).
> > 
> > OK to commit?
> > 
> > thanks and
> > regards
> > sashan
> > 
> > 
> > [1] https://marc.info/?t=16581336821=1=2
> > https://marc.info/?t=16573254651=1=2
> > https://marc.info/?l=openbsd-bugs=165817500514813=2
> > 
> > [2] https://marc.info/?l=openbsd-cvs=164500117319660=2
> 
> 
> Hi all,
> 
> I've tested this diff and from what I see NAT behaves as it should and
> it's changing ip addresses quite nicely
> 
> 

thank you Hrvoje for carrying out an independent test. I'll commit the diff
later today if there are no objections.

thanks and
regards
sashan



Re: pf: DIOCXCOMMIT and copyin

2022-07-21 Thread Alexandr Nedvedicky
Hello,

On Thu, Jul 21, 2022 at 11:13:28AM +0200, Moritz Buhl wrote:
> Hi tech,
> 
> for the other two DIOCX ioctls syzkaller showed that it is possible
> to grab netlock while doing copyin.
> 
> The same problem should exist for DIOCXCOMMIT but syzkaller didn't
> find it yet.
> 
> In case anybody can reproduce the witness lock order reversals the
> syzkaller can produce, the diff below might address the problem.
> 
> mbuhl
> 

change looks good to me.

OK sashan



fix NAT round-robin and random

2022-07-20 Thread Alexandr Nedvedicky
Hello,

below is a final version of patch for NAT issue discussed at bugs@ [1].
Patch below is updated according to feedback I got from Chris, claudio@
and hrvoje@.

The summary of changes is as follows:

- prevent infinite loop when packet hits NAT rule as follows:
pass out on em0 from 172.16.0.0/16 to any nat-to { 49/27 }
the issue has been introduced by my earlier commit [2]. The earlier
change makes pf(4) interpret 49/27 as a single IP address (POOL_NONE).
this is wrong, because the pool 49/27 actually contains 32 addresses.

- while investigating the issue I've realized the 'random' pool should
  rather be using arc4random_uniform() with the upper limit derived from
  the mask. also the random number should be turned to network order.

- also while I was debugging my change I've noticed we should be using
  pf_poolmask() to obtain address as a combination of pool address
  and result of generator (round-robin and random).
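The mask arithmetic behind these two points can be checked in isolation. This sketch injects the RNG so the result is deterministic (`arc4random_uniform()` is OpenBSD libc; the helper names here are mine, not pf's):

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>

/*
 * Sketch of the math behind pf_rand_addr() in the diff below.  For a
 * /27 pool the inverted netmask (in host order) is 31, so the draw is
 * uniform over the 32 host values 0..31 and converted back to network
 * order.  A pf_poolmask()-style merge then keeps the network bits of
 * the pool address and the host bits of the draw.
 */
static uint32_t
rand_host_part(uint32_t netmask_be, uint32_t (*uniform)(uint32_t))
{
	uint32_t span = ~ntohl(netmask_be);	/* host bits: 31 for /27 */

	return (htonl(uniform(span + 1)));	/* back to network order */
}

static uint32_t
poolmask_sketch(uint32_t base_be, uint32_t netmask_be, uint32_t host_be)
{
	/* network bits from the pool base, host bits from the draw */
	return ((base_be & netmask_be) | (host_be & ~netmask_be));
}

static uint32_t
top_of_range(uint32_t bound)	/* deterministic stand-in for the RNG */
{
	return (bound - 1);
}
```

With a pool of 192.168.1.49/27 and the draw pinned to its maximum (31), the combined address lands on 192.168.1.63 — the last host of the .32–.63 block, i.e. inside the pool rather than anywhere in the 32-bit space, which was the bug with a bare arc4random().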

OK to commit?

thanks and
regards
sashan


[1] https://marc.info/?t=16581336821=1=2
https://marc.info/?t=16573254651=1=2
https://marc.info/?l=openbsd-bugs=165817500514813=2

[2] https://marc.info/?l=openbsd-cvs=164500117319660=2

8<---8<---8<--8<
diff --git a/sys/net/pf_lb.c b/sys/net/pf_lb.c
index d106073d372..dff2349cde7 100644
--- a/sys/net/pf_lb.c
+++ b/sys/net/pf_lb.c
@@ -344,6 +344,17 @@ pf_map_addr_sticky(sa_family_t af, struct pf_rule *r, struct pf_addr *saddr,
return (0);
 }
 
+uint32_t
+pf_rand_addr(uint32_t mask)
+{
+   uint32_t addr;
+
+   mask = ~ntohl(mask);
+   addr = arc4random_uniform(mask + 1);
+
+   return (htonl(addr));
+}
+
 int
 pf_map_addr(sa_family_t af, struct pf_rule *r, struct pf_addr *saddr,
 struct pf_addr *naddr, struct pf_addr *init_addr, struct pf_src_node **sns,
@@ -428,24 +439,29 @@ pf_map_addr(sa_family_t af, struct pf_rule *r, struct pf_addr *saddr,
} else if (init_addr != NULL && PF_AZERO(init_addr, af)) {
switch (af) {
case AF_INET:
-   rpool->counter.addr32[0] = arc4random();
+   rpool->counter.addr32[0] = pf_rand_addr(
+   rmask->addr32[0]);
break;
 #ifdef INET6
case AF_INET6:
if (rmask->addr32[3] != 0x)
-   rpool->counter.addr32[3] = arc4random();
+   rpool->counter.addr32[3] = pf_rand_addr(
+   rmask->addr32[3]);
else
break;
if (rmask->addr32[2] != 0x)
-   rpool->counter.addr32[2] = arc4random();
+   rpool->counter.addr32[2] = pf_rand_addr(
+   rmask->addr32[2]);
else
break;
if (rmask->addr32[1] != 0x)
-   rpool->counter.addr32[1] = arc4random();
+   rpool->counter.addr32[1] = pf_rand_addr(
+   rmask->addr32[1]);
else
break;
if (rmask->addr32[0] != 0x)
-   rpool->counter.addr32[0] = arc4random();
+   rpool->counter.addr32[0] = pf_rand_addr(
+   rmask->addr32[0]);
break;
 #endif /* INET6 */
default:
@@ -500,11 +516,16 @@ pf_map_addr(sa_family_t af, struct pf_rule *r, struct pf_addr *saddr,
}
} else if (PF_AZERO(>counter, af)) {
/*
-* fall back to POOL_NONE if there are no addresses in
-* pool
+* fall back to POOL_NONE if there is a single host
+* address in pool.
 */
-   pf_addrcpy(naddr, raddr, af);
-   break;
+   if ((af == AF_INET &&
+   rmask->addr32[0] == INADDR_BROADCAST) ||
+   (af == AF_INET6 &&
+   IN6_ARE_ADDR_EQUAL(>v6, ))) {
+   pf_addrcpy(naddr, raddr, af);
+   break;
+   }
} else if (pf_match_addr(0, raddr, rmask, >counter, af))
return (1);
 
@@ -532,9 +553,9 @@ pf_map_addr(sa_family_t af, struct pf_rule 

Re: pf: pool for pf_anchor

2022-07-20 Thread Alexandr Nedvedicky
Hello,

On Tue, Jul 19, 2022 at 10:58:08PM +0200, Alexander Bluhm wrote:
> On Tue, Jul 19, 2022 at 07:18:54PM +0200, Moritz Buhl wrote:
> > A solution would be to move the allocation of the pf_anchor struct
> > into a pool.  One final question would be regarding the size of the
> > hiwat or hardlimit.  Any suggestions?
> 
> 10 seems way to low.  We want to limit resource abuse.  But regular
> users should never hit this.  Especially existing configurations
> should still work after upgrade.
> 
> Perhaps 512 anchors.  Noone should ever need more than 512 anchors :-)

512 seems reasonable. All other options which come to my mind
require a far more complex change. for example we might consider
tying those uncommitted transactions to the owning process. Then we
can release all uncommitted data in pfclose() for such a process.

> 
> > Any other comments?


> 
> Please do not use PR_WAITOK in pool_init() unless you know what you
> are doing.  I think it should work, but who knows.  The other pf
> pools have 0 flags, just use that.
> 
> Comment says rs_malloc is needed to compile pfctl.  Have you tested
> that?
> 
> Replacing M_WAITOK|M_CANFAIL|M_ZERO with PR_NOWAIT|PR_ZERO looks
> wrong.  PR_WAITOK|PR_LIMITFAIL|PR_ZERO should do something equivalent.
> 
> I am not sure if pf_find_anchor() should fail if the limit is hit.
> It uses only a temporary buffer.  All calls to pf_find_ruleset()
> should be checked whether permanent failure, after the limit has
> been reached, is what we want.  Or we keep the malloc there?
> 

this seems to be a good point. I think we should keep malloc() there.


thanks and
regards
sashan



Re: ip6 hbhchcheck mbuf pointer

2022-06-29 Thread Alexandr Nedvedicky
Hello,

On Wed, Jun 29, 2022 at 02:34:02PM +0200, Alexander Bluhm wrote:
> On Wed, Jun 29, 2022 at 12:42:13AM +0200, Alexandr Nedvedicky wrote:
> > for example here in current ip6_input_if() we may leak memory
> > on error:
> 
> Curently we do not leak memory.  We free the mbuf in the callee,
> and check the return code in the caller.

yes I agree. I've mis-read icmp6_error() earlier.

> 
> > another option would be to revisit all places where we call ip6_hbhchcheck()
> > and adjust caller accordingly.
> 
> I added an comment that described the error handling before commit.
> 
> > In my opinion all places where we
> > making ip6_hbhchcheck() to also free packet on error should be the
> > right thing to do.
> 
> Next diff passes the pointer further down and m_freemp(mp) sets it
> to NULL.  Also sort the parameters so that the mbuf is always first.
> Note that icmp6_error() frees the mbuf, but does not reset the
> pointer.  Fixing that would touch much more code.
> 
> ok?
> 

diff looks good to me. thanks.

OK sashan
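The `struct mbuf **` pattern bluhm describes — free through the double pointer and clear it, so a dangling pointer cannot be used — can be sketched in userland (toy types and names, not the kernel API; in the kernel this is roughly what m_freemp() does on top of m_freem()):

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Userland sketch of the consume-and-clear pattern: passing a
 * struct mbuf ** lets the callee both free the buffer and NULL out
 * the caller's pointer, so the "dangling pointer that we must not
 * use" from the thread above cannot happen.
 */
struct mbuf {
	int len;
};

static void
m_freemp_sketch(struct mbuf **mp)
{
	free(*mp);
	*mp = NULL;	/* caller's pointer is now safe to test */
}

static int
check_input(struct mbuf **mp)
{
	if ((*mp)->len < 0) {		/* error path consumes the mbuf */
		m_freemp_sketch(mp);
		return (-1);
	}
	return (0);			/* caller still owns *mp */
}
```

After an error return the caller can simply test its own pointer against NULL instead of remembering which callee freed what.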



Re: ip6 hbhchcheck mbuf pointer

2022-06-28 Thread Alexandr Nedvedicky
Hello,

I like the idea. I would just consider making ip6_hbhchcheck()
always free *mp on error and set the pointer to NULL.

for example here in current ip6_input_if() we may leak memory
on error:

 308 int
 309 ip6_input_if(struct mbuf **mp, int *offp, int nxt, int af, struct ifnet 
*ifp)
 310 {

 423 #ifdef MROUTING
 424 if (ip6_mforwarding && ip6_mrouter[ifp->if_rdomain]) {
 425 int error;
 426 
 427 nxt = ip6_hbhchcheck(, offp, );
 428 if (nxt == IPPROTO_DONE)
 429 goto out;
 430 

 575 return IPPROTO_DONE;
 576  bad:
 577 nxt = IPPROTO_DONE;
 578 m_freemp(mp);
 579  out:
 580 rtfree(rt);
 581 return nxt;
 582 }

another option would be to revisit all places where we call ip6_hbhchcheck()
and adjust the callers accordingly. In my opinion, making
ip6_hbhchcheck() also free the packet on error, in all places where
we call it, should be the right thing to do.

other than that diff reads OK to me.

thanks and
regards
sashan


On Tue, Jun 28, 2022 at 11:50:03PM +0200, Alexander Bluhm wrote:
> Hi,
> 
> Deep in ip6_hbhchcheck() may modify the mbuf pointer.  Instead of
> having dangling pointer that we must not use, pass pointer to mbuf
> pointer and keep it consistent in the caller.
> 
> ok?
> 
> bluhm
> 
> Index: netinet6/ip6_input.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/netinet6/ip6_input.c,v
> retrieving revision 1.246
> diff -u -p -r1.246 ip6_input.c
> --- netinet6/ip6_input.c  28 Jun 2022 08:24:29 -  1.246
> +++ netinet6/ip6_input.c  28 Jun 2022 10:51:43 -
> @@ -122,7 +122,7 @@ uint8_t ip6_soiikey[IP6_SOIIKEY_LEN];
>  int ip6_ours(struct mbuf **, int *, int, int);
>  int ip6_local(struct mbuf **, int *, int, int);
>  int ip6_check_rh0hdr(struct mbuf *, int *);
> -int ip6_hbhchcheck(struct mbuf *, int *, int *);
> +int ip6_hbhchcheck(struct mbuf **, int *, int *);
>  int ip6_hopopts_input(u_int32_t *, u_int32_t *, struct mbuf **, int *);
>  struct mbuf *ip6_pullexthdr(struct mbuf *, size_t, int);
>  int ip6_sysctl_soiikey(void *, size_t *, void *, size_t);
> @@ -424,7 +424,7 @@ ip6_input_if(struct mbuf **mp, int *offp
>   if (ip6_mforwarding && ip6_mrouter[ifp->if_rdomain]) {
>   int error;
>  
> - nxt = ip6_hbhchcheck(m, offp, );
> + nxt = ip6_hbhchcheck(, offp, );
>   if (nxt == IPPROTO_DONE)
>   goto out;
>  
> @@ -544,7 +544,7 @@ ip6_input_if(struct mbuf **mp, int *offp
>   goto bad;
>   }
>  
> - nxt = ip6_hbhchcheck(m, offp, );
> + nxt = ip6_hbhchcheck(, offp, );
>   if (nxt == IPPROTO_DONE)
>   goto out;
>  
> @@ -586,7 +586,7 @@ ip6_local(struct mbuf **mp, int *offp, i
>  {
>   NET_ASSERT_WLOCKED();
>  
> - nxt = ip6_hbhchcheck(*mp, offp, NULL);
> + nxt = ip6_hbhchcheck(mp, offp, NULL);
>   if (nxt == IPPROTO_DONE)
>   return IPPROTO_DONE;
>  
> @@ -597,13 +597,13 @@ ip6_local(struct mbuf **mp, int *offp, i
>  }
>  
>  int
> -ip6_hbhchcheck(struct mbuf *m, int *offp, int *oursp)
> +ip6_hbhchcheck(struct mbuf **mp, int *offp, int *oursp)
>  {
>   struct ip6_hdr *ip6;
>   u_int32_t plen, rtalert = ~0;
>   int nxt;
>  
> - ip6 = mtod(m, struct ip6_hdr *);
> + ip6 = mtod(*mp, struct ip6_hdr *);
>  
>   /*
>* Process Hop-by-Hop options header if it's contained.
> @@ -615,12 +615,11 @@ ip6_hbhchcheck(struct mbuf *m, int *offp
>   if (ip6->ip6_nxt == IPPROTO_HOPOPTS) {
>   struct ip6_hbh *hbh;
>  
> - if (ip6_hopopts_input(, , , offp)) {
> + if (ip6_hopopts_input(, , mp, offp))
>   goto bad;   /* m have already been freed */
> - }
>  
>   /* adjust pointer */
> - ip6 = mtod(m, struct ip6_hdr *);
> + ip6 = mtod(*mp, struct ip6_hdr *);
>  
>   /*
>* if the payload length field is 0 and the next header field
> @@ -634,13 +633,13 @@ ip6_hbhchcheck(struct mbuf *m, int *offp
>* (non-zero) payload length to the variable plen.
>*/
>   ip6stat_inc(ip6s_badoptions);
> - icmp6_error(m, ICMP6_PARAM_PROB,
> + icmp6_error(*mp, ICMP6_PARAM_PROB,
>   ICMP6_PARAMPROB_HEADER,
>   (caddr_t)>ip6_plen - (caddr_t)ip6);
>   goto bad;
>   }
> - IP6_EXTHDR_GET(hbh, struct ip6_hbh *, m, sizeof(struct ip6_hdr),
> - sizeof(struct ip6_hbh));
> + IP6_EXTHDR_GET(hbh, struct ip6_hbh *, *mp,
> + sizeof(struct ip6_hdr), sizeof(struct ip6_hbh));
>   if (hbh == NULL) {

Re: ip6 hbhchcheck next protocol

2022-06-28 Thread Alexandr Nedvedicky
Hello,

On Tue, Jun 28, 2022 at 02:39:24AM +0200, Alexander Bluhm wrote:
> Hi,
> 
> The ip6_hbhchcheck() function never reads the nxtp parameter, it
> only sets its value.  It is more obvious if we return the next
> protocol and return IPPROTO_DONE to signal error.  All IP protocol
> functions do that.
> 
> ok?
> 

fine with me. diff looks good to me.

OK sashan



pfctl reports back a table as being always created/added

2022-06-14 Thread Alexandr Nedvedicky
Hello,

Jason Mc Intyre (jmc@) reported a bug earlier today reaching me
by email off-list. Let me quote from Jason's email:

i already have a pf table, adding an address tells me i have created a
table. even though the table already existed:

# pfctl -tbrutes -Tshow | wc
89  89  501
# pfctl -tbrutes -Tadd 1.1.1.1
1 table created.
1/1 addresses added.

The bug has been introduced by my commit to pf_table.c:
$OpenBSD: pf_ioctl.c,v 1.381 2022/05/10 23:12:25 sashan Exp $

pfr_add_tables() currently always increases the 'xadd' counter, while it
should increase 'xadd' if and only if we are going to create a table.
The diff below fixes that.

OK to commit?


thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sys/net/pf_table.c b/sys/net/pf_table.c
index f261baef963..6c47e11f604 100644
--- a/sys/net/pf_table.c
+++ b/sys/net/pf_table.c
@@ -1562,13 +1562,13 @@ pfr_add_tables(struct pfr_table *tbl, int size, int *nadd, int flags)
if (p == NULL) {
SLIST_REMOVE(, n, pfr_ktable, pfrkt_workq);
SLIST_INSERT_HEAD(, n, pfrkt_workq);
+   xadd++;
} else if (!(flags & PFR_FLAG_DUMMY) &&
!(p->pfrkt_flags & PFR_TFLAG_ACTIVE)) {
p->pfrkt_nflags = (p->pfrkt_flags &
~PFR_TFLAG_USRMASK) | key.pfrkt_flags;
SLIST_INSERT_HEAD(, p, pfrkt_workq);
}
-   xadd++;
}
 
if (!(flags & PFR_FLAG_DUMMY)) {



Re: unlock pf_purge

2022-06-07 Thread Alexandr Nedvedicky
Hello,

I like this change and I think this should go in as-is.
This is OK sashan@

I also think we should revisit a pf.conf(5) manpage where
'interval' (PFTM_INTERVAL) is described. It currently
reads as follows:

 set timeout variable value
 frag   Seconds before an unassembled fragment is expired (60
by default).
 interval   Interval between purging expired states and fragments
(10 seconds by default).
 src.track  Length of time to retain a source tracking entry after
the last state expires (0 by default, which means
there is no global limit.  The value is defined by the
rule which creates the state.).

pf(4) runs the expiration timer for states every second regardless of
how the interval value is set. pf(4) uses/abuses the interval value
(PFTM_INTERVAL) to determine the fraction of states to be processed
by each run of the 1-second timer.
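The fraction in question is easy to make concrete. A back-of-envelope model (my arithmetic, not pf's exact code):

```c
#include <assert.h>

/*
 * Back-of-envelope model of the behaviour described above: the state
 * expiry timer fires every second, and PFTM_INTERVAL only picks what
 * fraction of the state table each tick scans.  Plugging in the
 * numbers from dlg's mail below (~1 to 1.2M states, default interval
 * of 10) gives the ~100k-120k states scanned per second he mentions.
 */
static unsigned
states_per_tick(unsigned nstates, unsigned interval)
{
	return (nstates / interval);	/* 1-sec timer covers 1/interval */
}
```

So "interval" is not a pause between purges at all — it is the number of ticks needed to sweep the whole table once, which is why the manpage wording is misleading.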

I prefer to tune manpage in separate commit.

thanks and
OK sashan

On Tue, Jun 07, 2022 at 04:58:24PM +1000, David Gwynne wrote:
> the main change here is to move pf_purge out from under the kernel lock.
> 
> another part of the change is to limit the amount of work the state
> purging does to avoid hogging a cpu too much, and to also avoid holding
> NET_LOCK for too long.
> 
> the main part is acheived by using systqmp to run the purge tasks
> instead of systq. from what i can tell, all the data structures we're
> walking over are protected by pf locks and/or the net lock. the pf
> state list in particular was recently reworked so iteration over
> the list can be done without interfering with insertions. this means we
> can scan the state list to collect expired states without affecting
> packet processing.
> 
> however, if you have a lot of states (like me) you can still spend a
> lot of time scanning. i seem to sit at around 1 to 1.2 million states,
> and i run with the default scan interval of 10 seconds. that means im
> scanning about 100 to 120k states every second, which just takes time.
> 
> doing the scan while holding the kernel lock means most other stuff
> cannot run at the same time. worse, the timeout that schedules the pf
> purge work also seems to schedule something in the softclock thread. if
> the same timeout also wakes up a process in an unlocked syscall, a
> significant amount of time in the system is spent spinning for no
> reason. significant meaning dozens of milliseconds, which is actually
> noticable if you're working interactively on the box.
> 
> before this change pf purges often took 10 to 50ms. the softclock
> thread runs next to it often took a similar amount of time, presumably
> because they ended up spinning waiting for each other. after this
> change the pf_purges are more like 6 to 12ms, and dont block softclock.
> most of the variability in the runs now seems to come from contention on
> the net lock.
> 
> the state purge task now checks the SHOULDYIELD flag, so if the
> task is taking too long it will return early and requeue itself,
> allowing the taskq machinery to yield and let other threads run.
> 
> state purging is now also limited to removing 64 states per task run to
> avoid holding locks that would block packet processing for too long. if
> there's still states to scan in the interval after these 64 packets are
> collected, the task reschedules so it can keep scanning from where it
> left off.
> 
> the other purge tasks for source nodes, rules, and fragments have been
> moved to their own timeout/task pair to simplify the time accounting.
> 
> the box feels a lot more responsive with this change. i'm still kicking
> it around, but i wanted to get it out early.
> 
> Index: pf.c
> ===
> RCS file: /cvs/src/sys/net/pf.c,v
> retrieving revision 1.1132
> diff -u -p -r1.1132 pf.c
> --- pf.c  23 May 2022 11:17:35 -  1.1132
> +++ pf.c  7 Jun 2022 06:49:51 -
> @@ -120,10 +120,6 @@ u_charpf_tcp_secret[16];
>  int   pf_tcp_secret_init;
>  int   pf_tcp_iss_off;
>  
> -int   pf_npurge;
> -struct task   pf_purge_task = TASK_INITIALIZER(pf_purge, _npurge);
> -struct timeoutpf_purge_to = TIMEOUT_INITIALIZER(pf_purge_timeout, 
> NULL);
> -
>  enum pf_test_status {
>   PF_TEST_FAIL = -1,
>   PF_TEST_OK,
> @@ -1276,49 +1272,111 @@ pf_purge_expired_rules(void)
>   }
>  }
>  
> -void
> -pf_purge_timeout(void *unused)
> -{
> - /* XXX move to systqmp to avoid KERNEL_LOCK */
> - task_add(systq, _purge_task);
> -}
> +void  pf_purge_states(void *);
> +struct task   pf_purge_states_task =
> +  TASK_INITIALIZER(pf_purge_states, NULL);
> +
> +void  pf_purge_states_tick(void *);
> +struct timeoutpf_purge_states_to =
> +  TIMEOUT_INITIALIZER(pf_purge_states_tick, NULL);
> +
> 

potential memory leak in if_vinput()

2022-06-06 Thread Alexandr Nedvedicky
Hello,

I've spotted this glitch while hunting down a use-after-free
in the 'veb' packet path. I believe the issue is rather hypothetical;
there is no evidence the suspected memory leak ever occurred.

Anyway I believe if_vinput() should always consume the packet,
either by passing it further or, when the IFXF_MONITOR flag is set,
by just releasing it.

thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sys/net/if.c b/sys/net/if.c
index f354c9d8a6c..db181586123 100644
--- a/sys/net/if.c
+++ b/sys/net/if.c
@@ -869,6 +869,8 @@ if_vinput(struct ifnet *ifp, struct mbuf *m)
 
if (__predict_true(!ISSET(ifp->if_xflags, IFXF_MONITOR)))
(*ifp->if_input)(ifp, m);
+   else
+   m_freem(m);
 }
 
 void



Re: relayd panic

2022-06-05 Thread Alexandr Nedvedicky
Hello,

I'll commit the one-liner diff on Tuesday morning (Jun 6th) Prague time
without an explicit OK, unless there are objections.

regards
sashan


On Sun, Jun 05, 2022 at 09:44:45AM +0100, Stuart Henderson wrote:
> I don't know this code well enough to give a meaningful OK, but this
> should probably get committed.
> 
> 
> On 2022/06/01 09:16, Alexandr Nedvedicky wrote:
> > Hello,
> > 
> > 
> > > r420-1# rcctl -f start relayd
> > > relayd(ok)
> > > r420-1# uvm_fault(0xfd862f82f990, 0x0, 0, 1) -> e
> > > kernel: page fault trap, code=0
> > > Stopped at  pf_find_or_create_ruleset+0x1c: movb0(%rdi),%al
> > > TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> > >  431388  19003  0 0x2  05  relayd
> > >  174608  32253 89   0x112  02  relayd
> > >  395415  12468  0 0x2  04  relayd
> > >  493579  11904  0 0x2  03  relayd
> > > *101082  14967 89   0x1100012  00K relayd
> > > pf_find_or_create_ruleset(0) at pf_find_or_create_ruleset+0x1c
> > > pfr_add_tables(832d7cca800,1,80eaf43c,1000) at
> > > pfr_add_tables+0x6ae
> > > 
> > > pfioctl(4900,c450443d,80eaf000,3,80002272e7f0) at 
> > > pfioctl+0x1d9f
> > > VOP_IOCTL(fd8551f82dd0,c450443d,80eaf000,3,fd862f7d60c0,80002272e7f0) at VOP_IOCTL+0x5c
> > > vn_ioctl(fd855ecec1e8,c450443d,80eaf000,80002272e7f0) at
> > > vn_ioctl+0x75
> > > sys_ioctl(80002272e7f0,8000227d9980,8000227d99d0) at
> > > sys_ioctl+0x2c4
> > > syscall(8000227d9a40) at syscall+0x374
> > > Xsyscall() at Xsyscall+0x128
> > > end of kernel
> > 
> > it looks like we are dying here at line 239 due to a NULL pointer dereference:
> > 
> > 232 struct pf_ruleset *
> > 233 pf_find_or_create_ruleset(const char *path)
> > 234 {
> > 235 char*p, *aname, *r;
> > 236 struct pf_ruleset   *ruleset;
> > 237 struct pf_anchor*anchor;
> > 238 
> > 239 if (path[0] == 0)
> > 240 return (_main_ruleset);
> > 241 
> > 242 while (*path == '/')
> > 243 path++;
> > 244 
> > 
> > I've followed the same steps to reproduce the issue to check if
> > diff below resolves the issue. The bug has been introduced by
> > my recent change to pf_table.c [1] from May 10th:
> > 
> > Modified files:
> > sys/net: pf_ioctl.c pf_table.c 
> > 
> > Log message:
> > move memory allocations in pfr_add_tables() out of
> > NET_LOCK()/PF_LOCK() scope. bluhm@ helped a lot
> > to put this diff into shape.
> > 
> > besides using a regression test I've also did simple testing
> > using a 'load anchor':
> > 
> > netlock# cat /tmp/anchor.conf
> > load anchor "test" from "/tmp/pf.conf"
> > netlock#
> > netlock# cat /tmp/pf.conf
> > table <try> { 192.168.1.1 }
> > pass from <try>
> > netlock#
> > netlock# pfctl -sA
> >   test
> > netlock# pfctl -a test -sT
> > try
> > netlock# pfctl -a test -t try -T show
> >192.168.1.1
> > 
> > OK to commit fix below?
> > 
> > thanks and
> > regards
> > sashan
> > 
> > [1] 
> > https://urldefense.com/v3/__https://marc.info/?l=openbsd-cvs=16522243003=2__;!!ACWV5N9M2RV99hQ!LsTJPPsMku6N_u9xzJu6Tj6XpZWyLzLWPmbWr-Z-p845Y8r6LH4Ul8PyX8EmqI6alhF0JqadpBBF4mn53v-rQdY$
> >  
> > 
> > 8<---8<---8<--8<
> > diff --git a/sys/net/pf_table.c b/sys/net/pf_table.c
> > index 8315ea5dd3a..dfc49de5efe 100644
> > --- a/sys/net/pf_table.c
> > +++ b/sys/net/pf_table.c
> > @@ -1628,8 +1628,7 @@ pfr_add_tables(struct pfr_table *tbl, int size, int *nadd, int flags)
> > if (r != NULL)
> > continue;
> >  
> > -   q->pfrkt_rs = pf_find_or_create_ruleset(
> > -   q->pfrkt_root->pfrkt_anchor);
> > +   q->pfrkt_rs = pf_find_or_create_ruleset(q->pfrkt_anchor);
> > /*
> >  * root tables are attached to main ruleset,
> >  * because ->pfrkt_anchor[0] == '\0'
> > 



Re: pf_state_export panic with NET_TASKQ=6 and stuff ....

2022-05-27 Thread Alexandr Nedvedicky
Hello,


On Fri, May 27, 2022 at 10:33:06AM +0200, Hrvoje Popovski wrote:
> Hi all,
> 
> I'm running firewall in production with NET_TASKQ=6 with claudio@ "use
> timeout for rttimer" and bluhm@ "kernel lock in arp" diffs.
> After week or so of running smoothly I've got panic.

thank you for being brave enough to run those bits in production.



> bcbnfw1# uvm_fault(0x823c6ac0, 0x10, 0, 1) -> e
> kernel: page fault trap, code=0
> Stopped at  pf_state_export+0x4e:   movq0x10(%rax),%rcx

according to registers below rax is 0, we die because
of NULL pointer dereference.

> TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> *414231  37466  0 0x14000  0x2003  softnet
>  180795  96693  0 0x14000  0x2002  softnet
>   39487  54182  0 0x14000  0x2000  softnet
>  221352  95757  0 0x14000  0x2004  softnet
>  252845  32137  0 0x14000  0x2001  softnet
>  294301  63695  0 0x14000  0x2005  softnet
> pf_state_export(fd80611313c8,fd8877492ac0) at pf_state_export+0x4e
> pfsync_sendout() at pfsync_sendout+0x5e4
> pfsync_update_state(fd887df852b8) at pfsync_update_state+0x15b
> pf_test(2,1,80d48000,800020b23a08) at pf_test+0xd53
> ip_input_if(800020b23a08,800020b23a14,4,0,80d48000) at 
> ip_input_if+0xcd
> ipv4_input(80d48000,fd80774a4000) at ipv4_input+0x39
> ether_input(80d48000,fd80774a4000) at ether_input+0x3ad
> carp_input(80d64000,fd80774a4000,5e000115) at carp_input+0x196
> ether_input(80d64000,fd80774a4000) at ether_input+0x1d9
> vlan_input(80b9f000,fd80774a4000,800020b23c3c) at 
> vlan_input+0x23d
> ether_input(80b9f000,fd80774a4000) at ether_input+0x85
> if_input_process(80493048,800020b23cd8) at if_input_process+0x6f
> ifiq_process(80491b00) at ifiq_process+0x69
> taskq_thread(80036500) at taskq_thread+0x11a
> end trace frame: 0x0, count: 1
> https://www.openbsd.org/ddb.html describes the minimum info required in bug
> reports.  Insufficient info makes it difficult to find and fix bugs.
> ddb{3}>
> 

according to call stack we die somewhere here:

1192
1193memset(sp, 0, sizeof(struct pfsync_state));
1194
1195/* copy from state key */
1196sp->key[PF_SK_WIRE].addr[0] = st->key[PF_SK_WIRE]->addr[0];
1197sp->key[PF_SK_WIRE].addr[1] = st->key[PF_SK_WIRE]->addr[1];
1198sp->key[PF_SK_WIRE].port[0] = st->key[PF_SK_WIRE]->port[0];
1199sp->key[PF_SK_WIRE].port[1] = st->key[PF_SK_WIRE]->port[1];
1200    sp->key[PF_SK_WIRE].rdomain = htons(st->key[PF_SK_WIRE]->rdomain);
1201sp->key[PF_SK_WIRE].af = st->key[PF_SK_WIRE]->af;

looks like the state key bound to st might be gone (st->key[] == NULL).
I'll take a closer look later today.

thanks and
regards
sashan



Re: [External] : pf: remove unused include files

2022-05-18 Thread Alexandr Nedvedicky
Hello,

On Tue, May 17, 2022 at 06:40:12PM +, Miod Vallat wrote:
> sys/net/pf.c r1.968 added a call to m_print() #ifdef DDB in a
> troublesome situation.
> 
> Once the root cause of the problem was fixed, the DDB-specific code path
> was removed in r1.970, but the added includes were kept, although
> nothing in pf.c depends on DDB anymore.

thanks for spotting this.

OK sashan@

> 
> Index: pf.c
> ===
> RCS file: /OpenBSD/src/sys/net/pf.c,v
> retrieving revision 1.1129
> diff -u -p -r1.1129 pf.c
> --- pf.c  5 May 2022 16:44:22 -   1.1129
> +++ pf.c  17 May 2022 18:38:34 -
> @@ -103,11 +103,6 @@
>  struct pfsync_deferral;
>  #endif /* NPFSYNC > 0 */
>  
> -#ifdef DDB
> -#include 
> -#include 
> -#endif
> -
>  /*
>   * Global variables
>   */
> 



Re: [External] : Re: move memory allocation in pfr_add_tables() outside of NET_LOCK()/PF_LOCK()

2022-05-09 Thread Alexandr Nedvedicky
Hello,

On Tue, May 10, 2022 at 12:18:15AM +0200, Alexander Bluhm wrote:
> On Mon, May 09, 2022 at 11:11:03PM +0200, Alexandr Nedvedicky wrote:
> > > ... and then we insert a destroyed p
> >
> > yes. you are right. new diff addresses that with change as follows:
> >
> > @@ -1542,9 +1542,8 @@ pfr_add_tables(struct pfr_table ...)
> > pfr_destroy_ktable(p, 0);
> > break;
> > }
> > +   SLIST_INSERT_HEAD(, p, pfrkt_workq);
> This inserts p each time you run over q list.
> 
> Should we do it like this?  It is similar to your solution at the
> other loop.
> 
> SLIST_FOREACH(q, , pfrkt_workq) {
> if (!pfr_ktable_compare(p, q)) {
> /*
>  * We need no lock here, because `p` is empty,
>  * there are no rules or shadow tables
>  * attached.
>  */
> pfr_destroy_ktable(p->pfrkt_root, 0);
> p->pfrkt_root = NULL;
> pfr_destroy_ktable(p, 0);
> break;
> }
> }
>   if (q != NULL)
>   continue;
> SLIST_INSERT_HEAD(, p, pfrkt_workq);

yes, this is exactly the code I want.
> 
> 
> > > I compared the old and new code to see if it is equivalent.
> > > Before the condtion looked like this.
> >
> > very good point. I think this what needs to be done:
> >
> > @@ -1558,7 +1558,8 @@ pfr_add_tables(struct pfr_table *tbl, ...)
> > if (p == NULL) {
> > SLIST_REMOVE(, n, pfr_ktable, pfrkt_workq);
> > SLIST_INSERT_HEAD(, n, pfrkt_workq);
> > -   } else if (!(flags & PFR_FLAG_DUMMY)) {
> > +   } else if (!(flags & PFR_FLAG_DUMMY) &&
> I guess PFR_FLAG_DUMMY check is an optimization.

Not really. I don't want to alter any flags on existing tables
when running 'pfctl -n ...'; the '-n' sets PFR_FLAG_DUMMY.

> > +   !(p->pfrkt_flags & PFR_TFLAG_ACTIVE)) {
> Indent should be one tab to the left.
> > p->pfrkt_nflags = (p->pfrkt_flags &
> > ~PFR_TFLAG_USRMASK) | key.pfrkt_flags;
> > SLIST_INSERT_HEAD(, p, pfrkt_workq);
> 
> Old code had this to avoid duplicate entries.  Do we need it?
> SLIST_FOREACH(q, , pfrkt_workq)
> if (!pfr_ktable_compare(, q))
> goto _skip;

The duplicate entries are removed in the first loop, which copies the
data in, so we don't need this check here.

> 
> > > This continue goes to the r list, but I think you want to continue p list.
> > > > }
> > > > }
> >
> > yes, exactly. we want to continue with outer loop if we break from inner
> > one. this is what I want to do:
> >
> > @@ -1617,9 +1618,12 @@ pfr_add_tables(struct pfr_table *tbl, ...)
> > p->pfrkt_root = r;
> > SLIST_INSERT_HEAD(, q,
> > pfrkt_workq);
> > -   continue;
> > +   break;
> > }
> > }
> > +   if (r != SLIST_END())
> Could you use if (r != NULL)?  No one uses the SLIST_END macro.
> > +   continue;
> > +

sure. 

updated diff is below.

thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
index 8315b115474..1f036e1368f 100644
--- a/sys/net/pf_ioctl.c
+++ b/sys/net/pf_ioctl.c
@@ -2217,12 +2217,8 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, struct proc *p)
error = ENODEV;
goto fail;
}
-   NET_LOCK();
-   PF_LOCK();
error = pfr_add_tables(io->pfrio_buffer, io->pfrio_size,
&io->pfrio_nadd, io->pfrio_flags | PFR_FLAG_USERIOCTL);
-   PF_UNLOCK();
-   NET_UNLOCK();

Re: [External] : Re: pf.conf(5) clarify ICMP sloppy state handling

2022-05-09 Thread Alexandr Nedvedicky
Hello,

I'm sorry, I was too fast with the commit. I've just committed
what was suggested by bluhm@:

@@ -2186,6 +2186,7 @@ It cannot be used with
 .Cm modulate state
 or
 .Cm synproxy state .
+With this option ICMP replies can create states.
 .It Ar timeout seconds
 Changes the
 .Ar timeout


> This is helpful, but because it's so surprising that "pass proto icmp"
> doesn't pass all icmp traffic, I think it would help to mention it where
> "proto icmp" is described too.
> 
> Also, the top of the text about "sloppy" just talks about the sloppy
> TCP connection tracker, I think perhaps it would be better to lead
> with something that suggests it has multiple functions for different
> protocols?

I don't object to any of your enhancements.

reads OK sashan

> 
> Index: man5/pf.conf.5
> ===================================================================
> RCS file: /cvs/src/share/man/man5/pf.conf.5,v
> retrieving revision 1.594
> diff -u -p -r1.594 pf.conf.5
> --- man5/pf.conf.5	9 May 2022 20:29:23 -0000	1.594
> +++ man5/pf.conf.5	9 May 2022 21:05:48 -0000
> @@ -594,6 +594,13 @@ or
>  .Pc
>  must match.
>  .Pp
> +ICMP responses are not permitted unless they either match an
> +existing request, or unless
> +.Cm no state
> +or
> +.Cm keep state (sloppy)
> +is specified.
> +.Pp
>  .It Cm label Ar string
>  Adds a label to the rule, which can be used to identify the rule.
>  For instance,
> @@ -2177,7 +2184,7 @@ States created by this rule are exported
>  .Xr pflow 4
>  interface.
>  .It Cm sloppy
> -Uses a sloppy TCP connection tracker that does not check sequence
> +For TCP, uses a sloppy connection tracker that does not check sequence
>  numbers at all, which makes insertion and ICMP teardown attacks way
>  easier.
>  This is intended to be used in situations where one does not see all
> @@ -2186,7 +2193,8 @@ It cannot be used with
>  .Cm modulate state
>  or
>  .Cm synproxy state .
> -With this option ICMP replies can create states.
> +For ICMP, this option allows states to be created from replies,
> +not just requests.
>  .It Ar timeout seconds
>  Changes the
>  .Ar timeout
> 
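The documented behaviour can be exercised with rules like the following (an illustrative pf.conf fragment, not part of the diff):

```
# Stateful ICMP: a reply passes only if it matches an existing
# request's state.
pass inet proto icmp icmp-type echoreq

# With sloppy tracking, a state may also be created from a reply,
# useful when only one direction of the traffic is seen.
pass inet proto icmp keep state (sloppy)
```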


