On Sun, Dec 18, 2011 at 18:50 -0800, James A. Peltier wrote:
> ----- Original Message -----
> | Hi All,
> |
> | Today is our semester maintenance day and we've upgraded our backup
> | bridge firewall to the Dec. 15, 2011 snapshot available from
> | ftp.openbsd.org and I'm getting this odd error when I boot it up.
> | Oddly enough, this only happens when connected to the switch that
> | original one is connected to (we swap them out each semester).
> |
> | First, I use the upgrade method to go from snapshot to snapshot and
> | reboot
> | I run sysmerge to bring in the new configuration files from etc50.tgz
> | and xetc50.tgz ( I only have bsd* man* base* xbase* installed) and
> | reboot.
> |
> | So as you can see the standard running -current and I've done several
> | upgrades now.
> |
> | On my test switch (HP5304XL) it boots okay and I can reload the
> | firewall rules with no problem. When I connect it to my HP2910 where
> | the current firewall is running I cannot fully boot. If I press CTRL+C
> | during the starting network section it will continue to boot. If I
> | then run pfctl -e it states that PF is already enabled enabled but if
> | I run pfctl -Fr -f /etc/pf.conf I get the following.
> |
> | # uvm_fault(0xffffffff80d2ff40, 0x0, 0, 1) -> e
> | kernel: page fault trap, code=0
> | Stopped at pf_translate+0x154: cmpw %r13w,0(%rsi)
> | ddb{0}>
> |
> | keyboard is dead, no response at all from console. Any ideas?
>
> Okay, I've gotten some off list requests for more information, which
> I'm hoping I'll be able to get for those people, but I'm now outside
> of my maintenance window and will likely need to schedule another
> outage or figure out how to reproduce it again. The current bridge
> firewall running the following version does not exhibit the problem,
> but I'm not able to get a trace output at this time. Maybe it's
> still at least somewhat useful reference for updates that may have
> happened. ( Yeah right, from Aug 8th until now. Thousands of
> commits. ;) )
>
> OpenBSD 5.0 (GENERIC.MP) #57: Mon Aug 8 14:58:00 MDT 2011
> [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
>
pf_translate+0x154 corresponds to the condition at pf.c:3765.
*pd->sport doesn't point to anything. pd->sport is set to point
to the extracted header in the pf_setup_pdesc. the problem is
that the header extraction happens based on the virtual_proto,
not proto, which can be different in the case of a fragment.
now, pf.c got it covered by the condition on line 3476 which
prevents pf_translate from running on fragments. the only
other invocation of pf_translate is in the if_pflog.c:407
where we don't check for fragments. therefore i think that
this is the problem.
the diff below should fix the problem. it also doesn't make
sense to do af translation if we didn't manage to get our
shit done in the pf_translate and in the subsequent block.
ok?
Index: if_pflog.c
===================================================================
RCS file: /cvs/src/sys/net/if_pflog.c,v
retrieving revision 1.45
diff -u -p -r1.45 if_pflog.c
--- if_pflog.c 21 Oct 2011 15:45:55 -0000 1.45
+++ if_pflog.c 19 Dec 2011 20:36:32 -0000
@@ -404,7 +404,8 @@ pflog_bpfcopy(const void *src_arg, void
if (pd.dport)
odport = *pd.dport;
- if ((pfloghdr->rewritten = pf_translate(&pd, &pfloghdr->saddr,
+ if (pd->virtual_proto != PF_VPROTO_FRAGMENT &&
+ (pfloghdr->rewritten = pf_translate(&pd, &pfloghdr->saddr,
pfloghdr->sport, &pfloghdr->daddr, pfloghdr->dport, 0,
pfloghdr->dir))) {
m_copyback(pd.m, pd.off, min(pd.m->m_len - pd.off, pd.hdrlen),
@@ -422,7 +423,7 @@ pflog_bpfcopy(const void *src_arg, void
pd.tot_len = min(pd.tot_len, len);
pd.tot_len -= pd.m->m_data - pd.m->m_pktdat;
- if (afto)
+ if (pfloghdr->rewritten && afto)
pf_translate_af(&pd);
mlen = min(pd.m->m_pkthdr.len, len);