Hello,
</snip>
> > to be honest I have no idea what could be causing problems on those two
> > fairly
> > distinct machines. The strange thing is that pf_test() currently does not
> > run in
> > parallel. I don't quite understand why reverting my earlier change helps
> > here.
>
> it could be two differents ways to trigger a bug somewhere else that
> your commit expose.
>
> the panic doesn't trigger in the same way on both machines:
> - Olivier's machine seems to trigger it quickly (after some minutes)
> - mine relatively slowly (~ once a day)
Olivier's machine acts as AP, so it forwards packets between interfaces.
If I remember correctly your machine is laptop/workstation, which
does not forward traffic.
the function, which we change back and forth here is
pf_state_key_link_reverse(), which is being called from pf_find_state()
here:
1085
1086 if (sk == NULL) {
1087 if ((sk = RB_FIND(pf_state_tree, &pf_statetbl,
1088 (struct pf_state_key *)key)) == NULL)
1089 return (PF_DROP);
1090 if (pd->dir == PF_OUT && pkt_sk &&
1091 pf_compare_state_keys(pkt_sk, sk, pd->kif, pd->dir) ==
0)
1092 pf_state_key_link_reverse(sk, pkt_sk);
1093 else if (pd->dir == PF_OUT && pd->m->m_pkthdr.pf.inp &&
1094 !pd->m->m_pkthdr.pf.inp->inp_pf_sk && !sk->inp)
1095 pf_state_key_link_inpcb(sk,
pd->m->m_pkthdr.pf.inp);
1096 }
1097
the story in human words goes as follows:
sk == NULL -> no matching state key was attached to packet, Thus we
have to search state key in state tree using RB_FIND()
if we could find state key for packet in table, then we will try
to set up a 'shortcut', which can save us RB_FIND() later.
1090 - 1092
the shortcut can be set up for outbound packet only (pd->dir PF_OUT),
which is also being forwarded (pkt_sk != NULL, indicates we are seeing
the packet for the second time pkt_sk holds state key for inbound
direction). pf_compare_state_keys() is sanity check, it leaves a
debug message on system console on failure.
So if it is outbound forwarded packet, we've seen earlier, we
set up a reverse link to save one RB_FIND() operation on next
forwarded packet, which matches the same state.
1093 - 1095
creates similar shortcut for local bound packets. We put reference
to state key into PCB linked to socket. This will save us RB_FIND()
operation for next local outbound, which matches the same state.
given the bug seems to be triggered/uncovered by pf_state_key_link_reverse()
is there any chance your laptop/workstation occasionally forwards packets?
like doing NAT for vmd/qemu virtual machine?
if it is not the case then the question is how does it come we run
pf_state_key_link_reverse()? which same as why pkt_sk is not NULL at line 1090.
>
> I could try to run with your commit and see if I could trigger it more
> easily or found some elements influencing it. I could try with GENERIC
> for example to see if I still trigger the same assert() or if it is
> more like Olivier.
I need to think of how to further debug the thing.
>
> my LAN was several hosts with the same kernel and only this machine
> trigger the panic, so it shouldn't be strictly linked to the
> environment.
>
thanks a lot for your help (and patience)
regards
sashan