Hello,

</snip>
> > to be honest I have no idea what could be causing problems on those two
> > fairly distinct machines. The strange thing is that pf_test() currently
> > does not run in parallel. I don't quite understand why reverting my
> > earlier change helps here.
> 
> it could be two different ways to trigger a bug somewhere else that
> your commit exposes.
> 
> the panic doesn't trigger in the same way on both machines:
> - Olivier's machine seems to trigger it quickly (after some minutes)
> - mine relatively slowly (~ once a day)

    Olivier's machine acts as an AP, so it forwards packets between interfaces.

    If I remember correctly, your machine is a laptop/workstation, which
    does not forward traffic.

    the function we keep changing back and forth here is
    pf_state_key_link_reverse(), which is called from pf_find_state()
    here:

1085 
1086         if (sk == NULL) {
1087                 if ((sk = RB_FIND(pf_state_tree, &pf_statetbl,
1088                     (struct pf_state_key *)key)) == NULL)
1089                         return (PF_DROP);
1090                 if (pd->dir == PF_OUT && pkt_sk &&
1091                     pf_compare_state_keys(pkt_sk, sk, pd->kif, pd->dir) == 0)
1092                         pf_state_key_link_reverse(sk, pkt_sk);
1093                 else if (pd->dir == PF_OUT && pd->m->m_pkthdr.pf.inp &&
1094                     !pd->m->m_pkthdr.pf.inp->inp_pf_sk && !sk->inp)
1095                         pf_state_key_link_inpcb(sk, pd->m->m_pkthdr.pf.inp);
1096         }
1097 

    in plain words, the story goes as follows:

        sk == NULL -> no matching state key was attached to the packet, thus
        we have to look up the state key in the state tree using RB_FIND()

        if we find a state key for the packet in the table, then we try
        to set up a 'shortcut', which can save us an RB_FIND() later.
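        to illustrate just the general idea (use a remembered pointer when
        we have one, pay for the full search only when we don't), here is a
        tiny userspace sketch. It is not pf code: struct state_key, struct
        packet, struct state_table and find_state_key() are made up for the
        example, and bsearch(3) over a sorted array stands in for RB_FIND()
        on pf_statetbl.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Simplified userspace model, NOT pf(4) code. A sorted array searched
 * with bsearch(3) stands in for pf_statetbl and RB_FIND(); all names
 * here are hypothetical.
 */
struct state_key {
	unsigned int	  addr;		/* stand-in for the address/port tuple */
};

struct packet {
	unsigned int	  addr;		/* what we look up */
	struct state_key *sk;		/* shortcut: key remembered earlier */
};

struct state_table {
	struct state_key *keys;		/* sorted by addr */
	size_t		  nkeys;
	unsigned int	  searches;	/* how many full searches we did */
};

static int
state_key_cmp(const void *a, const void *b)
{
	const struct state_key *ka = a, *kb = b;

	return ((ka->addr > kb->addr) - (ka->addr < kb->addr));
}

/* Use the shortcut if present, fall back to the full search otherwise. */
static struct state_key *
find_state_key(struct state_table *tbl, struct packet *p)
{
	struct state_key needle;

	assert(tbl != NULL && p != NULL);
	if (p->sk != NULL)
		return (p->sk);		/* shortcut hit, no search */

	needle.addr = p->addr;
	tbl->searches++;		/* the RB_FIND() we want to avoid */
	p->sk = bsearch(&needle, tbl->keys, tbl->nkeys, sizeof(*tbl->keys),
	    state_key_cmp);
	return (p->sk);			/* NULL: no state, like PF_DROP above */
}
```

        the second lookup for the same packet finds p->sk set and never
        touches the table; a miss leaves p->sk NULL, so it is searched again.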

        1090 - 1092
        the shortcut can be set up only for an outbound packet
        (pd->dir == PF_OUT) which is also being forwarded (pkt_sk != NULL
        indicates we are seeing the packet for the second time; pkt_sk holds
        the state key for the inbound direction). pf_compare_state_keys() is
        a sanity check; it leaves a debug message on the system console on
        failure.

        So if it is an outbound forwarded packet we have already seen on the
        way in, we set up a reverse link to save one RB_FIND() operation on
        the next forwarded packet that matches the same state.
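        a similar toy model of the reverse link, under the same caveats: all
        names (lookup_out(), link_reverse(), tree_search()) are hypothetical,
        a linear search stands in for RB_FIND(), and reference counting and
        state key removal are left out. The first outbound lookup of a
        forwarded packet pays for the search and links the two keys; the
        next one goes through pkt_sk->reverse.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model of the reverse link, NOT pf(4) code; all names are
 * hypothetical. A forwarded connection has two keys: one matching the
 * packet on the way in, one matching it on the way out. Reference
 * counting and key removal are ignored.
 */
struct state_key {
	int		  id;		/* stand-in for the address/port tuple */
	struct state_key *reverse;	/* key for the other direction */
};

static unsigned int tree_searches;	/* counts stand-in RB_FIND() calls */

/* Linear search standing in for RB_FIND() on pf_statetbl. */
static struct state_key *
tree_search(struct state_key *tbl, size_t n, int id)
{
	size_t i;

	tree_searches++;
	for (i = 0; i < n; i++)
		if (tbl[i].id == id)
			return (&tbl[i]);
	return (NULL);
}

/* Model: make the two keys point at each other. */
static void
link_reverse(struct state_key *sk, struct state_key *pkt_sk)
{
	assert(sk->reverse == NULL && pkt_sk->reverse == NULL);
	sk->reverse = pkt_sk;
	pkt_sk->reverse = sk;
}

/*
 * Outbound lookup. pkt_sk is the inbound key the packet picked up when
 * it entered the box; it is NULL for packets that were not forwarded.
 */
static struct state_key *
lookup_out(struct state_key *tbl, size_t n, int id, struct state_key *pkt_sk)
{
	struct state_key *sk = NULL;

	if (pkt_sk != NULL)
		sk = pkt_sk->reverse;		/* shortcut, may be NULL */
	if (sk == NULL) {
		sk = tree_search(tbl, n, id);
		/* link only when neither key is linked yet */
		if (sk != NULL && pkt_sk != NULL && sk->reverse == NULL)
			link_reverse(sk, pkt_sk);
	}
	return (sk);
}
```

        note that in this model only a non-NULL pkt_sk can ever create a
        link, which is the same point as the question about line 1090.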

        1093 - 1095
        creates a similar shortcut for packets sent by local sockets. We put
        a reference to the state key into the PCB linked to the socket. This
        saves us an RB_FIND() operation for the next local outbound packet
        that matches the same state.
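        and the socket variant, again a hypothetical userspace model rather
        than pf code: struct pcb with its pf_sk member stands in for the PCB
        and inp_pf_sk, state_key.inp for sk->inp, and lookup_local_out() and
        tree_search() are made up. The first packet from the socket pays for
        the search and links key and socket; the next one takes the shortcut.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model of the socket shortcut, NOT pf(4) code; all names
 * are hypothetical. pcb.pf_sk mimics inp_pf_sk and state_key.inp mimics
 * sk->inp from the excerpt above. Reference counting is ignored.
 */
struct pcb;

struct state_key {
	int		  id;		/* stand-in for the address/port tuple */
	struct pcb	 *inp;		/* socket linked to this key */
};

struct pcb {
	struct state_key *pf_sk;	/* key cached in the socket */
};

static unsigned int tree_searches;	/* counts stand-in RB_FIND() calls */

/* Linear search standing in for RB_FIND() on pf_statetbl. */
static struct state_key *
tree_search(struct state_key *tbl, size_t n, int id)
{
	size_t i;

	tree_searches++;
	for (i = 0; i < n; i++)
		if (tbl[i].id == id)
			return (&tbl[i]);
	return (NULL);
}

/* Outbound lookup for a packet sent by local socket "so". */
static struct state_key *
lookup_local_out(struct state_key *tbl, size_t n, int id, struct pcb *so)
{
	struct state_key *sk;

	if (so != NULL && so->pf_sk != NULL)
		return (so->pf_sk);		/* shortcut, no search */

	sk = tree_search(tbl, n, id);
	/* link only if neither side is linked yet (cf. lines 1093-1094) */
	if (sk != NULL && so != NULL && sk->inp == NULL) {
		assert(so->pf_sk == NULL);
		sk->inp = so;
		so->pf_sk = sk;
	}
	return (sk);
}
```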


given the bug seems to be triggered/uncovered by pf_state_key_link_reverse(),
is there any chance your laptop/workstation occasionally forwards packets?
for example, doing NAT for a vmd/qemu virtual machine?

if that is not the case, then the question is how we end up running
pf_state_key_link_reverse() at all, which is the same as asking why pkt_sk
is not NULL at line 1090.

> 
> I could try to run with your commit and see if I could trigger it more
> easily or find some elements influencing it. I could try with GENERIC
> for example to see if I still trigger the same assert() or if it is
> more like Olivier's.

    I need to think about how to debug this further.

> 
> my LAN has several hosts with the same kernel and only this machine
> triggers the panic, so it shouldn't be strictly linked to the
> environment.
> 

thanks a lot for your help (and patience)

regards
sashan
