On 21 June 2016 at 14:00, Alexander Bluhm <alexander.bl...@gmx.net> wrote:
> On Tue, Jun 21, 2016 at 11:24:14AM +0200, Mike Belopuhov wrote:
>> So pf reused the port while some TCP segments were still in flight?
>
> No.  The old state was in FIN_WAIT_2 and the socket in TIME_WAIT.

Ah indeed, but hold on.  You're testing the sequence number
of the new state with an existing one which has seen some
traffic.. Are you sure this is correct?  The initial TCP
sequence number is modulated so it can be larger or smaller
that the one of the existing state.

In this code we're reusing the existing state key instead of
inserting a new one, so since you don't want to reuse it and
instead prefer to fail, you're limiting the decision based on
the comparison of the new and old TCP sequence numbers.  But
since they don't belong to the same connection, this is
inherently incorrect.

Unless I'm wrong, I have to retract my OK and ask you to fix
the sport bit instead.

> They were idling for 25 seconds.  Then a new state was created and
> Nat pf_get_sport() did choose the port of the old state.  The
> collision was on the PF_SK_STACK side, but pf_find_state_all() is
> called with PF_IN.  So this is not recognized.
>

If what I've said above is correct, perhaps we need a different
lookup function to take stack side keys into account and avoid
doing the search twice.

>> But this is key_attach stage not port allocation...  isn't that too
>> late? When we fail the state key attachment we drop the connection.
>
> I guess that the SYN-Retry fixes it.  But only one of 1000 connections
> fails so it is hard to debug.  My use case is quite strange, I do
> nat-to and divert-to in the same rule.
>
>> I'm ok to add this safeguard, but can't we apply additional logic
>> into the port allocation code to do a better job?
>
> In my case I ended up with inconsistent state and socket as they
> have different reuse policies.  This might be the case in other
> places too.  Fixes in other places might avoid that situation.  But
> here something goes wrong, so I think here something should be
> fixed.
>
>> Does this port get allocated via pf_get_sport or is it a static port
>> assignment that clashes with the port range NAT uses?
>
> It is not a static port, pf_get_sport() chooses it.
>
> Discussion with markus@ concluded that my diff is wrong for sloppy
> states.  There the sequence number is not updated in the old state.
> So here is a new diff that fixes this.
>
> ok?
>
> bluhm
>

Reply via email to