Re: Keep state connections randomly dropped

Peter Jeremy Sat, 03 Mar 2007 15:36:30 -0800

On 2007-Mar-03 02:55:01 -0800, Darren Reed <[EMAIL PROTECTED]> wrote:
>Peter Jeremy wrote:
>> On 2007-Feb-17 13:39:06 +1100, Peter Jeremy <[EMAIL PROTECTED]> wrote:
>> Part of the problem is that FreeBSD is re-using source ports more
>> frequently than IPfilter can handle, but since the TCP standard allows
>> port re-use after 2 minutes, why is IPfilter blocking the following?
>> In both cases, there is more than 3 minutes between the socket being
>> cleanly torn down and the reuse attempt.  "ipfstat -t" does not show
>> any state being retained after the teardown.
>>   
>
>Can you include the state table log output in your ipfilter logs?
>
>Either start ipmon with "-a" (get ipf, ipnat and state logs) or "-o IS'
>to get just ipf and state.


I've done some more investigating here and I suspect that the problem
here is just net.inet.ipf.fr_tcpclosewait - which defaults to 4 minutes.
Looking at 'ipfstat -tC', closed wait state does appear.  I will try
reducing net.inet.ipf.fr_tcpclosewait and see what happens.

>> The other problem occurs during data transfer with IPfilter
>> arbitrarily blocking the connection, even though "ipfstat -t" showed
>> that a state entry was present.  This is more serious because it is
>> effectively preventing large data transfers.  The following examples
>> show pairs of connections being broken at similar times - other
>> connections remain up (though they aren't showing any data transfer
>> at this time).
>> ...
>>   
>
>This is tricker.  In the first part of this log you have:
>
>08:10:00.610395 IP 192.168.234.64.59384 > 192.168.234.1.22: . ack 4044806860 
>win 33168 <nop,nop,timestamp 58681107 2589277>
>08:10:00.610702 IP 192.168.234.64.59384 > 192.168.234.1.22: F 
>3102426529:3102426529(0) ack 4044806860 win 33168 <nop,nop,timestamp 58681107 
>2589277>
>08:10:00.611106 IP 192.168.234.1.22 > 192.168.234.64.59384: . ack 3102426530 
>win 33303 <nop,nop,timestamp 2589278 58681107>
>Mar  3 08:10:00 fwall sshd[2147]: fatal: Write failed: Network is unreachable
>Mar  3 08:10:00 fwall ipmon[575]: 08:10:00.306431 fxp1 @20:4 b 
>192.168.234.1,22 -> 192.168.234.64,59384 PR tcp len 20 324 -AP OUT
>Mar  3 08:10:00 fwall ipmon[575]: 08:10:00.358704 2x fxp1 @20:4 b 
>192.168.234.1,22 -> 192.168.234.64,59384 PR tcp len 20 324 -AFP OUT
>
>which makes it look like some tail end packets are being blocked, not those in 
>the
>middle of the connection?

This was doing an scp of a 700MB file and it died after <200MB.  I've
also had interactive connections die for no reason.  My interpretation
of the scenario is that something is returning ENETUNREACH to sshd
which tears down the connection.  Looking through the kernel, the most
likely source of the ENETUNREACH appears to be the default result for
fr_check().  The presence of the "blocked packet" log entries seems to
support this.

My suspicion is that state entries are being temporarily lost (so the
state entry for a packet can't be found) and then re-appearing.
Consider the following (from a ssh session running "ipfstat -t" on the
firewall):

03:19:58.513388 IP 192.168.234.1.22 > 192.168.234.64.56431: P 
2264800:2264912(112) ack 3809 win 33304 <nop,nop,timestamp 71608430 127678894>
03:19:58.613552 IP 192.168.234.64.56431 > 192.168.234.1.22: . ack 2264912 win 
33304 <nop,nop,timestamp 127679905 71608430>
03:19:59.524166 IP 192.168.234.1.22 > 192.168.234.64.56431: P 
2264912:2265024(112) ack 3809 win 33304 <nop,nop,timestamp 71609441 127679905>
03:19:59.623443 IP 192.168.234.64.56431 > 192.168.234.1.22: . ack 2265024 win 
33304 <nop,nop,timestamp 127680915 71609441>
03:20:00.607533 IP 192.168.234.1.22 > 192.168.234.64.56431: P 
2265024:2265136(112) ack 3809 win 33304 <nop,nop,timestamp 71610525 127680915>
03:20:00.707482 IP 192.168.234.64.56431 > 192.168.234.1.22: . ack 2265136 win 
33304 <nop,nop,timestamp 127681999 71610525>
03:20:00.909104 IP 192.168.234.1.22 > 192.168.234.64.56431: P 
2265024:2265136(112) ack 3809 win 33304 <nop,nop,timestamp 71610827 127680915>
03:20:00.909352 IP 192.168.234.64.56431 > 192.168.234.1.22: . ack 2265136 win 
33304 <nop,nop,timestamp 127682200 71610827>
03:20:01.607698 IP 192.168.234.1.22 > 192.168.234.64.56431: P 
2265136:2265264(128) ack 3809 win 33304 <nop,nop,timestamp 71611525 127682200>
03:20:01.707430 IP 192.168.234.64.56431 > 192.168.234.1.22: . ack 2265264 win 
33304 <nop,nop,timestamp 127682999 71611525>
...
03:29:58.957583 IP 192.168.234.64.56431 > 192.168.234.1.22: . ack 2343840 win 
33304 <nop,nop,timestamp 128280257 72208960>
03:29:59.871050 IP 192.168.234.1.22 > 192.168.234.64.56431: P 
2343840:2343968(128) ack 3905 win 33304 <nop,nop,timestamp 72209973 128280257>
03:29:59.970570 IP 192.168.234.64.56431 > 192.168.234.1.22: . ack 2343968 win 
33304 <nop,nop,timestamp 128281270 72209973>
03:30:01.084305 IP 192.168.234.1.22 > 192.168.234.64.56431: FP 
2343968:2344096(128) ack 3905 win 33304 <nop,nop,timestamp 72211187 128281270>
03:30:01.087756 IP 192.168.234.64.56431 > 192.168.234.1.22: . ack 2344097 win 
33240 <nop,nop,timestamp 128282383 72211187>
03:30:01.087858 IP 192.168.234.64.56431 > 192.168.234.1.22: F 3905:3905(0) ack 
2344097 win 33240 <nop,nop,timestamp 128282384 72211187>
03:30:01.088398 IP 192.168.234.1.22 > 192.168.234.64.56431: . ack 3906 win 
33303 <nop,nop,timestamp 72211191 128282384>

Which elicited the following log messages:
Mar  3 21:16:52 fwall ipmon[8593]: 21:16:51.688032 STATE:NEW 
192.168.234.64,56431 -> 192.168.234.1,22 PR tcp
Mar  4 03:20:01 fwall ipmon[8593]: 03:20:00.707658 fxp1 @10:4 b 
192.168.234.64,56431 -> 192.168.234.1,22 PR tcp len 20 52 -A IN
Mar  4 03:30:00 fwall ipmon[8593]: 03:30:00.928975 fxp1 @20:4 b 
192.168.234.1,22 -> 192.168.234.64,56431 PR tcp len 20 180 -AP OUT
Mar  4 03:30:00 fwall sshd[10551]: fatal: Write failed: Network is unreachable
Mar  4 03:34:02 fwall ipmon[8593]: 03:34:01.389797 STATE:CLOSE 
192.168.234.64,56431 -> 192.168.234.1,22 PR tcp Forward: Pkts in 22032 Bytes in 
1152811 Pkts out 0 Bytes out 0 Backward: Pkts in 0 Bytes in 0 Pkts out 22095 
Bytes out 3947951

Note that there are no errors reported for the packets at
03:20:00.607533 or 03:20:00.909104 and no state messages logged around
that time.  This implies that the STATE:NEW and STATE:CLOSE entries
refer to the same state entry but it could not be found at
03:20:00.707658

>If you can capture a trace with "tcpdump -s 1536 -w foo ..." and send it my
>way, that'd be great.

My capture of the above session is just over 5MB and is missing the
session setup.  I'll try and see if I can find something shorter.

-- 
Peter Jeremy

pgpaPEYrAydWH.pgp
Description: PGP signature

Re: Keep state connections randomly dropped

Reply via email to