Hi,
During the weekend I started having issues where traffic going to and thru
OBSD PF is dropped. After doing some digging it seems that the packets are
dropped due to too fast timestamp incrementing. I have tested connecting to
the OBSD machine itself via SSH from a phone (using putty) and that
connection times out, I have also tried SSH to a server behind OBSD, which
OBSD NATs (rdr-to) and this also times out, last thing I tried was
connecting on port 8001 to this same server behind the OBSD and that
actually works, which seems a bit strange to me, if it's in fact this too
fast timestamp incrementing that is the problem.
So I tested commenting out the following line from my pf.conf and then
everything works ok, so I guess it has to be the timestamp causing issues:
match in all scrub (reassemble tcp)
I also tried using this line, so all packets would be shown on pflog0:
match in log(all) all scrub (reassemble tcp)
And then used:
tcpdump -n -e -ttt -i pflog0 port 22 and action block
To see if the packets were blocked, but did not see anything, so maybe I got
something wrong? Doesn't the "scrub match" log blocked packets? Why does
this seem to only affect some connections and not all (ssh vs port 8001).
I'm not sure what happened this last weekend, maybe the operator changed
their network to use more aggressive timestamping and that's the reason. But
that still doesn't explain why I can't catch the blocked packet in pflog0
and why the connection to port 8001 does go through while ssh fails.
Btw, is there a reason PF requires the timestamps to be so slow? I quickly
skimmed through rfc1323, and didn't see that it was mandated that it be as
slow as PF requires it to be. One part of the RFC says "the maximum
acceptable clock frequency is one tick every 59 ns", which would be ~16MHz,
so maybe some software are using much more agressive that is still under
16MHz and now causing this issue?
# uname -mrsv
OpenBSD 4.7 GENERIC#558 i386
Here is one capture from the external interface:
1 11:38:56.051463 AA.BB.CC.DD.47718 > OBSD.XX.YY.ZZ.8001: S
175975631:175975631(0) win 65280 <mss 1360,nop,nop,timestamp 1830794889
0,nop,wscale 1,nop,nop,sackOK> (DF)
2 11:38:56.052054 OBSD.XX.YY.ZZ.8001 > AA.BB.CC.DD.47718: S
853987916:853987916(0) ack 175975632 win 5792 <mss 1460,sackOK,timestamp
130863125 1830794889,nop,wscale 6> (DF)
3 11:38:56.211586 AA.BB.CC.DD.47718 > OBSD.XX.YY.ZZ.8001: . ack 1 win 65280
<nop,nop,timestamp 1833177264 130863125> (DF)
4 11:38:56.611873 AA.BB.CC.DD.47718 > OBSD.XX.YY.ZZ.8001: P 1:685(684) ack 1
win 65280 <nop,nop,timestamp 1833180264 130863125> (DF)
5 11:38:56.612342 OBSD.XX.YY.ZZ.8001 > AA.BB.CC.DD.47718: . ack 685 win 112
<nop,nop,timestamp 130863265 1833180264> (DF)
6 11:38:56.614866 OBSD.XX.YY.ZZ.8001 > AA.BB.CC.DD.47718: P 1:774(773) ack
685 win 112 <nop,nop,timestamp 130863265 1833180264> (DF)
7 11:38:56.615037 OBSD.XX.YY.ZZ.8001 > AA.BB.CC.DD.47718: F 774:774(0) ack
685 win 112 <nop,nop,timestamp 130863265 1833180264> (DF)
8 11:38:56.891873 AA.BB.CC.DD.47718 > OBSD.XX.YY.ZZ.8001: F 685:685(0) ack
775 win 64893 <nop,nop,timestamp 1833914514 130863265> (DF)
9 11:38:58.252041 OBSD.XX.YY.ZZ.8001 > AA.BB.CC.DD.47718: P 1:774(773) ack
685 win 112 <nop,nop,timestamp 130863675 1833180264> (DF)
10 11:38:58.371738 AA.BB.CC.DD.47718 > OBSD.XX.YY.ZZ.8001: . ack 775 win
64893 <nop,nop,timestamp 1835406889 130863265,nop,nop,sack 1 {1:774} > (DF)
...
And here's the first related entry in messages file (checked with grep
47718):
Jan 24 11:38:56 chim /bsd: pf: Timestamp failed 1
Jan 24 11:38:56 chim /bsd: pf: tsval: 1833914514 tsecr: 56382670 +ticks:
33307 idle: 0s 279ms
Jan 24 11:38:56 chim /bsd: pf: src->tsval: 1833180264 tsecr: 56382530
Jan 24 11:38:56 chim /bsd: pf: dst->tsval: 56382670 tsecr: 1833180264
tsval0: 56382530
Jan 24 11:38:56 chim /bsd: pf: TCP in wire: (0) AA.BB.CC.DD:47718
OBSD.XX.YY.ZZ:8001 stack: (0) AA.BB.CC.DD:47718 10.INT.SERVER:8001
[lo=175976316 high=175983484 win=65280 modulator=0 wscale=1] [lo=853988691
high=854118477 win=112 modulator=0 wscale=6] 4:7 @33 FA
Jan 24 11:38:58 chim /bsd: pf: Timestamp failed 1
Jan 24 11:38:58 chim /bsd: pf: tsval: 1835406889 tsecr: 56382670 +ticks:
34935 idle: 1s 759ms
Jan 24 11:38:58 chim /bsd: pf: src->tsval: 1833180264 tsecr: 56382530
Jan 24 11:38:58 chim /bsd: pf: dst->tsval: 56383080 tsecr: 1833180264
tsval0: 56382530
Jan 24 11:38:58 chim /bsd: pf: TCP in wire: (0) AA.BB.CC.DD:47718
OBSD.XX.YY.ZZ:8001 stack: (0) AA.BB.CC.DD:47718 10.INT.SERVER:8001
[lo=175976316 high=175983484 win=65280 modulator=0 wscale=1] [lo=853988691
high=854118477 win=112 modulator=0 wscale=6] 4:7 @33 A
So PF is only complaining on packet 8, thus the FIN seems to not go thru and
the client continues to sen packets. Timestamp difference between packets 4
and 8 is 734250 and 280ms has passed, giving us 2,6MHz. Between packet 1 and
3 the difference is 2382375 in 160ms, giving us 14,MHz. Finally between
packets 3 and 4 these numbers are 3000 in 400ms, giving us 7,5kHz. Did I
mess up some numbers? Or is the timestamp incrementation speed jumping a
lot? How come the first two are ok for PF, but the third is not?
While writing this e-mail I noticed that this is probably the reason why
port 8001 seems to work some times, it only starts to drop when it's already
sending FIN packet, thus breaking the connection when it's no longer needed.
This still doesn't really explain why it doesn't drop the packets 3 and 4
which already have too high frequency, but drops lower frequency at packet
8?
Is there any better solution than to turn off the scrub feature completely?
Unfortunately I was unable to check the logs if these timestamp failures
happened earlier (before last weekend) because there's a lot of these
entries and the default log rotation script had something like 30kiB
rotation interval, thus deleting older logs...
Any help would be appreciated.