Re: PF NAT and Oracle/Linux mystery

2003-01-22 Thread Daniel Hartmeier
On Sat, Jan 18, 2003 at 01:57:17PM +, Steve Schmitz wrote:

> If you consider gigabit/copper a fast network and can suggest
> experiments/measurements, I'll be happy to conduct them.

TCP window scaling support has been committed to -current (pf.c 1.306).
If you have a spare box to install -current on, you could give it a try.
pfctl -vss now additionally prints 'wscale n' for TCP connections using
window scaling.

Daniel




Re: PF NAT and Oracle/Linux mystery

2003-01-18 Thread Steve Schmitz
> We could add a strip-wscale option to scrub. It doesn't solve
> the state pickup issue, but could prevent clients communicating
> through the firewall from negotiating this option.


Does the Linux NAT code already do this?

As a test, we temporarily split our combined firewall/NAT machine into
two: a firewall (the original combined machine with NAT commented out) and
a separate NAT machine. When the NAT machine ran OpenBSD, contact with the
wscale-ing Linux/Oracle server failed. When we installed Linux on the NAT
machine, it worked, although in both cases the OpenBSD firewall was still
between the NAT machine and the Oracle server.

So I conclude that either the OpenBSD firewall code has no trouble with 
wscale but the NAT code has, or the Linux NAT clears out the wscale TCP 
options from the initial SYN packet - i.e. does exactly what you propose.

I have not tried to flush the Linux NAT state (and thus, wscale size) and 
see if it crashes the connection. I only understood these issues after 
Daniel's explanation.

Cheers, Steve





Re: PF NAT and Oracle/Linux mystery

2003-01-18 Thread Daniel Hartmeier
On Sat, Jan 18, 2003 at 08:42:04AM +, Steve Schmitz wrote:

> Does the Linux NAT code already do this?

Possibly, but I'll have to check the source code to verify. It could
either strip the option or set any scale factors inside the option to
zero. But doing that is not much simpler than actually supporting
non-zero factors. All these approaches have the limitation that they
only work if the code sees the TCP handshake of the connections.
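
To make the "strip the option" idea concrete, here is a rough sketch in Python. It is not pf or Linux code. The function name strip_wscale is made up for illustration, and a real implementation would also have to fix up the TCP checksum, which is not shown. It relies only on the option encoding from RFC 1323: Window Scale is kind 3 with length 3, and NOP is kind 1.

```python
# Sketch only: overwrite a Window Scale option in the raw TCP options of a
# SYN with NOP padding, so the peers never negotiate scaling.
# strip_wscale is a hypothetical helper, not an existing pf function.

TCPOPT_EOL = 0      # end of option list
TCPOPT_NOP = 1      # one-byte padding
TCPOPT_WSCALE = 3   # Window Scale (RFC 1323), total length 3


def strip_wscale(options: bytes) -> bytes:
    """Return a copy of the options with any wscale option replaced by NOPs."""
    out = bytearray(options)
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == TCPOPT_EOL:
            break
        if kind == TCPOPT_NOP:
            i += 1
            continue
        if i + 1 >= len(out):
            break                      # truncated option, stop parsing
        length = out[i + 1]
        if length < 2 or i + length > len(out):
            break                      # malformed length, stop parsing
        if kind == TCPOPT_WSCALE and length == 3:
            out[i:i + length] = bytes([TCPOPT_NOP]) * length
        i += length
    return bytes(out)


# MSS 1460, NOP, wscale 9 (a typical SYN option layout)
syn_opts = bytes([2, 4, 0x05, 0xB4, 1, 3, 3, 9])
print(strip_wscale(syn_opts).hex())    # 020405b401010101
```

Replacing the option with NOPs keeps the header length unchanged, which avoids having to shift data or adjust the data offset field.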

> So I conclude that either the OpenBSD firewall code has no trouble with
> wscale but the NAT code has, or the Linux NAT clears out the wscale TCP
> options from the initial SYN packet - i.e. does exactly what you propose.

It's the OpenBSD TCP sequence number tracking code that stalls such
connections, and that is used whenever you filter a TCP connection
statefully (when using 'keep state'). pf always creates a state entry
when any translation (like nat, rdr or binat) is applied to a
connection. If you were filtering statelessly with pf and doing nat on
the Linux box, that might explain why the connection didn't stall.

In the tcpdumped session you quoted, the client was using 'wscale 0' and
the server 'wscale 9'. That means the client's window values didn't get
shifted/multiplied at all, and the server's were shifted 9 bits
(multiplied by a factor of 2^9=512).

The server started sending window values of 12 (meaning 12*512=6144) and
increased them to 52 (meaning 52*512=26624). As long as the client sent
smaller segments, pf let them through. But the first larger packet gets
dropped, and the client retransmits it until the connection times out.
So you might not always see a stall, depending on the kind of traffic
the client sends. If it's all small packets (like an interactive SQL
session, where the client sends only small commands), it could work.
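
The arithmetic above can be checked with a few lines of Python. This is only an illustration of the RFC 1323 shift. The function name effective_window is made up here and is not anything in pf.

```python
# Illustration only: how a raw 16-bit window field maps to bytes once a
# window scale factor is in effect. effective_window is a hypothetical name.

def effective_window(th_win: int, wscale: int) -> int:
    """Return the window in bytes for a raw header value and a shift count."""
    if not 0 <= th_win <= 0xFFFF:
        raise ValueError("th_win must fit in 16 bits")
    if not 0 <= wscale <= 14:
        raise ValueError("RFC 1323 limits the shift count to 14")
    return th_win << wscale


# Values from the tcpdumped session: the server used wscale 9.
print(effective_window(12, 9))      # 6144
print(effective_window(52, 9))      # 26624
# A tracker that ignores the option takes the raw value at face value:
print(effective_window(52, 0))      # 52
# Largest window the server could advertise with wscale 9:
print(effective_window(65535, 9))   # 33553920
```

The gap between 52 and 26624 bytes is what makes the tracker reject the first larger segment from the client.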

Also, the server might have used a lower scaling factor on other
connections. wscale 9 is quite large, that means it wants to be able to
advertise a maximum window of 65535*512 bytes, about 32 MB. Such a large
window would mean the client is invited to send up to 32 MB of data
before getting an acknowledgment. I don't know how Linux calculates the
scaling factors, but I guess it might depend on the memory available for
such buffers at run-time. It might have chosen a lower scaling factor
during the second test. But that's just a guess :)

It's also interesting that your client chose wscale 0, indicating that
it doesn't itself want to scale its own windows (because it has no large
buffers?) but wants to support the peer doing so. If you worry about
performance impacts due to disabled window scaling, it might depend on
the nature of your traffic. If only the server uses large windows (using
scaling factors), only bulk traffic client -> server would benefit. If
your client only sends small queries but gets large results back, using
a factor only for the server's windows wouldn't improve performance.

Daniel




Re: PF NAT and Oracle/Linux mystery

2003-01-18 Thread Mike Frantzen
>> We could add a strip-wscale option to scrub. It doesn't solve
>> the state pickup issue, but could prevent clients communicating
>> through the firewall from negotiating this option.
>
> Does the Linux NAT code already do this?

Linux's stock state code doesn't track sequence numbers.
 
.mike




Re: PF NAT and Oracle/Linux mystery

2003-01-17 Thread Daniel Hartmeier
On Fri, Jan 17, 2003 at 07:51:29AM +, Steve Schmitz wrote:

> The firewall is running not quite the newest version of OpenBSD/PF (a 3.2
> beta). Is it advisable to upgrade, given the interruption in service?

I doubt it will make a difference, as that part of the code (the
sequence number tracking) hasn't changed since then, so no.

> Jan 16 18:41:32 firewall /bsd: pf: BAD state: TCP 192.168.101.14:32863
> 139.33.102.140:50237 141.225.240.34:1521 [lo=3987556722 high=3987556777
> win=28480 modulator=0] [lo=3963179816 high=3963208296 win=5792 modulator=0]
> 4:4 PA seq=3987556722 ack=3963179816 len=121 ackskew=0 pkts=130 dir=out,fwd
> Jan 16 18:41:32 firewall /bsd: pf: State failure on: 1

This error means that the client (192.168.101.14) tries to send a packet
to the server (141.225.240.34) with a sequence number (3987556722) and
length (121) that extend past the window the server expects
(3987556722 to 3987556777).

When the server acks a packet, the expected window is increased to the
acked sequence number plus the advertised window. In this case, the
expected window is extremely small (just 55 bytes), so basically the
next packet is certain to fail the check. The client isn't sending
prematurely here. It just sends the next packet after it got an ack for
the previous part (seq=3987556722, src.seqlo=3987556722). The question
is why the window is too small. Possibly, the last ack from the server
had a very small th_win.
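
The numbers from the log can be plugged into a simplified version of the check, sketched here in Python. This is not pf's actual code, which tests more conditions. The function name fits_window is made up, and only the upper bound on the sequence number is modelled.

```python
# Simplified sketch of the upper-bound test on a segment. fits_window is a
# hypothetical name. The comparison is done modulo 2^32 so that it still
# works when sequence numbers wrap around.

def fits_window(seq: int, length: int, seqhi: int) -> bool:
    """True if the segment ends at or before the highest acceptable seq."""
    end = (seq + length) & 0xFFFFFFFF
    return ((seqhi - end) & 0xFFFFFFFF) < 0x80000000


lo, high = 3987556722, 3987556777      # from the BAD state log entry
print(high - lo)                       # 55, the window the tracker believes in
print(fits_window(3987556722, 121, high))   # False: the 121-byte segment is rejected
print(fits_window(3987556722, 55, high))    # True: a 55-byte segment would pass
```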

You mentioned the behavior depends on the OS (and application) of the
server. When Oracle runs on Solaris, it works. And when you connect to
the Linux Oracle to another service (ssh, etc.), it works, too? If
that's the case, I wonder whether the Oracle on Linux is configured to
use any TCP options that might affect window sizes (th_win).

Could you run a tcpdump -nvvvSpi int_if to catch all packets of a new
connection up to the point where it stalls? You can use a filter
expression (like 'host 192.168.101.14') to only capture packets of a
single connection. As the stall occurs after around 130 packets, the log
shouldn't get too large.

Mike, have you ever seen such a case before?

Daniel




Re: PF NAT and Oracle/Linux mystery

2003-01-17 Thread Mike Frantzen
> You mentioned the behavior depends on the OS (and application) of the
> server. When Oracle runs on Solaris, it works. And when you connect to
> the Linux Oracle to another service (ssh, etc.), it works, too? If
> that's the case, I wonder whether the Oracle on Linux is configured to
> use any TCP options that might affect window sizes (th_win).

In the tcpdump output, look for 'wscale <int>' on the first packet.
Our state code doesn't handle window scaling, which I can see Oracle
enabling.

Run 'echo 0 > /proc/sys/net/ipv4/tcp_window_scaling' on the Linux box to
turn it off.
 
> Mike, have you ever seen such a case before?

Nope.

.mike




Re: PF NAT and Oracle/Linux mystery

2003-01-17 Thread Steve Schmitz
> You mentioned the behavior depends on the OS (and application) of the
> server. When Oracle runs on Solaris, it works. And when you connect to the
> Linux Oracle to another service (ssh, etc.), it works, too?

I am not allowed to log into the Linux/Oracle server. I tried with netcat on
a sister machine of the L/O server and this worked okay.

> Could you run a tcpdump -nvvvSpi int_if to catch all packets of a new
> connection up to the point where it stalls? You can use a filter expression
> (like 'host 192.168.101.14') to only capture packets of a single
> connection. As the stall occurs after around 130 packets, the log shouldn't
> get too large.

Find the log attached. The client this time was 192.168.101.9.

Cheers, Steve




oracle-hang.log
Description: Binary data


Re: PF NAT and Oracle/Linux mystery

2003-01-17 Thread Daniel Hartmeier
On Fri, Jan 17, 2003 at 02:01:39PM +, Steve Schmitz wrote:

> Any idea why they do this?

The TCP header has room for only a 16-bit unsigned window value, so
windows are traditionally limited to 65535 bytes, which can limit
performance on fast networks.
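
To see why 65535 bytes can be a limit: a sender can have at most one window of unacknowledged data in flight per round trip, so throughput is bounded by window / RTT. The Python sketch below works that out. The round-trip times are assumed example values, not measurements from this network, and max_throughput_mbit is a made-up helper name.

```python
# Rough upper bound on TCP throughput for a given window and round-trip time.
# The RTT values used below are assumptions for illustration only.

def max_throughput_mbit(window_bytes: int, rtt_seconds: float) -> float:
    """At most one full window can be in flight per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6


for rtt in (0.001, 0.050):             # assumed: 1 ms LAN, 50 ms WAN
    mbit = max_throughput_mbit(65535, rtt)
    print(f"RTT {rtt * 1000:.0f} ms: about {mbit:.1f} Mbit/s")
```

With an unscaled window, the bound drops quickly as the round-trip time grows, which is the case window scaling was designed for.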

RFC 1323 (http://www.faqs.org/rfcs/rfc1323.html) defines the Window
Scale Option as an extension to TCP (RFC 793).

If the client supports the extension, it will add a TCP option to its
initial SYN packet, indicating its support (and supplying its own scale
factor). If the peer also supports the extension, it will add its own
TCP option to the SYN+ACK, supplying its scale factor (the two factors
can be different). If only one of the peers understands the extension,
the ignorant one will not add the TCP option, and the proposing one must
not scale its window values.
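
The negotiation rule can be written down as a small Python function. This is a sketch of the rule as described above, not code from any TCP stack. The name negotiated_scales is made up, and None stands for "option not present". The clamp to 14 follows RFC 1323, which says a larger shift count should be treated as 14.

```python
# Sketch of the RFC 1323 negotiation outcome. negotiated_scales is a
# hypothetical helper. None means the option was absent from that packet.
from typing import Optional, Tuple


def negotiated_scales(syn_wscale: Optional[int],
                      synack_wscale: Optional[int]) -> Tuple[int, int]:
    """Return (client_shift, server_shift) applied to each side's windows."""
    if syn_wscale is None or synack_wscale is None:
        return (0, 0)                  # scaling is only on if both sides sent it
    return (min(syn_wscale, 14), min(synack_wscale, 14))


print(negotiated_scales(0, 9))         # (0, 9): the case in this thread
print(negotiated_scales(None, None))   # (0, 0): client omitted the option
print(negotiated_scales(0, None))      # (0, 0): server does not support it
```

Note that 'wscale 0' from the client is not the same as omitting the option: it still enables scaling for the server's windows.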

So, you don't necessarily have to modify the external server, it would
be sufficient to make your client not add the TCP option. Because it
adds the option ('wscale 0' in your tcpdump), the server is free to use
'wscale 9'. If your client doesn't add the option, the server won't try
to scale its windows, either.

The problem with adding support for this extension to pf is that the
needed information is communicated only in the initial SYN and SYN+ACK
packets of a connection. If pf sees those, it could note the two factors
in the state entry and multiply each subsequent window value accordingly,
without much difficulty.

But if pf creates a state entry from packets after the TCP handshake
(like when you flush your state entries, and don't limit state creation
to 'flags S/SA', so pf 'picks up' existing connections), there's no (simple)
way to deduce the factors from subsequent packets, so such state entries
would still cause stalled connections.
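
The difference between the two cases can be sketched with a toy state entry in Python. This is not pf's state structure. TrackedConn and its fields are made up for illustration, and only the server's window is modelled.

```python
# Toy model of a state entry that remembers scale factors only when it has
# seen the handshake. TrackedConn is hypothetical, not pf's data structure.
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrackedConn:
    saw_syn: bool = False
    saw_synack: bool = False
    client_wscale: Optional[int] = None   # option value from the SYN, if any
    server_wscale: Optional[int] = None   # option value from the SYN+ACK, if any

    def observe(self, from_client: bool, syn: bool,
                wscale: Optional[int] = None) -> None:
        """Record the wscale option, which only appears on SYN packets."""
        if not syn:
            return
        if from_client:
            self.saw_syn, self.client_wscale = True, wscale
        else:
            self.saw_synack, self.server_wscale = True, wscale

    def server_window(self, th_win: int) -> Optional[int]:
        """Server window in bytes, or None if the factor cannot be known."""
        if not (self.saw_syn and self.saw_synack):
            return None                   # state picked up mid-connection
        if self.client_wscale is None or self.server_wscale is None:
            return th_win                 # scaling was not negotiated
        return th_win << self.server_wscale


full = TrackedConn()
full.observe(from_client=True, syn=True, wscale=0)
full.observe(from_client=False, syn=True, wscale=9)
print(full.server_window(52))             # 26624

picked_up = TrackedConn()
picked_up.observe(from_client=False, syn=False)
print(picked_up.server_window(52))        # None
```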

I guess we could add support for the case where pf does see the
handshake, but this is the first time I've seen this problem reported.
Maybe RFC 1323 adoption isn't that broad.

Daniel




Re: PF NAT and Oracle/Linux mystery

2003-01-17 Thread kjell

> If the client supports the extension, it will add a TCP option to its
> initial SYN packet, indicating its support (and supplying its own scale
> factor). If the peer also supports the extension, it will add its own
> TCP option to the SYN+ACK, supplying its scale factor (the two factors
> can be different). If only one of the peers understands the extension,
> the ignorant one will not add the TCP option, and the proposing one must
> not scale its window values.

We could add a strip-wscale option to scrub. It doesn't solve
the state pickup issue, but could prevent clients communicating
through the firewall from negotiating this option.

-kj




Re: PF NAT and Oracle/Linux mystery

2003-01-16 Thread Daniel Hartmeier
On Thu, Jan 16, 2003 at 02:54:29PM +, Steve Schmitz wrote:

> Any ideas?

Could be fragments. Can you try with

  scrub in on $ext_if all no-df
  scrub out on $ext_if all no-df

If you run pfctl -si, do you see any of the 'Counters' at the bottom
increase when you get a stalled connection?

Also, can you enable debug logging (pfctl -x m) and check
/var/log/messages for relevant entries, after reproducing the problem?

Daniel




Re: PF NAT and Oracle/Linux mystery

2003-01-16 Thread Steve Schmitz
> Could be fragments. Can you try with
>
>   scrub in on $ext_if all no-df
>   scrub out on $ext_if all no-df
>
> If you run pfctl -si, do you see any of the 'Counters' at the bottom
> increase when you get a stalled connection?
>
> Also, can you enable debug logging (pfctl -x m) and check
> /var/log/messages for relevant entries, after reproducing the problem?


I added the two scrub lines to the ruleset, then flushed and reloaded pf,
but to no avail. Log attached.

The firewall is running not quite the newest version of OpenBSD/PF (a 3.2 
beta). Is it advisable to upgrade, given the interruption in service?

Cheers, Steve


192.168.101.14 - the node which tries to connect to Oracle/Linux
141.225.240.34 - the Oracle/Linux server
139.33.102.140 - the OpenBSD/PF NAT (and FW) machine


Jan 16 18:41:32 firewall /bsd: pf: BAD state: TCP 192.168.101.14:32863 
139.33.102.140:50237 141.225.240.34:1521 [lo=3987556722 high=3987556777 
win=28480 modulator=0] [lo=3963179816 high=3963208296 win=5792 modulator=0] 
4:4 PA seq=3987556722 ack=3963179816 len=121 ackskew=0 pkts=130 dir=out,fwd
Jan 16 18:41:32 firewall /bsd: pf: BAD state: TCP 192.168.101.14:32863 
139.33.102.140:50237 141.225.240.34:1521 [lo=3987556722 high=3987556777 
win=28480 modulator=0] [lo=3963179816 high=3963208296 win=5792 modulator=0] 
4:4 PA seq=3987556722 ack=3963179816 len=121 ackskew=0 pkts=130 dir=out,fwd
Jan 16 18:41:32 firewall /bsd: pf: State failure on: 1
Jan 16 18:41:32 firewall /bsd: pf: State failure on: 1
Jan 16 18:41:44 firewall /bsd: pf: BAD state: TCP 192.168.101.14:32863 
139.33.102.140:50237 141.225.240.34:1521 [lo=3987556722 high=3987556777 
win=28480 modulator=0] [lo=3963179816 high=3963208296 win=5792 modulator=0] 
4:4 PA seq=3987556722 ack=3963179816 len=121 ackskew=0 pkts=131 dir=out,fwd
Jan 16 18:41:44 firewall /bsd: pf: BAD state: TCP 192.168.101.14:32863 
139.33.102.140:50237 141.225.240.34:1521 [lo=3987556722 high=3987556777 
win=28480 modulator=0] [lo=3963179816 high=3963208296 win=5792 modulator=0] 
4:4 PA seq=3987556722 ack=3963179816 len=121 ackskew=0 pkts=131 dir=out,fwd
Jan 16 18:41:44 firewall /bsd: pf: State failure on: 1
Jan 16 18:41:44 firewall /bsd: pf: State failure on: 1



Counters
  match                  30808    0.0/s
  bad-offset                 0    0.0/s
  fragment                   0    0.0/s
  short                      0    0.0/s
  normalize                  0    0.0/s
  memory                     0    0.0/s

[ shortly after ]

Counters
  match                  32500    0.0/s
  bad-offset                 0    0.0/s
  fragment                   0    0.0/s
  short                      0    0.0/s
  normalize                  0    0.0/s
  memory                     0    0.0/s