On Sun, May 15, 2022 at 01:44:39PM -0000, Stuart Henderson wrote:
> >
> > - mixing mtu to 1500 and scrub: well, both concern issues with mtu. why
> >   wouldn't they be together in there?
> 
> They're related but one is for avoiding the problem in the first place
> (which may or may not work, depending on the ISP and backhaul network)
> and the other is working around problems encountered (due to
> misconfiguration of other people's networks) as a result.
> 
> Putting them together in one large section isn't so bad for pppoe, though
> it already feels like it makes it harder to distinguish the two, but in
> the context of using this as a base for text relating to other interface
> types then the RFC 4638 bits aren't relevant at all there.
> 
> > - "causing conflict": feel free to be more specific. it's not something
> >   i have knowledge of
> 
> outline:
> 
> - client <> router is on ethernet and can pass packets of 1500 bytes
> (or even larger)
> 
> - router <> "the internet" can sometimes carry 1500 byte packets but
> via certain types of connection can only pass packets of a smaller size,
> e.g. 1492 bytes with standard pppoe, some ISPs have tougher restrictions
> (either outright, or "work but don't work _well_" if you go above some
> other size)
> 
> - router <> "sites accessed by tunnel/vpn over the internet" has an
> extra header inserted in packets, further reducing the available size
> for packets (usually 1420 bytes for wg(4) though can be less if
> it's carried over a more restrictive internet connection than usual,
> other sizes for other types of tunnel/vpn)
> 
> - website (or other host "on the internet") <> "the internet" can
> typically send packets of 1500 bytes
> 
> so the two endpoints of a TCP connection (say, client and website)
> can send 1500 byte packets to their immediate upstream. but the path
> between them (router/ISP/internet/vpn/whatever) can only carry smaller
> packets.
> 
> clear so far?
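
To put numbers on the outline above, here is a rough sketch. It is only an illustration: the hop labels are made up for the example and the sizes are the ones quoted in the outline. The largest packet that survives a whole path is the smallest MTU found along it.

```python
# Hypothetical hops, using the MTU values quoted in the outline above.
PLAIN_PATH = [
    ("client <> router (ethernet)", 1500),
    ("router <> internet (standard pppoe)", 1492),
    ("website <> internet", 1500),
]

# A second hypothetical path where traffic is carried in a wg(4) tunnel.
VPN_PATH = [
    ("client <> router (ethernet)", 1500),
    ("router <> vpn site (wg)", 1420),
]


def path_mtu(hops):
    """Return the largest packet size that every hop in the list can carry."""
    return min(mtu for _name, mtu in hops)


def bottleneck(hops):
    """Return the (name, mtu) pair of the most restrictive hop."""
    return min(hops, key=lambda hop: hop[1])


if __name__ == "__main__":
    print(path_mtu(PLAIN_PATH))
    print(bottleneck(VPN_PATH))
```

Both endpoints see 1500 on their own interface, so neither can tell from local configuration alone that the path is narrower.
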
> 
> the size of packets which can be carried on a particular network
> interface is "the MTU" of that interface. this defaults to the hardware
> capable size or 1500 whichever is less.
> 
> for TCP packets there is a negotiation at connection setup between the
> two sides. each side looks at the MTU of the route to reach the other
> address (which defaults to that of the network interface used to reach
> it), subtracts the IP and TCP header sizes, and calls the result the
> MSS, "maximum segment size". each side tells the other its idea of MSS
> (in the TCP SYN packet) and the lower of the two is used for the
> connection (so packets are capped at that size).
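
The arithmetic behind that negotiation can be sketched as follows. This is a simplified model, not how any particular stack implements it. It assumes IPv4 and TCP headers of 20 bytes each with no options, and the function names are invented for the example.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def advertised_mss(route_mtu, ip_header=IPV4_HEADER, tcp_header=TCP_HEADER):
    """MSS one side would announce in its SYN for a given route MTU."""
    return route_mtu - ip_header - tcp_header


def connection_mss(mtu_a, mtu_b):
    """MSS used for the connection: the lower of the two announced values."""
    return min(advertised_mss(mtu_a), advertised_mss(mtu_b))


if __name__ == "__main__":
    # Both endpoints on plain Ethernet.
    print(connection_mss(1500, 1500))
    # One endpoint sends directly over a 1492-byte pppoe interface.
    print(connection_mss(1500, 1492))
```

Under those assumptions two Ethernet endpoints settle on 1460, and an endpoint whose own interface is a 1492-byte pppoe link brings it down to 1452. A client behind a PPPoE router still announces 1460, because it only knows its own interface MTU.
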
> 
> this is fine where the whole path can cope with the same sized packets,
> but if not then a router on the path must either split it into fragments
> (much slower than simply forwarding it, involving use of the router cpu
> which is usually fairly weedy) or send a "fragmentation needed" ICMP
> message and rely on the other side to do it. (the common case is for
> TCP connections to be generated with packets flagged as "don't fragment"
> because the endpoints want to know about the issue so they can adapt
> to it).
> 
> in the best case, the relevant endpoint (e.g. client or website)
> receives that message and acts on it by reducing the size of packets
> it then transmits. there's still some overhead from detecting the
> oversized packet and reacting to it but things "work".
> 
> in the worst case, those packets don't reach the relevant endpoint.
> (various possible reasons. maybe a misconfigured firewall blocks all
> ICMP. maybe there's some link numbered on private addresses in the
> network path and the frag-needed message was sent from a private
> address and blocked by a firewall. maybe some loadbalancing or
> queueing or icmp-packets-per-second limit got in the way. lots of
> options).
> 
> so in that case the endpoint sending the oversize packet doesn't
> know it must reduce packet size, and the packet doesn't make it
> through.
> 
> (it's actually worse with a standard IPsec tunnel because the MTU is
> that of the interface carrying the network route, usually the default
> route, so in that case it also affects connections where the VPN is
> run directly on an endpoint, not just where the VPN is handled on a
> separate router).
> 
> anyway. when a rule with "scrub (max-mss XYZ)" is matched by a
> TCP SYN packet, PF inspects the maximum segment size; if it's higher than
> XYZ it modifies the packet and sets it to XYZ instead. the effect is that
> TCP packets are kept below the size which can actually be transmitted
> across the network and so the problem is avoided.
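
A minimal model of the clamping described above, assuming it behaves exactly as stated in this thread. It is not taken from PF's source, and the function name is hypothetical.

```python
def clamp_mss(syn_mss, max_mss):
    """Return the MSS a SYN carries after a max-mss rule has matched.

    The value is only ever lowered: a SYN that already announces a
    smaller MSS passes through unchanged, and nothing is dropped.
    """
    return min(syn_mss, max_mss)


if __name__ == "__main__":
    # An Ethernet client announcing 1460 behind "scrub (max-mss 1440)".
    print(clamp_mss(1460, 1440))
    # A client that already announces less than the limit is left alone.
    print(clamp_mss(1400, 1440))
```

The second case is the one the word "enforces" obscures: a lower MSS is neither raised nor rejected.
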
> 
> (elephant in the room: non TCP packets. there's often no handshake
> mechanism like TCP with MSS negotiation, so the only real options are to
> keep packets smaller or to do some specific probing to see which packet
> sizes make it through. this is usually either handled individually in
> some way or other by those protocols which run across the internet, or
> they just ignore it and break sometimes).
> 
> > - "more information in pf.conf": yes there is information in pf.conf on
> >   mtu, mss, and nat, including the syntax for using them. again, why
> >   wouldn't we point people there?
> 
> About MTU, pf.conf(5) mostly talks in terms of fragment handling on a
> machine running PF, there's a reference to path MTU discovery when talking
> about IPv6 but there's no introduction for somebody who doesn't know what
> the problems are.
> 
> The sum total of text about MSS is
> 
> "
>      max-mss number
>            Enforces a maximum segment size (MSS) for matching TCP packets.
> "
> 
> this is not even clear about what the option does (it's actually
> "reduces the MSS on TCP syn packets if it is higher than the max-mss
> value" but with the word "enforces" it could be read as "drops
> the packet if MSS is not equal to this value") and has nothing on
> its relation to MTU or why one might use it.
> 
> "If the maximum segment size (MSS) on matching TCP packets exceeds
> <number>, modify the packet and set it to <number>" would be better than
> what we have but this definitely feels like it needs more explanation as
> to why it might be needed.
> 
> Arguably pf.conf would be the wrong place to describe TCP fundamentals
> but it's probably the one place that people running into the problem on
> non-pppoe interfaces are most likely to find. (I think the logic of
> including this in pppoe(4) originally was that for PF people are more
> likely to copy-and-paste/modify sample configs rather than actually
> read the PF manual because it's so long and they'll _probably_ be more
> likely to see it in pppoe.)
> 
> > i'm happy to try and rework the text if you think it can be improved.
> 
> info about "scrub max-mss" probably wants to go in pf.conf and just
> referenced from pppoe (and maybe other places)
> 
> : MTU/MSS ISSUES
> :    Problems can arise on machines with private IPs connecting to the Internet
> :    via a machine running both Network Address Translation (NAT) and pppoe.
> 
> this above is totally wrong, it is nothing to do with NAT and private IPs,
> same applies on routed address blocks going through a smaller MTU network
> 
> :    Standard Ethernet uses a maximum transmission unit (MTU) of 1500 bytes,
> :    whereas PPPoE mechanisms need a further 8 bytes of overhead.  This leaves a
> :    maximum MTU of 1492.  pppoe sets the MTU on its interface to 1492 as a
> :    matter of course.  However, machines connecting on a private LAN will still
> :    have their MTUs set to 1500, causing conflict.  Using a packet filter, the
> :    maximum segment size (MSS) can be set (clamped) to the required value.  The
> :    following rule in pf.conf(5) would set the MSS to 1440:
> 
> i'd prefer splitting up something like
> 
> "PPPOE AND MTU/MSS
> 
> PPPoE has an 8 byte header. When run over a network interface with the
> standard Ethernet maximum transmission unit (MTU) of 1500 bytes, this
> reduces the maximum available MTU to 1492. pppoe(4) sets the default
> MTU to this value."
> 
> then something briefly explaining the issue (if it can be condensed)
> and point to an expanded pf.conf(5) bit about max-mss which could be
> referenced from other places too, and a separate bit about 4638
> negotiation, maybe sth like
> 
> "MTU/MSS NEGOTIATION
> 
> When using a pppoedev configured for a higher MTU ("jumbo
> frames"), the MTU for the pppoe(4) device can also be raised.
> In this case pppoe(4) attempts to negotiate the higher size with
> the other PPPoE endpoint using the RFC 4638 protocol.
> This can allow standard Ethernet packet sizes (1500 bytes) to be
> carried over PPPoE. However, RFC 4638 negotiation only takes into
> account the MTU configured on[...as per original...]"
> 
> (while I'm thinking of pppoe(4), we've also got to do something with
> "inet 0.0.0.0 255.255.255.255 NONE / dest 0.0.0.1", which just
> doesn't work reliably - the 0.0.0.1 needs to be set *before*
> bringing the interface up - "inet 0.0.0.0 255.255.255.255 0.0.0.1"
> is better)
> 

ok, here's a stab. feel free to suggest amendments. is it going in a
better direction?

- i didn't directly say "see pf.conf" because i'd just referenced it,
  the manual is in SEE ALSO, and adding such a blurb felt like needless
  repetition.

- i left in a max-mss example because it could save an unnecessary
  lookup and also because it felt right to have an explicit example of
  what is meant.

jmc

Index: man4/pppoe.4
===================================================================
RCS file: /cvs/src/share/man/man4/pppoe.4,v
retrieving revision 1.35
diff -u -p -r1.35 pppoe.4
--- man4/pppoe.4        16 Mar 2021 13:53:39 -0000      1.35
+++ man4/pppoe.4        15 May 2022 16:34:15 -0000
@@ -96,10 +96,9 @@ This all is typically accomplished using
 file.
 A typical file looks like this:
 .Bd -literal -offset indent
-inet 0.0.0.0 255.255.255.255 NONE \e
+inet 0.0.0.0 255.255.255.255 0.0.0.1 \e
        pppoedev em0 authproto pap \e
        authname 'testcaller' authkey 'donttell' up
-dest 0.0.0.1
 inet6 eui64
 !/sbin/route add default -ifp pppoe0 0.0.0.1
 !/sbin/route add -inet6 default -ifp pppoe0 fe80::%pppoe0
@@ -144,44 +143,35 @@ by sending a PADT packet to explicitly t
 Add the following to the kernel config file:
 .Pp
 .Dl option PPPOE_TERM_UNKNOWN_SESSIONS
-.Sh MTU/MSS ISSUES
-Problems can arise on machines with private IPs connecting to the Internet
-via a machine running both
-Network Address Translation (NAT)
-and
-.Nm .
-Standard Ethernet uses a
-maximum transmission unit (MTU)
-of 1500 bytes,
-whereas PPPoE mechanisms need a further 8 bytes of overhead.
-This leaves a maximum MTU of 1492.
+.Sh PPPOE AND MTU/MSS
+PPPoE has an 8-byte header.
+When run over a network interface with the
+standard Ethernet maximum transmission unit (MTU) of 1500 bytes,
+this reduces the maximum available MTU to 1492.
 .Nm
-sets the MTU on its interface to 1492 as a matter of course.
-However,
-machines connecting on a private LAN will still have their MTUs set to 1500,
-causing conflict.
-Using a packet filter,
-the
-maximum segment size (MSS)
-can be set (clamped) to the required value.
-The following rule in
-.Xr pf.conf 5
-would set the MSS to 1440:
+sets the default MTU to this value.
+Unfortunately issues can occur when the path between
+the two endpoints of a TCP connection is not able to carry
+packets as large as the endpoints can send,
+leading to possible packet fragmentation and sometimes packet loss.
+In that case the maximum segment size (MSS) can be reduced using the
+.Cm max-mss
+option in
+.Xr pf.conf 5 .
+For example:
 .Pp
 .Dl match on pppoe0 scrub (max-mss 1440)
-.Pp
-Although in theory the maximum MSS over a PPPoE interface
-is 1452 bytes,
-1440 appears to be a safer bet.
-Note that setting the MSS this way can have undesirable effects,
-such as interfering with the OS detection features of
-.Xr pf 4 .
-.Pp
-Alternatively in cases where the remote equipment supports RFC 4638
-and the physical interface is configured to support jumbo frames,
-the MTU of the
+.Sh MTU/MSS NEGOTIATION
+When using a PPPoE device configured for a higher MTU ("jumbo frames"),
+the MTU for the
+.Nm
+device can also be raised.
+In this case
 .Nm
-interface can be raised and it will attempt to negotiate an increased MTU.
+attempts to negotiate the higher size with the other PPPoE endpoint
+using the RFC 4638 protocol.
+This can allow standard Ethernet packet sizes (1500 bytes)
+to be carried over PPPoE.
 For example, in
 .Pa /etc/hostname.pppoe0 :
 .Bd -literal -offset indent
@@ -192,32 +182,17 @@ dest 0.0.0.1
 !/sbin/route add default -ifp pppoe0 0.0.0.1
 .Ed
 .Pp
-The physical interface must also be configured like so:
+The physical interface would also have to be configured correspondingly:
 .Bd -literal -offset indent
 # echo "up mtu 1508" > /etc/hostname.em0
 .Ed
 .Pp
-With this, the previously mentioned MSS clamping rules in
-.Xr pf.conf 5
-are no longer necessary.
-.Pp
-If the MTU is set to a value larger than 1492 and the remote endpoint does
-.Em not
-support RFC 4638,
-.Nm
-will write
-.Dq \&No valid PPP-Max-Payload tag received in PADO
-to the kernel message buffer and the MTU will remain at the default value.
 However, RFC 4638 negotiation only takes into account the MTU configured
 on the endpoints, not the maximum MTU supported on the path between them.
 If the path cannot pass the larger Ethernet frames, negotiation will succeed
 but the larger frames will be dropped.
 For this reason it is important to test the connection with large packets
 when enabling a higher MTU.
-.Pp
-See
-.Xr pf.conf 5
-for more information on MTU, MSS, and NAT.
 .Sh SEE ALSO
 .Xr sppp 4 ,
 .Xr hostname.if 5 ,
Index: man5/pf.conf.5
===================================================================
RCS file: /cvs/src/share/man/man5/pf.conf.5,v
retrieving revision 1.595
diff -u -p -r1.595 pf.conf.5
--- man5/pf.conf.5      9 May 2022 21:48:00 -0000       1.595
+++ man5/pf.conf.5      15 May 2022 16:34:15 -0000
@@ -2313,7 +2313,12 @@ Parameters are specified enclosed in par
 At least one of the following parameters must be specified:
 .Bl -tag -width xxxx
 .It Cm max-mss Ar number
-Enforces a maximum segment size (MSS) for matching TCP packets.
+Reduces the maximum segment size (MSS)
+on TCP SYN packets to be no greater than
+.Ar number .
+This is sometimes required where the path between the two endpoints
+of a TCP connection cannot carry packets as large as the endpoints send,
+and the resulting mismatch can lead to packet fragmentation or loss.
 .It Cm min-ttl Ar number
 Enforces a minimum TTL for matching IP packets.
 .It Cm no-df
