Re: [Int-area] New Version Notification for draft-bonica-intarea-gre-mtu-00.txt

Templin, Fred L Mon, 01 Jul 2013 14:18:49 -0700

Hi Joe,

> -----Original Message-----
> From: Joe Touch [mailto:[email protected]]
> Sent: Monday, July 01, 2013 1:51 PM
> To: Templin, Fred L
> Cc: Carlos Pignataro (cpignata); Ronald Bonica; Internet Area
> Subject: Re: [Int-area] New Version Notification for draft-bonica-intarea-gre-mtu-00.txt
> 
> Hi, Fred,
> 
> On 7/1/2013 1:27 PM, Templin, Fred L wrote:
> ...
> >> On 7/1/2013 8:34 AM, Templin, Fred L wrote:
> >>>>> OK, but IPv4 also has a limit of minMRU=576. So, we have:
> >>>>>
> >>>>>      IPv4 minMTU = 576 (*)
> >>>>>      IPv4 minMRU = 576
> >>>>>      IPv6 minMTU = 1280
> >>>>>      IPv6 minMRU = 1500
> >>>>>
> >>>>> (*) Even though the specs say that IPv4 minMTU = 68, everyone
> >>>>> seems to be saying that for practical purposes it is now 576.
> >>>>
> >>>> There needs to be a difference between the minMTU and the minMRU; if
> >>>> not, then IP-in-IP tunnels will never succeed without a separate
> >>>> fragmentation and reassembly layer - and although SEAL provides that,
> >>>> we currently do not require anything like that for X-in-X
> >>>> encapsulation.
> >>>
> >>> With IPv4, there is no difference between minMTU and minMRU.
> >>> Tunnels over IPv4 therefore set DF=0 to allow for in-network
> >>> fragmentation if necessary.
> >>
> >> But then that's useless. Let's say you already send just 576, and set
> >> DF=1, and that packet encounters an IPv4 tunnel. You add 20 bytes of
> >> header, resulting in 596.
> >>
> >> At the tunnel ingress, you can't fragment the inner packet because
> >> DF=1, and why shouldn't it be set? You're using the minMTU.
> >>
> >> At that ingress, you can't fragment the outer packet because you would
> >> need the egress to reassemble something that is 596 bytes, larger than
> >> the egress was ever expected to reassemble (minMRU).
> >>
> >> So you drop the packet and send an ICMP too-big back to the source,
> >> which discards it because it is already sending minMTU packets and
> >> doesn't think it should have to drop the MTU below that.
> >>
> >> AFAICT, you now have broken the path completely.
> >
> > For IPv6-within-IPv4 at least, most of these "transition mechanism"
> > tunnels assume a minMRU of 1500 even though the IPv4 specs say that
> > the MRU is 576. Sure, that is broken but it seems to have become the
> > IPv6 transition mechanism "best practice".
> 
> I was giving an example of IPv4-in-IPv4, to demonstrate that when
> minMTU == minMRU then tunnels are not possible.


I took a quick look, and it looks like RFC2003 skirted the issue
of minMTU/minMRU sizes altogether. At least the v6/v4 transition
mechs did a cursory study of packet sizes.
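Joe's IPv4-in-IPv4 example reduces to a single comparison: the encapsulated size of a minMTU packet that cannot be fragmented (DF=1) against the minMRU the egress is obliged to reassemble. A minimal Python sketch, using the figures from the table earlier in the thread and assuming option-free outer headers (20 bytes for IPv4, 40 for IPv6); the function name is illustrative only:

```python
from typing import Optional

# Minimum sizes from the table in this thread (bytes). The IPv4 minMTU is
# the "practical" 576 figure rather than the 68 bytes in the spec.
MIN_MTU = {"ipv4": 576, "ipv6": 1280}
MIN_MRU = {"ipv4": 576, "ipv6": 1500}

# Outer header size per encapsulation, assuming no IPv4 options and no
# IPv6 extension headers.
OUTER_HDR = {"ipv4": 20, "ipv6": 40}


def min_mtu_packet_survives(inner: str, outer: str,
                            egress_mru: Optional[int] = None) -> bool:
    """True if a minMTU-sized inner packet with DF=1, after one level of
    encapsulation, is no larger than what the egress must reassemble.

    The inner packet cannot be fragmented (DF=1), so the ingress can only
    fragment the outer packet, which works only if the egress can
    reassemble the full encapsulated size.
    """
    encapsulated = MIN_MTU[inner] + OUTER_HDR[outer]
    mru = MIN_MRU[outer] if egress_mru is None else egress_mru
    return encapsulated <= mru


if __name__ == "__main__":
    cases = [
        ("ipv4", "ipv4", None),   # Joe's example: 576 + 20 = 596 > 576
        ("ipv6", "ipv6", None),   # 1280 + 40 = 1320 <= 1500
        ("ipv6", "ipv4", None),   # spec MRU: 1280 + 20 = 1300 > 576
        ("ipv6", "ipv4", 1500),   # transition "best practice": 1300 <= 1500
    ]
    for inner, outer, mru in cases:
        ok = min_mtu_packet_survives(inner, outer, mru)
        print(f"{inner}-in-{outer} (egress MRU {mru or MIN_MRU[outer]}): {ok}")
```

The last two cases show why IPv6-in-IPv4 transition tunnels only work when the egress is assumed to reassemble 1500 bytes rather than the 576 the IPv4 specs guarantee.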

> >>> With IPv6, minMTU is smaller than minMRU but that does not
> >>> guarantee that a packet sent by the ingress can be received
> >>> by the egress without fragmentation.
> >>
> >> No, but it does guarantee that the packet can traverse a tunnel and
> >> still make it to its destination.
> >
> > Right, but remember that the IPv6 minMRU is only 1500 bytes.
> > So, we can't assume an unlimited reassembly buffer on the
> > tunnel egress.
> 
> Agreed.

OK.

> >> The difference between minMTU and minMRU is the amount of accumulated
> >> headers you can accommodate by tunneling. At 1500-1280 = 220 bytes,
> >> that's 5 levels of nested IPv6 tunnels (40 bytes each), or more than a
> >> few IPsec tunnel-mode tunnels if needed.
> >
> > I addressed this point on the IPv6 list, but think for a moment
> > what this means. It means that the first tunnel ingress would have
> > to configure a 1280 MTU, the second tunnel ingress would need to
> > configure a 1320 MTU (to accept encapsulated packets from the first),
> > the third tunnel ingress would need to configure a 1360 MTU (to
> > accept encapsulated packets from the second), etc. up to 5 levels
> > of nesting. I don't know about you, but I can imagine many situations
> > where it is not possible for a single operator to lay hands on every
> > tunnel ingress so as to carefully set each MTU in this way (I gave
> > one example on the IPv6 list).
> 
> We agree, but the punchline is that:
> 
> a) minMRU *MUST* be larger than minMTU, or else there cannot be tunnels

Yes - I am totally with you on this.

>       how much larger depends on how much nesting you expect;

Not necessarily. If the first tunnel ingress set a small fragment
size (say, 1280) and fragmented anything between 1280 and 1500 bytes
into something smaller (say, 750), then each subsequent ingress in
the path would not need to fragment further, and we could get away
with a reassembly buffer size of just (1500+HLEN) at the ultimate
tunnel far end.
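The headroom this buys can be checked with a few lines of arithmetic. A minimal Python sketch, assuming plain IPv6 encapsulation (HLEN = 40), a worst-case 1280-byte link somewhere inside the nested tunnels, and the 750-byte fragment size from the example above; it treats each fragment simply as a packet of at most FRAG_SIZE bytes and ignores fragment-header details. All names are hypothetical:

```python
HLEN = 40          # per-level encapsulation overhead: plain IPv6 header (assumption)
PATH_MTU = 1280    # smallest link MTU assumed anywhere inside the nested tunnels
FRAG_SIZE = 750    # largest packet the first ingress emits (example figure)


def max_extra_levels(frag_size: int = FRAG_SIZE, hlen: int = HLEN,
                     path_mtu: int = PATH_MTU) -> int:
    """How many further nested encapsulations a first-ingress fragment can
    take before it would exceed path_mtu and need fragmenting again."""
    return (path_mtu - frag_size) // hlen


def needs_refragmenting(extra_levels: int, frag_size: int = FRAG_SIZE,
                        hlen: int = HLEN, path_mtu: int = PATH_MTU) -> bool:
    """True if a fragment grows past path_mtu after extra_levels more headers."""
    return frag_size + extra_levels * hlen > path_mtu


def reassembly_buffer(max_inner: int = 1500, hlen: int = HLEN) -> int:
    """Buffer the first tunnel's far end needs: the largest inner packet the
    first ingress fragments, plus that ingress's own header."""
    return max_inner + hlen


if __name__ == "__main__":
    print(max_extra_levels())        # (1280 - 750) // 40 = 13
    print(needs_refragmenting(5))    # 750 + 5*40 = 950 <= 1280 -> False
    print(reassembly_buffer())       # 1500 + 40 = 1540
```

The point of the design is that only the first ingress and its far end pay for fragmentation; the intermediate ingresses need no per-level MTU configuration as long as the nesting stays within the computed headroom.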
 
>       IPv6 allows 5 levels of plain nesting and around 2 of IPsec.
>       The former seems fine, but the latter is tight. I would not
>       at all be surprised if future networking ended up experiencing
>       more than 2 levels of IPsec tunnel.

Right, but without fragmentation the way you *stack* those levels
of nesting needs to be carefully coordinated which may not always
be possible.
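Both the nesting budget and the per-ingress MTU "ladder" described earlier in the thread (1280, 1320, 1360, ...) follow from the same headroom. A minimal Python sketch, assuming 40 bytes per plain IPv6 encapsulation; the 100-byte IPsec figure below is a purely hypothetical placeholder, since real ESP tunnel-mode overhead depends on the cipher, IV, padding and ICV:

```python
from typing import List

MIN_MTU_V6 = 1280
MIN_MRU_V6 = 1500
IPV6_HDR = 40  # plain IPv6-in-IPv6, no extension headers (assumption)


def nesting_budget(overhead_per_level: int, min_mtu: int = MIN_MTU_V6,
                   min_mru: int = MIN_MRU_V6) -> int:
    """Levels of encapsulation that fit in the minMRU - minMTU headroom."""
    return (min_mru - min_mtu) // overhead_per_level


def mtu_ladder(levels: int, overhead_per_level: int = IPV6_HDR,
               base: int = MIN_MTU_V6) -> List[int]:
    """MTU each successive ingress must be configured with so that it
    accepts the already-encapsulated packets from the ingress before it."""
    return [base + i * overhead_per_level for i in range(levels)]


if __name__ == "__main__":
    print(nesting_budget(IPV6_HDR))   # 220 // 40 -> 5
    print(mtu_ladder(5))              # [1280, 1320, 1360, 1400, 1440]
    # Hypothetical per-level IPsec tunnel-mode overhead (placeholder value).
    print(nesting_budget(100))        # 220 // 100 -> 2
```

Every entry in the ladder has to be set by hand on a different ingress, which is exactly the coordination problem raised above.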

> b) we CANNOT deprecate fragmentation

100% with you that fragmentation is needed, but:

>       if we do, we are deprecating tunneling

If we deprecate IP fragmentation but then at the same time replace
it with SEAL we will still have the functionality that we need.
 
> >> That's not as much as I'd like, but it's at least non-zero.
> >
> > I think the maximum acceptable level of nesting is somewhere
> > between 5 and 10 before the nesting would be declared "recursive"
> > (i.e., a routing loop). SEAL allows up to 8 levels of nesting.
> 
> It's "recursive" when it's recursive, and there's strictly no way to
> determine that because an address can have different meanings at
> different layers. The only solution would be to have each layer insert a
> "thumbprint" and look for cycles in that, but that approach would
> consume the encapsulation budget more quickly.

SEAL has a "level" field which is essentially a counter that counts
down from 7 to zero. When we reach level 0, assume recursion and quit.
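A minimal Python sketch of that countdown, with a hypothetical `encapsulation_level` function standing in for the ingress logic. It models only the counter, not SEAL's actual header layout, and assumes the first encapsulation is stamped with 7 and each further nesting decrements it; counting 7 down to 0 inclusive yields the 8 levels of nesting mentioned above:

```python
from typing import Optional

TOP_LEVEL = 7  # level stamped on the first (outermost-originated) encapsulation


class RecursionSuspected(Exception):
    """Raised when the level counter is exhausted (hypothetical name)."""


def encapsulation_level(inner_level: Optional[int]) -> int:
    """Level to stamp on a new outer header.

    inner_level is the level carried by the packet being encapsulated, or
    None if the packet is not already encapsulated. This is a model of the
    counter only, not the SEAL wire format.
    """
    if inner_level is None:
        return TOP_LEVEL
    if inner_level == 0:
        # No levels left: assume a routing loop and stop encapsulating.
        raise RecursionSuspected("level counter exhausted; assuming recursion")
    return inner_level - 1


def max_nesting() -> int:
    """Count how many encapsulations succeed before the counter runs out."""
    depth, level = 0, None
    while True:
        try:
            level = encapsulation_level(level)
        except RecursionSuspected:
            return depth
        depth += 1


if __name__ == "__main__":
    print(max_nesting())  # levels 7, 6, ..., 0 -> 8
```

Unlike the "thumbprint" approach, a fixed-width counter costs the same few bits at every level, at the price of treating any nesting deeper than 8 as a loop.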

> Two levels of tunnel are required to allow revisitation, where a single
> machine can emulate multiple virtual routers (without requiring OS
> virtualization such as VMware). More levels are needed to support other
> kinds of abstraction; my own work used up to 4 levels for
> non-demonstration purposes (we demo'd 16).

Sounds very reasonable. I think we want to avoid getting locked
into rigid notions of what tunnel nesting would be used for.

> >>> RFC2473 acknowledges
> >>> this by using fragmentation at the ingress as a limiting
> >>> condition for when the MTU within the tunnel becomes too
> >>> small. This can happen for example if there is a 1280 MTU
> >>> link within the tunnel, if there are nested encapsulations
> >>> within the tunnel, etc.
> >>
> >> Yes, but if minMRU == minMTU, then the number of encapsulations
> >> supported is zero.
> >
> > Right. That's why IPv6-in-IPv4 transition mechanisms cheat and
> > assume 1500 when the specs say they can only assume 576.
> 
> Agreed. That's why we need *realistic* numbers for that, not merely
> those that reflect what's implemented (e.g., for IPv4).

Right, but what I am ultimately after is an unlimited tunnel
MTU (I think you were on board with this idea from earlier
discussion). The only difference is that I want fragmentation
and reassembly out to 1500 bytes only, and then let the bigger
packets go through unfragmented as long as they fit. "Take care
of the smalls, and let the bigs take care of themselves."
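One way to read that policy as an ingress decision, sketched in Python. The action names, the 40-byte header and the order of the checks are assumptions for illustration; the thread does not specify them:

```python
from enum import Enum

FRAG_LIMIT = 1500  # tunnel fragments/reassembles packets up to this size only
HLEN = 40          # encapsulation overhead, plain IPv6 header (assumption)


class Action(Enum):
    SEND_WHOLE = "send unfragmented"
    FRAGMENT = "fragment at the tunnel layer, reassemble at the egress"
    DROP_PTB = "drop and return a Packet Too Big"


def tunnel_action(pkt_len: int, path_mtu: int, hlen: int = HLEN,
                  limit: int = FRAG_LIMIT) -> Action:
    """Decide how a tunnel ingress treats an inner packet of pkt_len bytes."""
    if pkt_len + hlen <= path_mtu:
        return Action.SEND_WHOLE   # fits as-is, whatever its size
    if pkt_len <= limit:
        return Action.FRAGMENT     # a "small": the tunnel takes care of it
    return Action.DROP_PTB         # a "big" that doesn't fit: the source adapts


if __name__ == "__main__":
    for size, mtu in [(1000, 1280), (1500, 1280), (9000, 9216), (9000, 1500)]:
        print(size, mtu, tunnel_action(size, mtu).name)
```

The tunnel MTU is effectively unlimited: large packets pass whenever the path allows, while the egress never has to reassemble more than 1500+HLEN bytes.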

Thanks - Fred
[email protected]

> Joe
_______________________________________________
Int-area mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/int-area
