Re: [Int-area] New Version Notification for draft-bonica-intarea-gre-mtu-00.txt

Carlos Pignataro (cpignata) Mon, 03 Jun 2013 17:01:58 -0700

Hi, Joe,

On Jun 3, 2013, at 1:43 PM, Joe Touch <[email protected]> wrote:

> Hi, Carlos,
> 
> On 6/2/2013 12:22 PM, Carlos Pignataro (cpignata) wrote:
>> Joe,
>> 
>> On May 29, 2013, at 4:31 PM, Joe Touch <[email protected]> wrote:
>> 
> ...
>>> I agree; my general recommendation has always been "the egress should
>>> always clean up any mess created by an ingress" - which means using
>>> "outer" fragmentation rather than "inner".
>> 
>> Why do you characterize this as a "mess"?
> 
> The ingress is creating a situation that takes work to correct (the "mess" I 
> refer to).
> 
> A. Using outer fragmentation places that work at the egress.
> 
> B. Using inner fragmentation pushes that work to the destination.
> 
> Note that fragmentation is a lot cheaper (less work) than reassembly. So (A) 
> makes it easy for routers to support GRE more cost-effectively, but drains 
> things like my iPhone battery as a result.
> 

[I think you mean (B) and not (A) in the sentence above]

A different way of looking at it is that a Host is generally more optimized to 
perform this function than a router. Or, if you prefer the view that tunnel 
endpoints act as hosts for the Tunnel (delivery), then a "host" is more "host" 
than a router for reassembly.

Frankly, for the case of draining smartphone battery, you could just use IPv6 
or set DF.

> I don't like that optimization. If you make a mess, IMO you should clean it 
> up.
> 

I understand you have a preference and you do not like it. On the other hand, 
it is still a valid case. It is documented, for example, in the second para of 
S4.1.4 of RFC 3931 http://tools.ietf.org/html/rfc3931#section-4.1.4

>> Thinking of the tunnel as a
>> logical link with its own logical link MTU (LMTU), and this link is
>> unwilling/incapable of performing fragmentation and reassembly at its
>> "data-link",
> 
> There's a *big* difference between incapable and unwilling.
> 
> Incapable is fine - drop and send the ICMP.
> 
> Unwilling isn't a reason to skip the hard work.
> 
>> then the appropriate behavior would be:
>> 
>>  * If the incoming IPv4 datagram has DF=1, drop and send a PTB back; and
>>  * if the incoming IPv4 datagram has DF=0, then fragment it and send it
>>    over the link.
>> 
>> What's different?
> 
> It's exactly the difference between incapable and unwilling.
> 
> Here's the difference for GRE:
> 
>       - GRE adds N bytes total (GRE header + IP header)
> 
>       - GRE over IP supports 65536-byte packets
> 
> So if a packet arrives that is smaller than 65536-N, IMO GRE ought to 
> fragment and reassemble it.
> 
> If a packet arrives that is larger than that, then GRE *cannot* tunnel it, 
> and *MUST* drop it and send a PTB.
> 
> I.e., only 65536 is "too big". Everything else is just whining about "bigger 
> than I want it to be" ;-)

[For scope, this is an IPv4 discussion; with IPv4, the mechanism for this is to 
set DF.]

If DF is not set, which specification defines that a packet cannot be 
fragmented? It really is a trade-off; and as such, IMHO, there is no 
one-size-fits-all answer, only deployment decisions and practices.
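
To make the "incapable" versus "unwilling" boundary concrete, here is a 
minimal sketch of the size arithmetic. It assumes an IPv4 delivery header 
without options (20 octets) and a base GRE header without optional fields 
(4 octets), so N = 24; the function name and return labels are hypothetical. 
Note that the IPv4 Total Length field is 16 bits, so the hard ceiling is 
65535 octets.

```python
IPV4_MAX_TOTAL_LENGTH = 65535  # 16-bit Total Length field (RFC 791)
IPV4_HEADER_LEN = 20           # assumption: delivery header carries no options
GRE_HEADER_LEN = 4             # assumption: base GRE header, no optional fields


def classify(inner_len: int, path_mtu: int) -> str:
    """Classify a payload datagram by what the tunnel ingress faces."""
    outer_len = inner_len + IPV4_HEADER_LEN + GRE_HEADER_LEN
    if outer_len > IPV4_MAX_TOTAL_LENGTH:
        # "Incapable": no IPv4 packet can carry this, fragmented or not.
        return "cannot-tunnel"
    if outer_len > path_mtu:
        # Fits in IPv4 but not in one packet on this path; whether the
        # ingress fragments (inner or outer) or drops is the policy
        # choice being debated in this thread.
        return "needs-fragmentation"
    return "fits"


if __name__ == "__main__":
    for size in (1476, 1477, 65511, 65512):
        print(size, classify(size, path_mtu=1500))
```

Only the first branch is forced by the packet format; the second is where 
the deployment trade-off lives.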

> 
>>> Yes, this can cause repeated frag/reassembly, but the alternative is
>>> to shift work to the end host, which I think is inappropriate.
>> 
>> I do not understand why it is "inappropriate" -- when thinking of the
>> tunnel as a link.
> 
> See above.

I am still not sure how "inappropriate" is a technical term. I do understand 
your preference and I do follow the logic. My point is that this behavior does 
not seem to be mandated by any spec; it seems to be a trade-off to be 
documented, with pros and cons, for this case.

Let me try a different example, and let me know which step you disagree with :-)

0. GRE Tunnel between R1 and R2, tunneling IPv4 from end hosts.
1. The delivery (encapsulating) header sets DF (as allowed by RFC791, for 
example because R2 does not have sufficient resources to reassemble internet 
fragments.)
2. The GRE Tunnel is realized as a logical interface.
3. A 2000 octet IPv4 datagram from an end host has the GRE tunnel as the 
next-hop; R1 encapsulates in GRE, then IPv4 with DF set, and then when trying 
to send the datagram sees that it is larger than the physical MTU of the out 
interface.
4. R1 sends an ICMPv4 PTB to itself and drops the datagram, and the tunnel 
learns its MTU.
5. Another datagram is received for the tunnel, with the DF bit clear. What 
should R1 do? -> Fragment the datagram to be encapsulated, then encapsulate 
each fragment.
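
Steps 0 through 5 can be sketched as a small model of R1. This is only an 
illustration under stated assumptions: the class and method names are 
hypothetical, the encapsulation overhead is taken as 24 octets (20-octet 
IPv4 delivery header plus 4-octet base GRE header), the payload datagrams 
carry a 20-octet IPv4 header without options, and the physical MTU is 1500.

```python
from dataclasses import dataclass

OVERHEAD = 24      # assumption: 20-octet IPv4 delivery header + 4-octet GRE
INNER_HEADER = 20  # assumption: payload IPv4 header without options


@dataclass
class Datagram:
    length: int  # total length of the payload IPv4 datagram, in octets
    df: bool     # DF bit of the payload IPv4 header


class GreTunnel:
    """Hypothetical model of R1; the delivery header always sets DF (step 1)."""

    def __init__(self, physical_mtu: int):
        self.physical_mtu = physical_mtu
        self.tunnel_mtu = None  # not yet learned

    def send(self, d: Datagram):
        """Return (action, sizes of the encapsulated packets sent)."""
        if self.tunnel_mtu is None:
            if d.length + OVERHEAD <= self.physical_mtu:
                return ("encapsulate", [d.length + OVERHEAD])
            # Steps 3-4: the encapsulated packet exceeds the physical MTU;
            # R1 signals PTB to itself, drops, and the tunnel learns its MTU.
            self.tunnel_mtu = self.physical_mtu - OVERHEAD
            return ("drop-learn-mtu", [])
        if d.length <= self.tunnel_mtu:
            return ("encapsulate", [d.length + OVERHEAD])
        if d.df:
            return ("drop-send-ptb", [])
        # Step 5: DF clear, so fragment the payload datagram first.
        # Non-final fragments carry a multiple of 8 octets of data.
        chunk = (self.tunnel_mtu - INNER_HEADER) // 8 * 8
        remaining = d.length - INNER_HEADER
        sizes = []
        while remaining > 0:
            data = min(chunk, remaining)
            sizes.append(INNER_HEADER + data + OVERHEAD)
            remaining -= data
        return ("fragment-then-encapsulate", sizes)


if __name__ == "__main__":
    r1 = GreTunnel(physical_mtu=1500)
    print(r1.send(Datagram(2000, df=False)))  # steps 3-4
    print(r1.send(Datagram(2000, df=False)))  # step 5
    print(r1.send(Datagram(2000, df=True)))   # DF set: PTB to the sender
```

With these assumptions the learned tunnel MTU is 1476, and the 2000-octet 
datagram in step 5 leaves R1 as two encapsulated packets of 1500 and 568 
octets, each of which the destination host, not R2, reassembles.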

> 
>> There is also a potential mischaracterization when you say "shift work
>> to the end host", because that can lead to assumptions that there is "a
>> single" end host (singular). In the case of a p2p tunnel, the challenge
>> is that there is a single pair of endpoints in the tunnel, but a
>> multitude of hosts behind and before them. A different view would be
>> that it's more efficient to distribute reassembly to many endpoints
>> instead of attacking the tunnel tailend and making it work on behalf of
>> many hosts.
> 
> It would be an attack, except that you're talking about a tunnel. A tunnel is 
> an ingress and an egress. Are you suggesting that making an egress do its job 
> is an "attack"?

A different perspective: If the tunnel is a link in a logical topology, then 
this preference would be equivalent to asking every link to perform 
data-link-level fragmentation and reassembly and allow any size packets.

Thanks,

-- Carlos.

> 
> Joe
> 
>> 
> 

