>> Here’s the thing about fragmentation:
>>       1. all links have a maximum packet size
>>       2. all tunneling/encapsulation/layering increases payload size
>> 1+2 implies there is always the need for fragmentation at some layer:
> 1 implies that.
> There is enough headroom designed into 1 to accommodate 2.

I'm not sure I follow what you're saying here. Ethernet MTU, the most
common value, is 1500 bytes, and nothing in that figure is set aside as
headroom for encapsulation. If you mean artificially lowering MTUs to
account for the overhead that encapsulation may add, that can be done.
However, to avoid fragmentation _entirely_, one would need to determine
the maximum overhead that could ever be added by encapsulation(s)
(plural, in case of nested encapsulations). In a sprawling and dynamic
network that has different sub-domains and simultaneously uses
different encapsulation protocols, determining that specific magic
number might be infeasible. There is also the problem that some 0.01%
corner case of encapsulation might need hundreds of bytes of overhead.
Lowering the MTU for everyone just to avoid fragmentation in that case
is a poor tradeoff; it's better to fragment for that one case.
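
To make the arithmetic concrete, here is a minimal Python sketch of how
the usable inner MTU shrinks as encapsulations stack up. It assumes the
MTU is measured at the IP layer (link framing excluded) and uses only
fixed minimum header sizes: no IPv4 options, no IPv6 extension headers,
no optional GRE key/sequence/checksum fields, no 802.1Q tag on the inner
Ethernet header. The encapsulation names and the `inner_mtu` /
`needs_fragmentation` helpers are hypothetical, for illustration only.

```python
from typing import Sequence

# Fixed/minimum header sizes in bytes (options and extensions add more).
IPV4 = 20      # IPv4 header without options
IPV6 = 40      # IPv6 fixed header, no extension headers
UDP = 8
GRE = 4        # base GRE header, no key/sequence/checksum fields
VXLAN = 8
ETHERNET = 14  # inner Ethernet header carried inside VXLAN, untagged

# Hypothetical names for a few encapsulations and the overhead each one
# adds in front of the inner IP packet.
ENCAP_OVERHEAD = {
    "ip-in-ipv4": IPV4,
    "ip-in-ipv6": IPV6,
    "gre-over-ipv4": IPV4 + GRE,
    "vxlan-over-ipv4": IPV4 + UDP + VXLAN + ETHERNET,
}


def inner_mtu(link_mtu: int, stack: Sequence[str]) -> int:
    """Largest inner IP packet that still fits in one link MTU after every
    encapsulation in `stack` is applied (the order does not change the
    total overhead)."""
    return link_mtu - sum(ENCAP_OVERHEAD[name] for name in stack)


def needs_fragmentation(packet_len: int, link_mtu: int, stack: Sequence[str]) -> bool:
    """True if a packet of `packet_len` bytes does not fit without fragmentation."""
    return packet_len > inner_mtu(link_mtu, stack)


print(inner_mtu(1500, ["gre-over-ipv4"]))                  # 1476
print(inner_mtu(1500, ["vxlan-over-ipv4"]))                # 1450
print(inner_mtu(1500, ["ip-in-ipv6", "vxlan-over-ipv4"]))  # 1410
print(needs_fragmentation(1500, 1500, ["gre-over-ipv4"]))  # True
```

Every extra layer lowers the ceiling again, so a single network-wide
"safe" MTU has to budget for the deepest stack that might ever occur.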


>>       3. fragmentation always splits info across packets
>> And there’s something important about layering:
>>       4. layering intends to isolate the behavior of one layer from
>>          another, such that it will always be impossible for an upper
>>          layer to know exactly what is going on below, i.e., to
>>          determine that limiting size across an entire path of
>>          possibly virtual tunnels
>> The next two are where we get into trouble:
>>       5. network devices increasingly WANT to inspect contents beyond
>>          the layer at which they are intended to operate
> Not that network devices have an intent of their own, but yes, it seems
> like network operators want to inspect content or are forced into it
> because of the necessity of IPv4 address sharing.
>>       6. inspecting contents ultimately means reassembly, at some level
> _Some_ content inspection would require that, but I don't think you can
> make it the general rule.
> E.g., a NAT or an L4 ACL only needs access to the L4 header.
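
As an illustration of why a header-only device still runs into
fragmentation, here is a minimal Python sketch (IPv4 only, assuming a
20-byte header with no options) that splits a UDP datagram into
fragments. Fragment offsets are counted in 8-byte units, so every
fragment except the last carries a multiple of 8 bytes of data, and only
the fragment at offset 0 begins with the UDP header. A NAT or L4 ACL
that sees a later fragment has no ports to look at unless it keeps
per-datagram state or reassembles. The helper names are hypothetical.

```python
import struct
from typing import List, Optional, Tuple

IPV4_HEADER = 20  # assumption: no IPv4 options

Fragment = Tuple[int, bool, bytes]  # (offset in 8-byte units, more-fragments flag, data)


def fragment_payload(payload: bytes, mtu: int) -> List[Fragment]:
    """Split an IPv4 payload so that header + data fits in `mtu` bytes."""
    max_data = (mtu - IPV4_HEADER) // 8 * 8  # non-final fragments: multiple of 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any fragment data")
    fragments: List[Fragment] = []
    offset = 0
    while offset < len(payload):
        chunk = payload[offset:offset + max_data]
        more = offset + len(chunk) < len(payload)
        fragments.append((offset // 8, more, chunk))
        offset += len(chunk)
    return fragments


def l4_ports(offset_units: int, data: bytes) -> Optional[Tuple[int, int]]:
    """Source/destination ports, or None if this fragment lacks the L4 header."""
    if offset_units != 0 or len(data) < 4:
        return None
    return struct.unpack("!HH", data[:4])


# A UDP datagram (8-byte header + 3000 bytes of data); checksum left at 0.
datagram = struct.pack("!HHHH", 5353, 53, 8 + 3000, 0) + bytes(3000)
fragments = fragment_payload(datagram, 1500)
for offset_units, more, data in fragments:
    print(offset_units, more, len(data), l4_ports(offset_units, data))
```

With a 1500-byte MTU the 3008-byte datagram becomes three fragments
(1480, 1480 and 48 bytes of data, at offsets 0, 185 and 370), and
`l4_ports` finds ports only in the first one.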
>> Which brings us to the punchline:
>>       7. but network device vendors want to save money, so they don’t
>>          want to reassemble at any layer
> We'd all wish it to be that simple. Can you substantiate that claim?
> One could just as easily speculate that customers don't want to pay what
> it costs to do reassembly at terabit speeds...
> Or accept that it's technically hard.
> The NAT and IPv4 address-sharing implementations I'm aware of do some
> flavour of network-layer reassembly.
> However much money you throw at it, you can't reassemble fragments
> travelling on different paths, nor can you trivially keep network-layer
> reassembly from being an attack vector on those boxes.
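
To illustrate the attack-vector point, here is a minimal Python sketch of
the per-datagram state a middlebox must keep to reassemble IPv4
fragments, keyed (as RFC 791 does) on source, destination, protocol and
Identification. Every first fragment whose siblings never arrive
(because they took another path, were lost, or were never sent) pins an
entry, so the table has to be capped, and once an attacker fills it,
legitimate datagrams are dropped. The `Reassembler` class and the
`max_pending` limit are hypothetical; reassembly timeouts and
overlapping fragments are deliberately not handled.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, int, int]  # (src, dst, protocol, IP Identification)
MAX_PAYLOAD = 65535 - 20         # IPv4 Total Length limit minus a 20-byte header


@dataclass
class Partial:
    chunks: Dict[int, bytes] = field(default_factory=dict)  # byte offset -> data
    total_len: Optional[int] = None  # known once the last fragment (MF=0) arrives


class Reassembler:
    def __init__(self, max_pending: int = 1024) -> None:
        self.max_pending = max_pending  # hypothetical cap on partial datagrams
        self.pending: Dict[Key, Partial] = {}

    def add(self, key: Key, offset_units: int, more: bool, data: bytes) -> Optional[bytes]:
        """Store one fragment; return the whole payload once it is complete."""
        start = offset_units * 8
        if start + len(data) > MAX_PAYLOAD:
            return None  # drop: larger than any legal IPv4 datagram
        part = self.pending.get(key)
        if part is None:
            if len(self.pending) >= self.max_pending:
                return None  # table full: this is where an attacker hurts everyone
            part = self.pending[key] = Partial()
        part.chunks[start] = data
        if not more:
            part.total_len = start + len(data)
        return self._try_complete(key, part)

    def _try_complete(self, key: Key, part: Partial) -> Optional[bytes]:
        if part.total_len is None:
            return None  # last fragment not seen yet
        buf = bytearray()
        for start in sorted(part.chunks):
            if start != len(buf):
                return None  # gap (or overlap, unsupported here): keep waiting
            buf += part.chunks[start]
        if len(buf) != part.total_len:
            return None
        del self.pending[key]  # free the state once the datagram is whole
        return bytes(buf)


key = ("192.0.2.1", "198.51.100.1", 17, 0x1234)
r = Reassembler()
payload = bytes(2000)
print(r.add(key, 185, False, payload[1480:]))    # None: first fragment still missing
print(len(r.add(key, 0, True, payload[:1480])))  # 2000
```

Fragments of the same datagram that travel via another box never reach
this table at all, which no amount of memory fixes.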
>> So I agree, IP fragmentation has its flaws, but those flaws arise not
>> only because it leaves out the transport port numbers, but also because
>> DPI and NAT devices don’t reassemble. And they don’t because it’s
>> cheaper to sell devices that claim to run at (e.g.) 1 Gbps but don’t
>> bother to reassemble.
> I don't agree with your conclusion.
> NATs extend the network layer to include the L4 ports. NAT implementations of 
> course do reassemble.
>> I.e., it will never matter what layering we add to fix this (GRE, GUE,
>> Aero, etc.); ultimately, we’re doomed to need fragmentation support down
>> to IP exactly because:
>>       a. #1-4 mean we need frag/reassembly at any tunnel ingress
>>       b. vendors want to sell #5 at a price that is too low for them
>>          to support #6 (i.e., point #7)
>> So pushing this to another layer will never solve it. The only thing
>> that will solve it is a compliance requirement for #6, which could be
>> done right now and has to be done for ANY solution to work.
> For IPv4 address sharing specifically, removing network-layer
> fragmentation would be a solution.
>> NOTE: even rewriting EVERY application won’t fix this, nor will deploying a 
>> new layer at any level.
> For some types of content inspection, that would require reassembling
> the whole application context.
> But that's quite different from IPv4 address sharing, which we have 
> unfortunately made an integral part of the Internet architecture.
>> And yes, I do intend to add this to draft-ietf-tunnels, so it can be 
>> referred to elsewhere.
> Ole
