RE: fragmentation and tunnels (was: RE: [Int-area] Re: [tsv-area] Fwd: I-DACTION:draft-heffner-frag-harmful-03.txt)

Templin, Fred L Thu, 04 Jan 2007 16:02:22 -0800

Hi Bob, 

> >I have been looking at fragmentation over IP*-in-IPv4 tunnels
> >for some time, and 'draft-templin-linkadapt-03.txt' proposes
> >an Alternate Fragmentation (AF) scheme that occurs below the
> >transport layer segmentation (e.g., Packetization Layer Path
> >MTU Discovery) but above IPv4 fragmentation. It supports
> >segmentation/reassembly at the tunnel endpoints and uses a
> >dynamic segment size probing mechanism within the tunnel to
> >avoid in-the-network IPv4 fragmentation, yet remains compatible
> >with in-the-network fragmentation should it occur.
> 
> I've scanned it.


OK - thanks.

> My initial reaction is that it seems somewhat over-specific in places. It
> would seem better if only the bare bones needed for tunnel fragmentation
> was in there, rather than tying together the fragmentation scheme with the
> checksumming and so forth.

The trailing checksum is there to detect packet splicing errors,
memory errors, driver bugs, bogus segments, etc. and discard
corrupted data that might not otherwise be detected by the
Internet checksum. 

> Also I suspect the references to specific link 
> technologies will seem dated in a few years

Maybe so, but to a certain extent I believe injecting a bit of
history here is useful in terms of leveraging past experience
moving forward. For example, the ATM cell size is coincidentally
similar to the minimum IPv4 MTU of 68 bytes [RFC791], and AAL5
is coincidentally similar to the segmentation/reassembly being
proposed in the document.

> --- it needs to justify some of 
> the min & max MTU choices for the long term.

Well, for IPv6 we have a MinMTU of 1280 bytes for the long term
and I don't expect we will see that change soon. In terms of the
MaxMTU, my best reading of Jonathan Stone's thesis on "Checksums
in the Internet" was that the incidence of undetected errors
for 32b checksums becomes non-negligible for packet sizes larger
than about 9KB, which is coincidentally similar to the 9180 byte
MTU for IP over ATM. Future links that support even larger packet
sizes with strong error checking may allow for still larger
tunnel MTUs, but I'm not an expert on packet size interactions
with error detection mechanisms, so I would appreciate it if others
who know more about it could comment.

> Also, it would be useful to 
> say why existing v6 fragment header mechanisms aren't applicable.

IPv6 sees the tunnel as an ordinary link that needs to present
an assured MTU of 1280 bytes or larger. If the tunnel has to
segment IPv6 packets into smaller chunks in order to fulfill
the MTU it advertises to IPv6, then that is an L2 issue with
IPv4 being the L2 encapsulation protocol in question.

If the encapsulation were to include a full IPv6 header with
frag header in every segment, that would be 48 bytes of IPv6
header plus 20 bytes of IPv4 header leaving 0 bytes for ULP
data on links that support only the minimum IPv4 MTU of 68
bytes. It would also be inefficient in general even on links
with MTUs larger than 68 bytes, and something of a layer violation.
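
To make the arithmetic concrete, here is a minimal sketch in Python. The
header sizes come from the IPv4 and IPv6 specifications; the function and
constant names are illustrative only and do not appear in the draft.

```python
# Header sizes in bytes, as fixed by the IPv4 and IPv6 specifications.
IPV4_HEADER = 20       # IPv4 header without options
IPV6_HEADER = 40       # fixed IPv6 header
IPV6_FRAG_HEADER = 8   # IPv6 Fragment extension header
IPV4_MIN_MTU = 68      # minimum IPv4 link MTU [RFC791]


def payload_room(link_mtu: int, per_segment_overhead: int) -> int:
    """Bytes left for upper-layer data in one segment (never negative).

    per_segment_overhead is whatever is carried in every segment in
    addition to the outer IPv4 header (hypothetical parameter name).
    """
    return max(0, link_mtu - IPV4_HEADER - per_segment_overhead)


# Full IPv6 header plus Fragment header repeated in every segment.
full_v6_overhead = IPV6_HEADER + IPV6_FRAG_HEADER

print(payload_room(IPV4_MIN_MTU, full_v6_overhead))  # room on a 68-byte link
print(payload_room(1500, full_v6_overhead))          # room on a 1500-byte link
```

On a 68-byte link the first call leaves nothing for ULP data, which is the
case described above.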

That said, it may be useful to include just the first 4 octets
of the IPv6 header in the encapsulation of non-initial segments.
That would give the 8b Traffic Class and 20b Flow Label which
could be used by the reassembler to further disambiguate flows.
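
As a rough illustration of what those 4 octets carry, the sketch below
unpacks the 4-bit Version, 8-bit Traffic Class and 20-bit Flow Label from
the first 32-bit word of an IPv6 header. The function names are
hypothetical, and how a reassembler would actually use the values is not
specified here.

```python
import struct


def parse_first_word(first4: bytes) -> tuple:
    """Split the first 4 octets of an IPv6 header into its three fields."""
    if len(first4) != 4:
        raise ValueError("expected exactly 4 octets")
    (word,) = struct.unpack("!I", first4)   # network byte order
    version = word >> 28                    # top 4 bits
    traffic_class = (word >> 20) & 0xFF     # next 8 bits
    flow_label = word & 0xFFFFF             # low 20 bits
    return version, traffic_class, flow_label


def build_first_word(traffic_class: int, flow_label: int) -> bytes:
    """Inverse of parse_first_word for an IPv6 (version 6) header."""
    word = (6 << 28) | ((traffic_class & 0xFF) << 20) | (flow_label & 0xFFFFF)
    return struct.pack("!I", word)


version, tc, label = parse_first_word(build_first_word(0x2E, 0x12345))
print(version, hex(tc), hex(label))
```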
 
> >Use of the
> >scheme is indicated by setting the reserved bit in the IPv4
> >header 'Flags' field (to be renamed as the 'AF' bit), thus
> >it obsoletes RFC3514 (an "April Fools Day" RFC).
> 
> This seems a rather specific use for this last reserved bit, given things
> in the IP header are meant to be generally useful to upper layers.

Well, the bit occurs in the flags field associated with
fragmentation, so defining it to indicate an Alternate
Fragmentation (AF) scheme would seem like a natural fit.
 
> Couldn't use of the coding scheme be negotiated between tunnel endpoints
> for all tunnelled packets carrying a specific src-dest address pair?

This might work for persistent configured tunnels that are
established through some initial negotiation phase, but 
might not be so practical for automatic tunnels or tunnels
that are short-lived and/or only carry a small number of
packets. It is also possible that this AF scheme may be used
by IPv4 packetization layers other than tunnels and may even
supplement or replace RFC791 IPv4 fragmentation.

> I have to declare an interest in the reserved (evil) bit myself at this
> point. I've been presenting a draft myself in tsvwg
> <draft-briscoe-tsvwg-re-ecn-tcp-03.txt> that relies on using
> it. However, I like to think we have a strong justification for grabbing it,
> as the above draft is the culmination of a decade of work to fix the
> resource allocation and accountability problems with the current
> Internet architecture.

I know that RFC3514 was tongue-in-cheek when it used the term
"evil", but to be sure we are talking about precious real estate
here. I have been working on the tunnel MTU stuff since 2002
when the ngtrans wg closed and re-emerged as v6ops.

From what I can tell, one question comes down to whether we can
expect to see IPv6-in-IPv4 tunnels deployed on a wide scale and
on a long-term basis in the Internet. If so, we will need tunnel
MTU assurance that provides robustness and efficiency over
arbitrary Internet paths.

Another question comes down to whether such an AF scheme could
supplement or replace RFC791 IPv4 fragmentation (along with
its myriad issues that are well documented) even for
non-tunnel applications.

> Are you aware of any other claims on that bit? I'm sure there 
> are many.

I knew of one proposal several years back, but to my knowledge
no I-Ds were ever issued and I don't know of any other claims.
Maybe someone else knows of other proposals.

> >This AF scheme codes the 'ip_id' field in the IPv4 header and
> >thus presents a shorter-than-16b ID (the current draft version
> >specifies only a 6b ID). The assumption (which I think is also
> >an assumption you are making) is that successful reassembly
> >will occur within a very short time window if it will occur
> >at all. This assumes that reassembly failure will normally
> >occur due to packet loss rather than gross reordering of
> >packets within the same flow.
> 
> I'm not quite making that assumption. I'm saying re-assembly 
> over shorter gaps should take priority over longer gaps, so
> re-assembly over longer gaps should be deferred for a while.

That's fine, but my intuition is that applications and
transports are not very tolerant of grossly reordered packets
and some may prefer to receive partial data (e.g. using UDPlite)
over data that arrives too late.
 
> >There have been many studies on packet reordering within the
> >Internet, but I have not found one yet that can completely
> >characterize the *degree* of reordering, i.e., the expected
> >number of places by which a reordered packet is out-of-order.
> >Most of the studies I have seen seem to suggest that the
> >expected degree of reordering of packets within a short chain
> >of packets sent in rapid succession within the same flow is
> >typically very small, e.g., a reordering event such as
> >(1,2,4,5,3,6,...) may occur occasionally, while one such
> >as (1,2,4,5,...,64k,3,64k+1,...) most likely will not. Other
> >factors to consider are: 1) as you observe, lengthening the
> >ID field may be insufficient in the presence of gross
> >reordering, and 2) transports such as TCP are likely to
> >treat grossly reordered packets as loss anyway.
> 
> I guess it depends when you look. If you catch a re-route in 
> progress into a shorter path, you might see something more like
> your second example if the re-route affects very fast flows. So,
> the typical case might not be the only case we have to allow for.

Route flaps happen all the time in the environments I care
about, and in many cases loss of one or a couple of packets
is unavoidable. Reliable transports already take care of this,
and mechanisms like UDPlite may be useful for applications
such as streaming media.
 
> >Finally, this work has been around for some time now, and
> >I believe has been reviewed by many while few have commented.
> >Perhaps now is the time for discussion on a wider basis.
> 
> I guess one immediate problem springs to mind. Current re-assembly
> implementations have had to be hardened against frag, tear-drop etc
> attacks. In defining a new coding of the packetID field, it will have to be
> resistant to attack from malicious sources. A new coding would open up a
> new set of vulnerabilities that would all have to be dreamt up then
> patched. For instance, malicious sources could spoof tunnelled packets to
> the decap endpoint as if they were from the encap endpoint, but send
> sequences of fragments that don't fit together correctly, blowing its
> memory, causing it to hang, or whatever.

The spec gives very specific instructions on how packets are to
be segmented that are actually a lot less flexible than RFC791
IPv4 fragmentation. In particular, all segments except the final
segment must be the same size (so that only a small integer
segment ID, rather than an offset, is necessary) and the final byte
of the ith segment is the one that immediately precedes the first
byte of the i+1th segment. This makes reassembly much easier, and
if an attacker injects a bogus segment it will either be recognized
immediately as a martian or be caught by the trailing checksum
during reassembly. I'm sure this requires a more careful analysis,
however.
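
The rules above can be sketched roughly as follows. This is not the wire
format from the draft: the (index, chunk) representation is hypothetical,
CRC-32 from Python's zlib stands in for the trailing checksum, and the
short ID that ties segments to one packet is not modeled.

```python
import zlib
from typing import List, Optional, Tuple

Segment = Tuple[int, bytes]  # (segment index, chunk), illustrative only


def segment(packet: bytes, seg_size: int) -> List[Segment]:
    """Append a trailing checksum, then cut into equal-size segments.

    Only the final segment may be shorter than seg_size.
    """
    if seg_size <= 0:
        raise ValueError("seg_size must be positive")
    data = packet + zlib.crc32(packet).to_bytes(4, "big")
    return [(i, data[off:off + seg_size])
            for i, off in enumerate(range(0, len(data), seg_size))]


def reassemble(segments: List[Segment], seg_size: int) -> Optional[bytes]:
    """Return the original packet, or None if anything looks bogus."""
    by_index = {}
    for idx, chunk in segments:
        # Reject empty, oversized or duplicate segments outright.
        if idx < 0 or not chunk or len(chunk) > seg_size or idx in by_index:
            return None
        by_index[idx] = chunk
    count = len(by_index)
    # Indexes must be exactly 0..count-1, with no gaps.
    if count == 0 or set(by_index) != set(range(count)):
        return None
    # Every segment except the final one must be exactly seg_size bytes.
    if any(len(by_index[i]) != seg_size for i in range(count - 1)):
        return None
    data = b"".join(by_index[i] for i in range(count))
    if len(data) < 4:
        return None
    body, trailer = data[:-4], data[-4:]
    # The trailing checksum catches corruption that slipped past the
    # structural checks above.
    if zlib.crc32(body).to_bytes(4, "big") != trailer:
        return None
    return body


packet = bytes(range(256)) * 5            # a 1280-byte example packet
segs = segment(packet, 48)
print(reassemble(list(reversed(segs)), 48) == packet)
```

Because position is implied by the index and the fixed segment size, the
reassembler never has to reason about overlapping offsets, which is where
much of the RFC791 reassembly hardening effort has gone.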

Thanks - Fred
