Hi Bob,

> >I have been looking at fragmentation over IP*-in-IPv4 tunnels
> >for some time, and 'draft-templin-linkadapt-03.txt' proposes
> >an Alternate Fragmentation (AF) scheme that occurs below the
> >transport layer segmentation (e.g., Packetization Layer Path
> >MTU Discovery) but above IPv4 fragmentation. It supports
> >segmentation/reassembly at the tunnel endpoints and uses a
> >dynamic segment size probing mechanism within the tunnel to
> >avoid in-the-network IPv4 fragmentation, yet remains compatible
> >with in-the-network fragmentation should it occur.
>
> I've scanned it.

OK - thanks.

> My initial reaction is that it seems somewhat over-specific in places. It
> would seem better if only the bare bones needed for tunnel fragmentation
> was in there, rather than tying together the fragmentation scheme with the
> checksumming and so forth.

The trailing checksum is there to detect packet splicing errors,
memory errors, driver bugs, bogus segments, etc., and to discard
corrupted data that might not otherwise be detected by the Internet
checksum.

> Also I suspect the references to specific link
> technologies will seem dated in a few years

Maybe so, but to a certain extent I believe injecting a bit of
history here is useful in terms of leveraging past experience moving
forward. For example, the ATM cell size is coincidentally similar to
the minimum IPv4 MTU of 68 bytes [RFC791], and AAL5 is coincidentally
similar to the segmentation/reassembly being proposed in the document.

> --- it needs to justify some of
> the min & max MTU choices for the long term.

Well, for IPv6 we have a MinMTU of 1280 bytes for the long term and
I don't expect we will see that change soon. In terms of the MaxMTU,
my best reading of Jonathan Stone's thesis on "Checksums in the
Internet" was that the incidence of undetected errors for 32b
checksums becomes non-negligible for packet sizes larger than about
9KB, which is coincidentally similar to the 9180 byte MTU for IP over
ATM. Future links that support even larger packet sizes with strong
error checking may allow for still larger tunnel MTUs, but I'm not an
expert on packet size interactions with error detection mechanisms,
so I would appreciate it if others who know more about it could
comment.

> Also, it would be useful to
> say why existing v6 fragment header mechanisms aren't applicable.

IPv6 sees the tunnel as an ordinary link that needs to present an
assured MTU of 1280 bytes or larger.
If the tunnel has to segment IPv6 packets into smaller chunks in
order to fulfill the MTU it advertises to IPv6, then that is an L2
issue with IPv4 being the L2 encapsulation protocol in question. If
the encapsulation were to include a full IPv6 header with frag header
in every segment, that would be 48 bytes of IPv6 header plus 20 bytes
of IPv4 header, leaving 0 bytes for ULP data on links that support
only the minimum IPv4 MTU of 68 bytes. It would also be inefficient
in general even on links with larger-than-68-byte MTUs, and something
of a layer violation.

That said, it may be useful to include just the first 4 octets of the
IPv6 header in the encapsulation of non-initial segments. That would
give the 8b Traffic Class and 20b Flow Label, which could be used by
the reassembler to further disambiguate flows.

> >Use of the
> >scheme is indicated by setting the reserved bit in the IPv4
> >header 'Flags' field (to be renamed as the 'AF' bit), thus
> >it obsoletes RFC3514 (an "April Fools Day" RFC).
>
> This seems a rather specific use for this last reserved bit, given things
> in the IP header are meant to be generally useful to upper layers.

Well, the bit occurs in the flags field associated with
fragmentation, so defining it to indicate an Alternate Fragmentation
(AF) scheme would seem like a natural fit.

> Couldn't use of the coding scheme be negotiated between tunnel endpoints
> for all tunnelled packets carrying a specific src-dest address pair?

This might work for persistent configured tunnels that are
established through some initial negotiation phase, but might not be
so practical for automatic tunnels or tunnels that are short-lived
and/or only carry a small number of packets. It is also possible that
this AF scheme may be used by IPv4 packetization layers other than
tunnels and may even supplement or replace RFC791 IPv4 fragmentation.

> I have to declare an interest in the reserved (evil) bit myself at this
> point.
> I've been presenting a draft myself in tsvwg
> <draft-briscoe-tsvwg-re-ecn-tcp-03.txt> that relies on using
> it. However, I like to think we have a strong justification for grabbing it,
> as the above draft is the culmination of a decade of work to fix the
> resource allocation and accountability problems with the current Internet
> architecture.

I know that RFC3514 was tongue-in-cheek when it used the term "evil",
but to be sure we are talking about precious real estate here. I have
been working on the tunnel MTU stuff since 2002 when the ngtrans wg
closed and re-emerged as v6ops. From what I can tell, one question
comes down to whether we can expect to see IPv6-in-IPv4 tunnels
deployed on a wide scale and on a long-term basis in the Internet.
If so, we will need tunnel MTU assurance that provides robustness and
efficiency over arbitrary Internet paths. Another question comes down
to whether such an AF scheme could supplement or replace RFC791 IPv4
fragmentation (along with its myriad issues that are well documented)
even for non-tunnel applications.

> Are you aware of any other claims on that bit? I'm sure there
> are many.

I knew of one proposal several years back, but to my knowledge no
I-Ds were ever issued and I don't know of any other claims. Maybe
someone else knows of other proposals.

> >This AF scheme codes the 'ip_id' field in the IPv4 header and
> >thus presents a shorter-than-16b ID (the current draft version
> >specifies only a 6b ID). The assumption (which I think is also
> >an assumption you are making) is that successful reassembly
> >will occur within a very short time window if it will occur
> >at all. This assumes that reassembly failure will normally
> >occur due to packet loss rather than gross reordering of
> >packets within the same flow.
>
> I'm not quite making that assumption. I'm saying re-assembly
> over shorter gaps should take priority over longer gaps, so
> re-assembly over longer gaps should be deferred for a while.

That's fine, but my intuition is that applications and transports are
not very tolerant of grossly reordered packets, and some may prefer
to receive partial data (e.g., using UDP-Lite) over data that arrives
too late.

> >There have been many studies on packet reordering within the
> >Internet, but I have not found one yet that can completely
> >characterize the *degree* of reordering, i.e., the expected
> >number of places by which a reordered packet is out-of-order.
> >Most of the studies I have seen seem to suggest that the
> >expected degree of reordering of packets within a short chain
> >of packets sent in rapid succession within the same flow is
> >typically very small, e.g., a reordering event such as
> >(1,2,4,5,3,6,...) may occur occasionally, while one such
> >as (1,2,4,5,...,64k,3,64k+1,...) most likely will not. Other
> >factors to consider are: 1) as you observe, lengthening the
> >ID field may be insufficient in the presence of gross
> >reordering, and 2) transports such as TCP are likely to
> >treat grossly reordered packets as loss anyway.
>
> I guess it depends when you look. If you catch a re-route in
> progress into a shorter path, you might see something more like
> your second example if the re-route affects very fast flows. So,
> the typical case might not be the only case we have to allow for.

Route flaps happen all the time in the environments I care about,
and in many cases loss of one or a couple of packets is unavoidable.
Reliable transports already take care of this, and mechanisms like
UDP-Lite may be useful for applications such as streaming media.

> >Finally, this work has been around for some time now, and
> >I believe has been reviewed by many while few have commented.
> >Perhaps now is the time for discussion on a wider basis.
>
> I guess one immediate problem springs to mind. Current re-assembly
> implementations have had to be hardened against frag, tear-drop etc
> attacks.
> In defining a new coding of the packetID field, it will have to be
> resistant to attack from malicious sources. A new coding would open up a
> new set of vulnerabilities that would all have to be dreamt up then
> patched. For instance, malicious sources could spoof tunnelled packets to
> the decap endpoint as if they were from the encap endpoint, but send
> sequences of fragments that don't fit together correctly, blowing its
> memory, causing it to hang, or whatever.

The spec gives very specific instructions on how packets are to be
segmented that are actually a lot less flexible than RFC791 IPv4
fragmentation. In particular, all segments except the final segment
must be the same size (such that only a small integer segment ID and
not an offset is necessary), and the final byte of the ith segment is
the one that immediately precedes the first byte of the (i+1)th
segment. This makes reassembly much easier, and if an attacker
injects a bogus segment it will either be recognized immediately as a
martian or be caught by the trailing checksum during reassembly. I'm
sure this requires a more careful analysis, however.

Thanks - Fred

_______________________________________________
Int-area mailing list
https://www1.ietf.org/mailman/listinfo/int-area
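
To illustrate the segmentation rule described in the reply above
(equal-size segments except the last, a small integer segment index
instead of an offset, and a trailing checksum verified at reassembly),
here is a minimal Python sketch. It is not taken from the draft: the
function names `segment` and `reassemble`, the (index, is_last,
payload) tuple layout, and the use of CRC-32 as the trailing checksum
are all assumptions made for illustration only.

```python
import zlib

CRC_LEN = 4  # assumption: CRC-32 stands in for the draft's trailing checksum


def segment(packet: bytes, seg_size: int) -> list[tuple[int, bool, bytes]]:
    """Split packet plus trailing checksum into (index, is_last, payload).

    Every segment except the last carries exactly seg_size bytes.
    """
    if seg_size <= 0:
        raise ValueError("seg_size must be positive")
    data = packet + zlib.crc32(packet).to_bytes(CRC_LEN, "big")
    chunks = [data[i:i + seg_size] for i in range(0, len(data), seg_size)]
    return [(i, i == len(chunks) - 1, c) for i, c in enumerate(chunks)]


def reassemble(segments, seg_size: int):
    """Return the original packet, or None if any check fails."""
    by_index = {}
    last = None
    for idx, is_last, payload in segments:
        if idx in by_index:
            return None  # this sketch simply rejects duplicate indexes
        if is_last:
            if last is not None or not 0 < len(payload) <= seg_size:
                return None  # two final segments, or a bad final length
            last = idx
        elif len(payload) != seg_size:
            return None  # a non-final segment of the wrong size is bogus
        by_index[idx] = payload
    if last is None or set(by_index) != set(range(last + 1)):
        return None  # missing segment, or an index beyond the final one
    data = b"".join(by_index[i] for i in range(last + 1))
    if len(data) < CRC_LEN:
        return None
    body, trailer = data[:-CRC_LEN], data[-CRC_LEN:]
    if zlib.crc32(body).to_bytes(CRC_LEN, "big") != trailer:
        return None  # corruption or an injected segment caught here
    return body
```

As a usage example, `segment(packet, 48)` corresponds to the case
discussed earlier in the message: a link with the minimum IPv4 MTU of
68 bytes leaves 48 bytes per segment after the 20-byte IPv4 header.
Because every non-final segment has the same size, the receiver can
place each one by index alone and reject wrong-sized segments before
doing any checksum work.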
