On Thu, Jan 8, 2026 at 2:56 PM Haoyu Song <[email protected]> wrote:
>
> Hi Tom,
>
> I came across your draft “Scale-Up Network Header (SUNH)” and found it very
> interesting and timely. It resonated well with a draft I wrote a while ago
> (https://datatracker.ietf.org/doc/html/draft-song-ship-edge-05) although that
> one addressed a more general scope.
>

Hi Haoyu,

Thanks for your comments!

> Here are some comments and thoughts I’d like to share.
>
> “Some traffic patterns may have a majority of small packets, like for KV
> cache in AI, where packet sizes may commonly be 256 bytes or less.”
>
> As far as I know, the token KV cache is pretty big and the packet size is
> limited by the MTU. The small packets in Scale Up network are usually for
> control plane signaling and synchronization, or small memory-semantic
> transactions. Anyway, I agree the header overhead is a big concern in AIDCN
> and the current SUE solution is flawed.

Yes, but the 256-byte number seems to be the latest "practical" minimum size
we need to support. At least people aren't talking about getting line rate
with sixty-four byte packets, as was all the rage a few years ago! :-)

>
> There’s no dedicated Scale Up Network NICs available. Scale Up network
> interface is usually supposed to be integrated into the GPU dies. When
> Ethernet interface is used, the situation might change. But a GPU cannot
> afford to have two PCIe interfaces to connect two separate NICs. So most
> likely the scale up and scale out networks will be converged and share the
> same NIC. If that’s true, compatibility and the ability to interoperate with
> standard IP protocols become a necessity.

Yes. A lot of this has to do with the topologies. In our design, GPUs in a
single system connect to a memory fabric, and the scale-up network would
probably be useful for intra-rack connectivity.

>
> I think 16-bit SUNH address is too long for now, and probably too short in
> the future if the scale up and scale out networks are converged. So it’s
> better to maintain flexibility. The SHIP draft I mentioned earlier provides a
> flexible scheme and allows the gateway switches to translate the
> header-compressed packets into normal IPv4/v6 packets so the inter-DC traffic
> can be seamlessly supported.
> With this, the routing header (i.e., compressed
> SRv6) can also be supported, making the scheme even more flexible to support
> SR (in another research, I found that capability is very useful in certain
> DCN topologies).

It's a tradeoff (of course any address size we choose is a tradeoff :-) ).
While flexibility is nice, it comes at the expense of complexity. Supporting
multiple address lengths in the same protocol is complex on both switches and
hosts.

There's a lesson to be learned from IPv6. While IPv6 still has an IP version
number, the fact that IPv6 has its own EtherType makes the IP version number
redundant and unnecessary; in practical terms IPv6 is a distinct protocol
from IPv4, not just a different version. In SUNH we can apply the lesson by
eschewing things like version numbers and variable-length headers. If
sixteen-bit addresses prove to be the wrong choice, then we can just spin a
new protocol with a different address size and its own EtherType.

As for SRv6 used with SUNH, I'm personally ambivalent. The design of the
protocol allows for SRv6 headers, but I suspect that most use cases like
scale-up networking tend to be flat networks, so SR might not be very
interesting.

>
> I think this is an area and opportunity that IETF can contribute to the AI
> network, and I’m looking forward to contributing to it.

Great!

Tom

>
> Best regards,
>
> Haoyu
>
_______________________________________________
Int-area mailing list -- [email protected]
To unsubscribe send an email to [email protected]
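
The EtherType argument in the reply above can be illustrated with a small sketch. It assumes untagged Ethernet II frames and uses the well-known EtherTypes for IPv4 (0x0800) and IPv6 (0x86DD). The value 0x88B5 is the IEEE 802 Local Experimental EtherType, used here purely as a placeholder for a fixed-format protocol such as SUNH; it is not a value taken from the draft. The helper names (`classify`, `version_nibble`, `make_frame`) are hypothetical.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_IPV6 = 0x86DD
# Placeholder only: IEEE 802 Local Experimental EtherType 1. A real
# protocol would use its own assigned value, which is not known here.
ETHERTYPE_LOCAL_EXPERIMENTAL = 0x88B5

PARSERS = {
    ETHERTYPE_IPV4: "ipv4",
    ETHERTYPE_IPV6: "ipv6",
    ETHERTYPE_LOCAL_EXPERIMENTAL: "fixed-format",
}


def ethertype_of(frame: bytes) -> int:
    """Read the 16-bit EtherType at offset 12 of an untagged frame."""
    if len(frame) < 15:
        raise ValueError("frame too short")
    return struct.unpack_from("!H", frame, 12)[0]


def classify(frame: bytes) -> str:
    """Choose the parser from the EtherType alone, before reading the payload."""
    return PARSERS.get(ethertype_of(frame), "unknown")


def version_nibble(frame: bytes) -> int:
    """Top four bits of the first payload byte (the IP version field)."""
    return frame[14] >> 4


def make_frame(ethertype: int, first_payload_byte: int) -> bytes:
    """Build a minimal test frame: zeroed MACs, EtherType, one payload byte."""
    return bytes(12) + struct.pack("!H", ethertype) + bytes([first_payload_byte])


v4 = make_frame(ETHERTYPE_IPV4, 0x45)  # version 4, IHL 5
v6 = make_frame(ETHERTYPE_IPV6, 0x60)  # version 6

for frame in (v4, v6):
    # The parser is already chosen by the time the version field is read.
    print(classify(frame), version_nibble(frame))
```

In this sketch the receiver has selected the parser before it ever looks at the version nibble, which is the sense in which the field is redundant. A new address size would get a new entry in the dispatch table rather than a new branch inside an existing parser.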
