Hi Dave,

Thank you for the comments. We agree that a compact header with just enough 
address bits is critical in AI DCNs (I would argue this also applies to 
scale-out networks).

I want to further discuss two points:

1. The variable-size address isn't actually that "scary". We have verified the 
scheme with P4 and it's doable. Once it's realized in a switch ASIC, there are 
no performance implications at all. On the other hand, supporting different 
lengths has many advantages: it scales with the cluster size without any 
waste, it supports communication between clusters of different sizes, it 
doesn't require respinning the chips if the network scale changes, and the 
same standard can be applied to any scenario, as laid out in our paper 
"Adaptive Addresses for Next Generation IP Protocol in Hierarchical 
Networks" (ICNP 2020). Of course, there's a tradeoff in how fine an address 
length step should be supported (e.g., 1 bit, 2 bits, 4 bits, or 8 bits). This 
is subject to further study.
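
As a rough illustration of the step-size tradeoff, the Python sketch below 
computes the shortest address that covers a cluster when lengths may only grow 
in steps of 1, 2, 4, or 8 bits. The function name and the cluster sizes are 
hypothetical and are not taken from the paper or from either draft.

```python
def address_bits(num_endpoints: int, step: int) -> int:
    """Smallest address length in bits that is a multiple of `step`
    and can uniquely number `num_endpoints` endpoints."""
    if num_endpoints < 1 or step < 1:
        raise ValueError("num_endpoints and step must be positive")
    # Bits needed to represent the values 0 .. num_endpoints - 1 (at least 1).
    needed = max(1, (num_endpoints - 1).bit_length())
    # Round up to the next multiple of the step size using integer arithmetic.
    return (needed + step - 1) // step * step


# Hypothetical cluster sizes, chosen only to show the rounding effect.
for size in (72, 1000, 5000):
    lengths = {step: address_bits(size, step) for step in (1, 2, 4, 8)}
    print(f"{size:>5} endpoints -> bits per step size: {lengths}")
```

A finer step wastes fewer bits per address, while a coarser step leaves fewer 
distinct lengths for the hardware to parse.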

2. If we assume the scale-up network takes Ethernet as its L2 technology, we 
can envision the scale-up and scale-out networks eventually converging into a 
single network. L3 should then also have a common standard (strictly speaking, 
if we only have a separate scale-up network, we don't need L3 at all, because 
an L2 fabric is enough). The variable-size address can support a hierarchical 
network that maps naturally to the DCN topology and, more importantly, it 
allows seamless connection with the Internet, which runs IPv4/IPv6, so 
inter-DC communication can be supported without any modification to the public 
network. I think this is one reason we need an IP-like L3 header that can 
translate into IPv4/IPv6. Note that the SUNH proposal supports this already; 
the only issues are that the 16-bit address is overkill for current cluster 
sizes and that the fixed length is not flexible.


Best regards,
Haoyu

-----Original Message-----
From: dave seddon <[email protected]>
Sent: Friday, January 9, 2026 2:03 PM
To: [email protected]
Subject: [Int-area] Regarding the draft: Scale-Up Network Header (SUNH)

G'day Tom and Haoyu,

I'm trying to join the discussion about "draft: Scale-Up Network Header 
(SUNH)", but I just joined the mailing list, so I don't know if posting to the 
subject line will do it.  (Apologies if this breaks threading.)

Drafts:
https://datatracker.ietf.org/doc/draft-herbert-sunh/
https://datatracker.ietf.org/doc/html/draft-song-ship-edge-05

It seems like the discussion centers on the address length.

The SUNH "1.1.  Problem statement" is very clear "
8% overhead in a 256 byte packet, and the forty bytes of IPv6 header would be 
about 16% overhead "
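
The percentages in that excerpt can be checked with a few lines of Python. 
This is only a sketch: the excerpt above does not name the header behind the 8% 
figure, so the 20-byte minimum IPv4 header (no options) is an assumption here, 
and the helper name is made up for illustration.

```python
def header_overhead(header_bytes: int, packet_bytes: int) -> float:
    """Fraction of the whole packet that is occupied by the header."""
    return header_bytes / packet_bytes


PACKET_BYTES = 256        # packet size used in the problem statement
IPV4_HEADER_BYTES = 20    # assumption: minimum IPv4 header, no options
IPV6_HEADER_BYTES = 40    # fixed IPv6 header, no extension headers

for name, size in (("IPv4", IPV4_HEADER_BYTES), ("IPv6", IPV6_HEADER_BYTES)):
    print(f"{name}: {size} / {PACKET_BYTES} bytes = "
          f"{header_overhead(size, PACKET_BYTES):.1%}")
```

Both results round to the quoted figures of about 8% and about 16%.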

Minimizing overhead absolutely makes sense currently, but for how long do we 
expect this to be true?  Tom, since you've been talking to people who run the 
largest AI clusters in the world, do you expect this to hold true for the 
foreseeable future?


Tom - I wonder if draft-herbert-sunh would benefit from a small summary, maybe 
with a table, that compares the proposed addressing to other protocols that are 
common within data centers?

For example, comparing protocols by their header, address lengths, and 
"overhead"
- PCIe ( the PCI-SIG specs are paywalled, so it's hard to find a good source.
Maybe this: 
https://www.pearsonhighered.com/assets/samplechapter/0/3/2/1/0321156307.pdf
)
- InfiniBand ( addressing scheme found here on page 625
https://hjemmesider.diku.dk/~vinter/CC/Infinibandchap42.pdf )
- Ethernet
- Ethernet with 802.1Q ( and QinQ )
- IPv4
- IPv6
- SUNH
...
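
A possible starting point for such a table, sketched in Python. Only the 
address-width column is filled in. The SUNH value is the 16-bit figure 
discussed in this thread; the other widths are the commonly cited ones and 
should be checked against each specification before going into a draft. PCIe 
and the VLAN-tagged Ethernet variants are left out because they do not reduce 
to a single address width.

```python
# Address field widths in bits. These values are assumptions to be verified
# against the respective specifications; SUNH uses the 16-bit figure from
# this thread.
ADDRESS_BITS = {
    "Ethernet (MAC)": 48,
    "InfiniBand LID": 16,
    "InfiniBand GID": 128,
    "IPv4": 32,
    "IPv6": 128,
    "SUNH": 16,
}


def addr_pair_bytes(bits: int) -> int:
    """Bytes spent on one source address plus one destination address."""
    return 2 * bits // 8


print(f"{'protocol':<16}{'bits':>6}{'src+dst bytes':>15}{'addresses':>12}")
for name, bits in ADDRESS_BITS.items():
    # 2**bits is the size of the address space, shown to 3 significant digits.
    print(f"{name:<16}{bits:>6}{addr_pair_bytes(bits):>15}{float(2**bits):>12.3g}")
```

Header lengths and overhead at a given packet size could be added as further 
columns once the numbers are confirmed.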

With that context established, the draft could explain why 16 bits were chosen 
for the source/destination address.  I'm guessing, since it's not in the 
document, that you were considering the number of hosts in the domain.

Nit pick (sorry). "care must be taken to ensure the minimum packet size is 
maintained".  Might help to explain why.

Re section "TCP and UDP in SUNH".  I remember recently Stuart from Apple saying 
something pretty interesting about UDP: "If IP had port numbers, you wouldn't 
really need a UDP header at all."

Multicast?  It might be worth mentioning multicast and explaining why it isn't 
discussed.  e.g. No requirement for this, or it might be considered in the 
future if a need arises.



Haoyu - I really like your draft-song-ship-edge-05 Hierarchical addressing 
stuff:
a)
This reminds me of good old Fibre Channel addressing, and I suppose the more 
modern InfiniBand/RDMA.
b)
The words "variable length" are scary because variability clearly isn't ideal 
for hardware.  I guess when you say "variable length" you don't actually mean 
the addresses would vary dynamically, but that there could be a set of fixed 
address lengths to choose from for different deployment scenarios?
c)
One core concept of draft-song-ship-edge-05 is that traffic destined for IoT 
devices needs a long, unique address, while the traffic _sourced_ from these 
devices towards the data center can have a much smaller destination address.
I recall Geoff Huston discussing IPv6 at a recent NANOG, where he commented 
that, because of the pervasive use of anycast by a relatively small number of 
CDNs, the Internet might only need a /24 worth of addresses for 99% of all 
traffic.
Other network protocols with asymmetric addresses include:
- PCIe (Requester vs Completer addressing)
- In InfiniBand / RDMA, requests carry full destination addressing (QPN + 
LID/GID + path), while responses omit it and are routed implicitly using the 
established queue-pair and path state, making the addressing directionally 
asymmetric.
- QUIC has explicit directional asymmetry in connection IDs


--
Regards,
Dave Seddon

_______________________________________________
Int-area mailing list -- [email protected] To unsubscribe send an email to 
[email protected]
