Nils Goroll wrote:
> Hi Erik and all,
>
> I am sorry for this very late comment; I should have replied during the
> review phase, but I didn't get to it. It was only today that the commit
> notification reminded me, and I read through the PSARC emails and some
> of the discussion on networking-discuss.

Thanks for your questions and comments.

> First of all, two questions on the new source IP selection:

> - Will it still be possible to use routing entries to force the use of a
>   certain source address by destination? I've seen a couple of
>   installations making use of this.

What type of configuration is this?
Let me venture a guess. You have two (or more) IP subnet prefixes assigned to a wire and the router(s) have an IP address in each subnet.
Then the host has something like
        bge0 10.1.0.33/24
        bge0:1 10.2.0.33/24
with 10.1.0.1 and 10.2.0.1 being the router's IP addresses.
Then you could have some statically added routes pointing at 10.1.0.1 and others pointing at 10.2.0.1, and that would affect the source address selected. Is that close to the configuration?
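For instance (destination prefixes purely hypothetical), something like:
        route add -net 192.168.10.0 -netmask 255.255.255.0 10.1.0.1
        route add -net 192.168.20.0 -netmask 255.255.255.0 10.2.0.1
so that traffic toward the first prefix tends to get a 10.1.0.33 source and traffic toward the second a 10.2.0.33 source.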

The fact that it worked this way in S10 (and until recently in Nevada) is more or less an accident. In fact, if you configure IPMP you'd find that the source addresses (on bge0 and the other NICs in the IPMP group) are selected in a round-robin fashion.

With the IP datapath refactoring, it behaves the same way whether or not IPMP is configured.

Part of the issue we had to fix was this confusion between routing and source address selection. IP routing selects a nexthop (the IP address of the router) and an outbound (physical) interface. That outbound interface is then used to find a candidate set of IP addresses. There are no RFCs that prescribe this for IPv4, since almost everybody did it based on the BSD source base 20 years ago. For IPv6 it is actually part of the RFC set. And none of that talks about any "logical interfaces" like we have in Solaris.
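As an aside, if you want to observe which source address the stack ends up selecting for a given destination, one portable trick is a connected UDP socket: connect() on UDP transmits nothing, it just resolves the route and fixes the source, which getsockname() can then report. A minimal sketch (destination address hypothetical):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <netinet/in.h>

int
main(void)
{
        int s = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in dst, src;
        socklen_t len = sizeof (src);
        char buf[INET_ADDRSTRLEN];

        memset(&dst, 0, sizeof (dst));
        dst.sin_family = AF_INET;
        dst.sin_port = htons(9);        /* any port will do */
        (void) inet_pton(AF_INET, "10.1.0.1", &dst.sin_addr);

        /* connect() on UDP sends no packets; it just binds the route */
        if (s == -1 ||
            connect(s, (struct sockaddr *)&dst, sizeof (dst)) == -1 ||
            getsockname(s, (struct sockaddr *)&src, &len) == -1) {
                perror("socket/connect/getsockname");
                return (1);
        }
        (void) printf("selected source: %s\n",
            inet_ntop(AF_INET, &src.sin_addr, buf, sizeof (buf)));
        (void) close(s);
        return (0);
}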

It would be good to know what external factors drove the configuration in this direction, to see what other approaches we have available. In the case of shared-IP zones things are different, since we have the added constraint that the source address be assigned to the zone.

> - Will explicit source address selection via in_pktinfo_t still work? I am
>   looking after a couple of customer cases descending from a (very) old
>   case regarding source address selection for RPC over UDP with
>   (pre-Clearview) IPMP, and I would very much like to see them get closed
>   one day (greetings to everyone involved in those cases, you'll know
>   what I am referring to).

Yes, the IPv4 and IPv6 pktinfo socket options can be used to set the source address. So can bind(), and IP_MULTICAST_IF for IPv4 multicast.
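For the IPv4 case, a minimal sketch of setting the source per datagram via ancillary data (untested; addresses hypothetical, and I'm assuming the usual in_pktinfo layout where ipi_spec_dst carries the requested source on transmit):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <netinet/in.h>

int
main(void)
{
        int s = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in dst;
        struct msghdr msg;
        struct iovec iov;
        struct cmsghdr *cm;
        struct in_pktinfo *pi;
        char payload[] = "hello";
        /* union guarantees alignment of the control buffer */
        union {
                struct cmsghdr align;
                char buf[CMSG_SPACE(sizeof (struct in_pktinfo))];
        } cbuf;

        memset(&dst, 0, sizeof (dst));
        dst.sin_family = AF_INET;
        dst.sin_port = htons(9999);
        (void) inet_pton(AF_INET, "10.2.0.1", &dst.sin_addr);

        iov.iov_base = payload;
        iov.iov_len = sizeof (payload);

        memset(&msg, 0, sizeof (msg));
        msg.msg_name = &dst;
        msg.msg_namelen = sizeof (dst);
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = cbuf.buf;
        msg.msg_controllen = sizeof (cbuf.buf);

        cm = CMSG_FIRSTHDR(&msg);
        cm->cmsg_level = IPPROTO_IP;
        cm->cmsg_type = IP_PKTINFO;
        cm->cmsg_len = CMSG_LEN(sizeof (struct in_pktinfo));
        pi = (struct in_pktinfo *)CMSG_DATA(cm);
        memset(pi, 0, sizeof (*pi));
        /* ask for 10.2.0.33 as the source of this datagram */
        (void) inet_pton(AF_INET, "10.2.0.33", &pi->ipi_spec_dst);

        if (s == -1 || sendmsg(s, &msg, 0) == -1)
                perror("socket/sendmsg");
        (void) close(s);
        return (0);
}

(On Linux you may need _GNU_SOURCE for struct in_pktinfo. The IPv6 analogue uses IPV6_PKTINFO with in6_pktinfo at level IPPROTO_IPV6, per RFC 3542.)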

> I would like to mention that a long time ago I had to debug weird behavior
> of a certain firewall HA solution which used multicast MAC addresses for
> unicast IP addresses. The idea was to achieve load spreading and
> uninterrupted service by having hosts send all traffic to all nodes of a
> cluster: the interfaces of the nodes were members of the same multicast
> group, and ARP requests for the unicast "cluster IP address" were
> answered with that multicast MAC address.
>
> This certainly is quite an exotic case (and an approach which, to put it
> kindly, I didn't find particularly clean) and I am not sure whether such
> applications still exist, but it might be relevant to check whether the
> refactored code handles this properly.

While I'm not sure it is well defined what "properly" means here (sending unicast IP packets to a multicast MAC address is very questionable), I'm not aware of any changes we've made to ARP that would reject such things.

> IIRC, I've used the now discontinued behavior as a simple and flexible
> configuration on GigE networks where jumbo frames were supported by only
> *some* hosts. By using logical addresses on the server with different
> MTUs, clients could select between "default MTU" and "jumbo frame
> enabled" server addresses, which was really useful for optimizing the
> performance of UDP-based applications (I am generalizing; I only ever
> used this for NFS/UDP).
>
> From an administrator's standpoint, the advantage of this configuration
> was that no client-IP-specific configuration was involved, as there
> would be with setting routes with -mtu.
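If I understand correctly, the server side looked roughly like this (addresses hypothetical), with the MTU set per logical interface, which is exactly the behavior that has gone away:
        ifconfig bge0 10.1.0.33/24 mtu 9000 up
        ifconfig bge0:1 10.2.0.33/24 mtu 1500 up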

So you would have two separate IP subnets on the same wire, and each client would somehow be configured to know that bge0 has an MTU of 9k and bge0:1 an MTU of 1500?

> I am aware that the particular customer installation I am talking about
> here is not clean in the sense that, if jumbo frames are used on a
> broadcast domain, they should be supported by all devices, but sometimes
> it is hard to convince users to implement a clear design when what they
> have is just working for them and changing things would imply
> significant additional cost.

With those configurations, do broadcast and multicast of large packets actually work?

I would suggest you try
        ping -sn 224.0.0.1 8192
        ping -sn ff02::1 8192
        ping -sn -i bge0 255.255.255.255 8192
and make sure that all the systems on the Ethernet respond.

For the above problem of wanting to use jumbo frames selectively: since you already have separate IP subnet prefixes, why not just run those IP subnets on separate VLANs? That would ensure that multicast and broadcast work as expected.
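E.g., with the traditional VLAN PPA encoding (PPA = VID * 1000 + device instance, so VLAN 2 on bge0 is bge2000), something like:
        ifconfig bge2000 plumb 10.2.0.33/24 up
(reusing the hypothetical prefix from above) would give each IP subnet its own broadcast domain.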

> My questions are:
>
> * Could you please explain why, in the former implementation, it was not
>   well defined which MTU was applied to multicast packets?

Try the above pings and let me know how it goes.

The issue that is obvious for multicast and for the 255.255.255.255 broadcast is that these packets are routed out a physical interface, since there is no correlation between the destination IP address and any IP subnet prefix.

But as I mentioned above, the IP architecture that everybody seems to assume for IPv4, and that is written down for IPv6, is about routing packets out (physical) interfaces. This can be found deep down in the MIB RFCs that describe how routes are represented. The Solaris notion of logical interfaces doesn't fit into those descriptions of routing.

> * Why would it cause any harm to keep an interface MTU for logical
>   interfaces?

It causes confusion, and is by definition incomplete and unworkable, as shown by the case of multicast and the all-ones broadcast.

> My understanding is that
>
> - for unicast packets, the effective MTU would be the minimum of the MTUs
>   of the logical interface, the physical interface, the destination ire,
>   and the destination IP
>
> - for multicast packets, the effective MTU would be the minimum of the
>   MTUs of the logical interface and the physical interface

The issue is that packets are routed out physical network interfaces.
Which logical interface does the 224.0.0.1 address belong to?

   Erik