Nils Goroll wrote:
> Hi Erik and all,
>
> I am sorry for this very late comment; I should have replied during the
> review phase, but I didn't get to it. It was only today that the commit
> notification reminded me, and I read through the PSARC emails and some
> of the discussion on networking-discuss.

Thanks for your questions and comments.

> First of all, two questions on the new source IP selection:

> - Will it still be possible to use routing entries to force the use of a
>   certain source address by destination? I've seen a couple of
>   installations making use of this.

What type of configuration is this?
Let me venture a guess. You have two (or more) IP subnet prefixes assigned to a wire and the router(s) have an IP address in each subnet.
Then the host has something like
        bge0 10.1.0.33/24
        bge0:1 10.2.0.33/24
with 10.1.0.1 and 10.2.0.1 being the router's IP addresses.
Then you could have some statically added routes pointing at 10.1.0.1 and others pointing at 10.2.0.1, and that would affect the source address selected. Is that close to the configuration?
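For instance (destination prefixes purely hypothetical), something like:
        route add -net 192.168.10.0 -netmask 255.255.255.0 10.1.0.1
        route add -net 192.168.20.0 -netmask 255.255.255.0 10.2.0.1
so that traffic toward the first prefix tends to get a 10.1.0.33 source and traffic toward the second a 10.2.0.33 source.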

The fact that it worked this way in S10 (and until recently in Nevada) is more or less an accident. In fact, if you configure IPMP you'd find that the source addresses (on bge0 and the other NICs in the IPMP group) are selected in a round-robin fashion.

With the IP datapath refactoring, it behaves the same way whether or not IPMP is configured.

Part of the issue we had to fix was this confusion between routing and source address selection. IP routing selects a nexthop (the IP address of the router) and an outbound (physical) interface. That outbound interface is then used to find a candidate set of IP addresses. There are no RFCs that prescribe this for IPv4, since almost everybody did it based on the BSD source base 20 years ago. For IPv6 it is actually part of the RFC set. And none of that talks about any "logical interfaces" like we have in Solaris.
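As an aside, if you want to observe which source address the stack ends up selecting for a given destination, one portable trick is a connected UDP socket: connect() on UDP transmits nothing, it just resolves the route and fixes the source, which getsockname() can then report. A minimal sketch (destination address hypothetical):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <netinet/in.h>

int
main(void)
{
        int s = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in dst, src;
        socklen_t len = sizeof (src);
        char buf[INET_ADDRSTRLEN];

        memset(&dst, 0, sizeof (dst));
        dst.sin_family = AF_INET;
        dst.sin_port = htons(9);        /* any port will do */
        (void) inet_pton(AF_INET, "10.1.0.1", &dst.sin_addr);

        /* connect() on UDP sends no packets; it just binds the route */
        if (s == -1 ||
            connect(s, (struct sockaddr *)&dst, sizeof (dst)) == -1 ||
            getsockname(s, (struct sockaddr *)&src, &len) == -1) {
                perror("socket/connect/getsockname");
                return (1);
        }
        (void) printf("selected source: %s\n",
            inet_ntop(AF_INET, &src.sin_addr, buf, sizeof (buf)));
        (void) close(s);
        return (0);
}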

It would be good to know what external factors drove the configuration in this direction, to see what other approaches we have available. In the case of shared-IP zones things are different, since we have the added constraint that the source address be assigned to the zone.

> - Will explicit source address selection via in_pktinfo_t still work? I am
>   looking after a couple of customer cases descending from a (very) old
>   case regarding source address selection for RPC over UDP with
>   (pre-Clearview) IPMP, and I would very much like to see them get closed
>   one day (greetings to everyone involved in those cases, you'll know
>   what I am referring to).

Yes, the IPv4 and IPv6 pktinfo socket options can be used to set the source address. So can bind(), and IP_MULTICAST_IF for IPv4 multicast.
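For the IPv4 case, a minimal sketch of setting the source per datagram via ancillary data (untested; addresses hypothetical, and I'm assuming the usual in_pktinfo layout where ipi_spec_dst carries the requested source on transmit):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <netinet/in.h>

int
main(void)
{
        int s = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in dst;
        struct msghdr msg;
        struct iovec iov;
        struct cmsghdr *cm;
        struct in_pktinfo *pi;
        char payload[] = "hello";
        /* union guarantees alignment of the control buffer */
        union {
                struct cmsghdr align;
                char buf[CMSG_SPACE(sizeof (struct in_pktinfo))];
        } cbuf;

        memset(&dst, 0, sizeof (dst));
        dst.sin_family = AF_INET;
        dst.sin_port = htons(9999);
        (void) inet_pton(AF_INET, "10.2.0.1", &dst.sin_addr);

        iov.iov_base = payload;
        iov.iov_len = sizeof (payload);

        memset(&msg, 0, sizeof (msg));
        msg.msg_name = &dst;
        msg.msg_namelen = sizeof (dst);
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = cbuf.buf;
        msg.msg_controllen = sizeof (cbuf.buf);

        cm = CMSG_FIRSTHDR(&msg);
        cm->cmsg_level = IPPROTO_IP;
        cm->cmsg_type = IP_PKTINFO;
        cm->cmsg_len = CMSG_LEN(sizeof (struct in_pktinfo));
        pi = (struct in_pktinfo *)CMSG_DATA(cm);
        memset(pi, 0, sizeof (*pi));
        /* ask for 10.2.0.33 as the source of this datagram */
        (void) inet_pton(AF_INET, "10.2.0.33", &pi->ipi_spec_dst);

        if (s == -1 || sendmsg(s, &msg, 0) == -1)
                perror("socket/sendmsg");
        (void) close(s);
        return (0);
}

(On Linux you may need _GNU_SOURCE for struct in_pktinfo. The IPv6 analogue uses IPV6_PKTINFO with in6_pktinfo at level IPPROTO_IPV6, per RFC 3542.)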

> I would like to mention that a long time ago I had to debug weird behavior
> of a certain firewall HA solution which used multicast MAC addresses for
> unicast IP addresses. The idea was to achieve load spreading and
> uninterrupted service by having hosts send all traffic to all nodes of a
> cluster: the interfaces of the nodes were members of the same multicast
> group, and ARP requests for the unicast "cluster IP address" were
> answered with that multicast MAC address.
>
> This certainly is quite an exotic case (and an approach which, to put it
> kindly, I didn't find particularly clean) and I am not sure whether such
> applications still exist, but it might be relevant to check whether the
> refactored code handles this properly.

While I'm not sure it is well defined what "properly" means here (sending unicast IP packets to a multicast MAC address is very questionable), I'm not aware of any changes we've made to ARP that would reject such things.

> IIRC, I've used the now discontinued behavior as a simple and flexible
> configuration on GigE networks where jumbo frames were supported by only
> *some* hosts. By using logical addresses on the server with different
> MTUs, clients could select between "default MTU" and "jumbo frame
> enabled" server addresses, which was really useful for optimizing the
> performance of UDP-based applications (I am generalizing; I only ever
> used this for NFS/UDP).
>
> From an administrator's standpoint, the advantage of this configuration
> was that no client-IP-specific configuration was involved, as there
> would be with setting routes with -mtu.
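If I understand correctly, the server side looked roughly like this (addresses hypothetical), with the MTU set per logical interface, which is exactly the behavior that has gone away:
        ifconfig bge0 10.1.0.33/24 mtu 9000 up
        ifconfig bge0:1 10.2.0.33/24 mtu 1500 up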

So you would have two separate IP subnets on the same wire, and each client would somehow be configured to know that bge0 has an MTU of 9k and bge0:1 an MTU of 1500?

> I am aware that the particular customer installation I am talking about
> here is not clean in the sense that, if jumbo frames are used on a
> broadcast domain, they should be supported by all devices, but sometimes
> it is hard to convince users to implement a clear design when what they
> have is just working for them and changing things would imply
> significant additional cost.

With those configurations, do broadcast and multicast of large packets actually work?

I would suggest you try
        ping -sn 224.0.0.1 8192
        ping -sn ff02::1 8192
        ping -sn -i bge0 255.255.255.255 8192
and make sure that all the systems on the Ethernet respond.

For the above problem of wanting to use jumbo frames selectively: since you already have separate IP subnet prefixes, why not just run those IP subnets on separate VLANs? That would ensure that multicast and broadcast work as expected.
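E.g., with the traditional VLAN PPA encoding (PPA = VID * 1000 + device instance, so VLAN 2 on bge0 is bge2000), something like:
        ifconfig bge2000 plumb 10.2.0.33/24 up
(reusing the hypothetical prefix from above) would give each IP subnet its own broadcast domain.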

> My questions are:
>
> * Could you please explain why, in the former implementation, it was not
>   well defined which MTU was applied to multicast packets?

Try the above pings and let me know how it goes.

The issue that is obvious for multicast and for the 255.255.255.255 broadcast is that these packets are routed out a physical interface, since there is no correlation between the destination IP address and any IP subnet prefix.

But as I mentioned above, the IP architecture that everybody seems to assume for IPv4, and that is written down for IPv6, is about routing packets out (physical) interfaces. This can be found deep down in the MIB RFCs that describe how routes are represented. The Solaris notion of logical interfaces doesn't fit into those descriptions of routing.

> * Why would it cause any harm to keep an interface MTU for logical
>   interfaces?

It causes confusion, and is by definition incomplete and unworkable, as shown by the case of multicast and the all-ones broadcast.

> My understanding is that
>
> - for unicast packets, the effective MTU would be the minimum of the MTUs
>   of the logical interface, the physical interface, the destination ire,
>   and the destination IP
>
> - for multicast packets, the effective MTU would be the minimum of the
>   MTUs of the logical interface and the physical interface

The issue is that packets are routed out physical network interfaces.
Which logical interface does the 224.0.0.1 address belong to?

   Erik