The following document describes the changes to dladm and SMF that
we're proposing for the RBridges project. We'd appreciate any
comments you might have on this plan (either privately or on any of
the open lists) prior to submission for ARC review.
I'm planning to go forward with ARC review in a week or so, and you'll
have a second chance to review it then if you can't do it now.
This project adds basic Ethernet (layer two) bridging support to
OpenSolaris. It consists of a Project Private kernel module and
daemon, some Project Private SMF properties, and Committed dladm and
SMF control interfaces. It is targeted for a Minor release of an
OpenSolaris distribution, though we do not believe that any of the
changes here require Minor binding.
This project assumes that Clearview UV (PSARC 2006/499) will integrate
first. The terminology and command line design reflect that
assumption. In particular, Clearview obsoletes the idea of "network
devices" and instead relies on "links" that may themselves be of
varying types.
The bridging protocol referred to in this document is the IEEE
802.1D-1998 "Spanning Tree Protocol," abbreviated in this document as
"STP." The newer and far more complex "Multiple Spanning Tree
Protocol" (802.1Q-2005; MSTP) is intended to be backward compatible
with STP, and is not part of this project, but may be the subject of a
future project.
This document is large, but we believe that the changes described here
are straightforward and obvious given the existing system design.
They've been reviewed by the Clearview and NWAM teams, and we believe
they are suitable for fast-track treatment. [XXX this cross-team
review hasn't happened yet!]
1. Administration
All of the administration of this feature is based on dladm and
SMF. The SMF portion is the ability to enable and disable bridge
instances using the instance URIs described in section 3 below.
1.1 New dladm subcommands
These commands are patterned after the existing aggregation
commands in dladm.
dladm create-bridge [-t] [-R <root-dir>] [-p <priority>]
[-m <max-age>] [-h <hello-time>] [-d <forward-delay>]
[-f <force-protocol>] [-l <link>]... <bridge-name>
This command creates a bridge instance and optionally assigns
network links to the new bridge. By default, no bridge
instances are present, and OpenSolaris will not bridge between
network links. See the "add-bridge" subcommand for details on
link assignment.
Bridge creation and link assignment require PRIV_SYS_NET_CONFIG.
In order to bridge between links, you must create at least one
bridge instance. Each instance is separate: there is
intentionally no forwarding connection between bridges. (Note
that Crossbow's VNICs may in the future allow virtual inter-
bridge connections.)
The <bridge-name> provided is chosen by the administrator and
arbitrary, but must be a legal SMF service instance name. For
purposes of documentation, this is a URI component without
escape sequences, meaning that the following characters may not
be present:
; / ? : @ & = + $ , % < > # "
Whitespace and ASCII control characters are also excluded. The name
"default" is reserved, as are all names beginning with the
string "SUNW". Names with trailing digits are not permitted, in
order to allow for creation of "observability devices;" see
section 2 below.
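The naming rules above can be sketched as a small check. This is only an illustration of the rules as written here, not the dladm implementation; the function name is hypothetical, and rejecting an empty name is our assumption.

```python
# Characters the proposal lists as not permitted in a bridge name.
FORBIDDEN_CHARS = set(';/?:@&=+$,%<>#"')


def bridge_name_error(name):
    """Return a reason string if the name is unacceptable, else None."""
    if not name:
        return "name is empty"  # assumption: empty names are rejected
    for ch in name:
        if ch in FORBIDDEN_CHARS:
            return "forbidden character %r" % ch
        if ch.isspace():
            return "whitespace is not allowed"
        if ord(ch) < 0x20 or ord(ch) == 0x7F:
            return "ASCII control characters are not allowed"
    if name == "default":
        return '"default" is reserved'
    if name.startswith("SUNW"):
        return 'names beginning with "SUNW" are reserved'
    if name[-1] in "0123456789":
        # Trailing digits are reserved for the observability node name.
        return "trailing digits are not permitted"
    return None
```

For example, "my-bridge" passes, while "bridge0", "default", and "a/b" are each rejected with a reason.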
Options are:
-t Create a temporary bridge.
This will create the bridge object on the running
system, but the newly created bridge will not survive
the next reboot.
-R <root-dir>
Specify an alternate root directory.
This allows the configuration of bridge instances in
alternate roots, as with Live Upgrade and with
jumpstart installs. Note that error checking for link
type isn't possible when administering an alternate
root.
-p <priority>
Specify the Bridge Priority.
This sets the STP priority value for determining the
root bridge node in the network. The default value is
(per the specification) 32768, and legal values are 0
(highest priority) to 65535 (lowest priority).
-m <max-age>
Specify the maximum age for configuration information.
This sets the STP Bridge Max Age parameter.
Information older than this (in seconds) is discarded
by all bridges in the network if this node is the root
bridge. It defaults to 20.0 seconds. Legal values
are from 6.0 to 40.0 seconds. (See the "-d
<forward-delay>" parameter for additional
constraints.)
-h <hello-time>
Specify the Bridge Hello Time.
This sets the STP Bridge Hello Time parameter. If
this node is the root node, it sends Configuration
BPDUs at this interval throughout the network. It
defaults to 2.0 seconds. Legal values are from 1.0 to
10.0 seconds. (See the "-d <forward-delay>" parameter
for additional constraints.)
-d <forward-delay>
Specify the Bridge Forward Delay.
This sets the STP Bridge Forward Delay parameter.
This timer is used to sequence the link states when a
port is enabled anywhere in the network if this node
is the root bridge. It defaults to 15.0 seconds.
Legal values are from 4.0 to 30.0 seconds.
Bridges must obey the following two constraints:
2 * (forward_delay - 1.0) >= max_age
max_age >= 2 * (hello_time + 1.0)
Any parameter setting that would violate those
constraints will be treated as an error and cause the
command to fail with a diagnostic message.
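As a sketch of how the ranges and the two constraints combine, the following checks a proposed set of timer values (in seconds). The function name and the wording of the messages are hypothetical; the ranges, defaults, and inequalities are the ones given above.

```python
# Legal ranges in seconds, taken from the -m, -h and -d descriptions.
LEGAL_RANGES = {
    "max_age": (6.0, 40.0),
    "hello_time": (1.0, 10.0),
    "forward_delay": (4.0, 30.0),
}


def check_stp_timers(max_age=20.0, hello_time=2.0, forward_delay=15.0):
    """Return a list of problems; an empty list means the set is legal."""
    values = {
        "max_age": max_age,
        "hello_time": hello_time,
        "forward_delay": forward_delay,
    }
    problems = []
    for name, (low, high) in LEGAL_RANGES.items():
        if not low <= values[name] <= high:
            problems.append(
                "%s must be from %.1f to %.1f seconds" % (name, low, high))
    # The two relationships every bridge must obey.
    if 2 * (forward_delay - 1.0) < max_age:
        problems.append("2 * (forward_delay - 1.0) >= max_age does not hold")
    if max_age < 2 * (hello_time + 1.0):
        problems.append("max_age >= 2 * (hello_time + 1.0) does not hold")
    return problems
```

The defaults (20.0, 2.0, 15.0) satisfy both constraints. Raising max-age to 40.0 alone stays within its legal range but violates the first constraint, since 2 * (15.0 - 1.0) is only 28.0.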
-f <force-protocol>
Specify the forced maximum supported protocol.
This sets the MSTP maximum supported protocol number.
The default is 3. The current implementation doesn't
support RSTP or MSTP, so this currently has no effect.
However, if the user desires to prevent MSTP from
being used in the future when implemented, the
parameter may be set to 0 (STP only) or 2 (allow
RSTP).
-l <link> Add a link to the newly-created bridge.
This is equivalent to creating the bridge and then
adding one or more links, as with the "add-bridge"
subcommand below, except that if any of the links cannot
be added, then the entire command fails, and the new
bridge itself isn't created.
dladm modify-bridge [-t] [-R <root-dir>] [-p <priority>]
[-m <max-age>] [-h <hello-time>] [-d <forward-delay>]
[-f <force-protocol>] <bridge-name>
This subcommand modifies the operational parameters of a given
bridge instance. All of the options are the same as for the
"create-bridge" subcommand above, except that the "-l" option is
not permitted. To add links to an existing bridge, use the
"add-bridge" subcommand below.
Bridge parameter modification requires PRIV_SYS_NET_CONFIG.
dladm delete-bridge [-t] [-R <root-dir>] <bridge-name>
This subcommand deletes a bridge instance. Unlike the bridge
creation subcommand, which can add links while creating, it does
not have the option to remove links during the deletion process.
The bridge being deleted must not have any attached links. If
it does, then an error is returned and no action is taken.
Bridge deletion requires PRIV_SYS_NET_CONFIG.
The "-t" and "-R" options are the same as for the
"create-bridge" subcommand.
dladm add-bridge [-t] [-R <root-dir>] -l <link> [-l <link>]...
<bridge-name>
This subcommand adds one or more links to a bridge instance. If
multiple links are specified, and adding any one of them results
in an error, then no changes are made to the system and the
command fails.
Link addition to a bridge requires PRIV_SYS_NET_CONFIG.
A link may be a member of at most one bridge. It's an error to
specify that a link belongs to more than one bridge. To move a
link from one bridge instance to another, remove it from the
current bridge before adding it to the new one.
The links assigned to a bridge must not themselves be VLANs or
tunnels. Only links that would be acceptable as part of an
aggregation or links that are aggregations themselves may be
assigned to a bridge. Other link types will result in error
messages, and no action taken. (A future project may provide
bridging over tunnels using GRE, and over PPP using BCP. Those
cases are not part of this project, but nothing this project is
doing will preclude those cases in the future.)
In this initial version, the links must also be Ethernet type.
Bridging is well-defined over a few other media, and there are
some dodgy ways to make it work on still others, but those cases
are subjects for a future release.
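The all-or-nothing behavior and the link restrictions above can be sketched as a validate-then-commit step. Everything named here is hypothetical: the dictionary layout, the "class" and "media" labels, and the link names are ours, and treating a link that is already on the same bridge as a no-op is an assumption the text does not settle.

```python
# Hypothetical class labels: physical links and aggregations are allowed,
# VLANs and tunnels are not.
ALLOWED_CLASSES = ("phys", "aggr")


def add_links_to_bridge(links, bridge, new_links):
    """Add new_links to bridge, changing nothing if any link is rejected.

    links maps a link name to a dict with "class", "media" and "bridge"
    keys; an empty "bridge" string means the link is not on any bridge.
    """
    errors = []
    for name in new_links:
        info = links.get(name)
        if info is None:
            errors.append("%s: no such link" % name)
        elif info["class"] not in ALLOWED_CLASSES:
            errors.append("%s: link type %s cannot be bridged"
                          % (name, info["class"]))
        elif info["media"] != "Ethernet":
            errors.append("%s: only Ethernet links are supported" % name)
        elif info["bridge"] and info["bridge"] != bridge:
            errors.append("%s: already a member of bridge %s"
                          % (name, info["bridge"]))
    if errors:
        # Nothing has been modified yet, so failing here is atomic.
        raise ValueError("; ".join(errors))
    for name in new_links:
        links[name]["bridge"] = bridge
```

Because every link is checked before any is changed, a request naming one good link and one VLAN fails as a whole and leaves the good link unassigned.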
When links are added to a bridge, the bridging protocol in use
(STP) will be notified, and the links will behave as though just
created. For STP, this means that the link will be shut down
and then brought back up using the standard protocol.
The "-t" and "-R" options are the same as for the "create-bridge"
subcommand.
dladm remove-bridge [-t] [-R <root-dir>] -l <link> [-l <link>]...
<bridge-name>
This subcommand removes one or more links from a bridge
instance. If multiple links are specified, and removing any one
of them would result in an error, then none are removed and the
command fails.
Link removal from a bridge requires PRIV_SYS_NET_CONFIG.
When links are removed from a bridge, the bridging protocol
(STP) is notified, and will likely recalculate a new network
topology, unless those links were unused due to loop-pruning
activity by the bridging protocol.
The "-t" and "-R" options are the same as for the "create-bridge"
subcommand.
dladm show-bridge [-p] [-s [-i <interval>]] [<bridge-name>]
This subcommand shows the running status of bridges. When given
a bridge name, it shows the status of that one bridge. If no
bridge name is given, then it shows summary status of all
bridges on the system.
Note the lack of a "-R" option here. It is not possible to list
bridge configuration information in an alternate root, in
keeping with the rest of the dladm user interface. The reason
for this restriction is to allow the data to be represented in
SMF, where "writing" to an alternate root is supported by way of
copying appropriate commands to $ROOT/var/svc/profile/upgrade,
but "reading" is not feasible because the repository on the
alternate root may be incompatible with the running system.
1.2 New dladm Link Properties
"stp"
This is a boolean property. It defaults to "true." When set
to "false," the link will not use Spanning Tree, and will be
placed into forwarding mode at all times. The "false" setting
is appropriate for point-to-point links connected to end
nodes. Only non-VLAN type links have this property.
"forward"
This is a boolean property on all links. It defaults to
"true." When set to "false," the VLAN associated with the
link instance will not forward traffic through the bridge.
Setting the property to "false" is equivalent to removing the
VLAN from the "allowed set" for a traditional bridge.
"default-tag"
This is a numeric property with range 0 to 4094. It defaults
to 1. It defines the default VLAN ID that's assumed for
untagged packets sent to and received from this link. Only
non-VLAN type links have this property.
"stp-priority"
This is a numeric property with range 0 to 255. It defaults
to 128. It corresponds to the STP Port Priority value, which
is used to determine the preferred root port on a bridge by
prepending to the port identifier. Lower numerical values are
higher priority.
"stp-cost"
This is a numeric property with range 1 to 65535; zero is not
allowed. It represents the cost for using the link, and
defaults to (per the standard) 100 for 10Mbps, 19 for 100Mbps,
4 for 1Gbps, and 2 for 10Gbps.
"bridge-port"
This is a read-only numeric property. It shows the port
number for the link as seen by the bridge, and is used in
Spanning Tree messages and network management.
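The value rules for these properties can be summarized in a short check. This is a sketch only; the function name is hypothetical, values are taken as strings as they would arrive from a command line, and the restriction of some properties to non-VLAN links is not modeled.

```python
BOOLEAN_PROPS = ("stp", "forward")
NUMERIC_RANGES = {
    "default-tag": (0, 4094),
    "stp-priority": (0, 255),
    "stp-cost": (1, 65535),
}
READ_ONLY_PROPS = ("bridge-port",)


def check_linkprop(name, value):
    """Return None if setting name to value is acceptable, else a reason."""
    if name in READ_ONLY_PROPS:
        return "%s is read-only" % name
    if name in BOOLEAN_PROPS:
        if value in ("true", "false"):
            return None
        return '%s must be "true" or "false"' % name
    if name in NUMERIC_RANGES:
        low, high = NUMERIC_RANGES[name]
        try:
            number = int(value)
        except ValueError:
            return "%s must be a number" % name
        if low <= number <= high:
            return None
        return "%s must be from %d to %d" % (name, low, high)
    return "%s is not one of the properties described here" % name
```

For instance, setting "stp-cost" to "0" is rejected because zero is outside the 1 to 65535 range, and any attempt to set "bridge-port" is rejected as read-only.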
1.3 New Kstats
Each bridge instance will have a set of statistics, named
"bridge:<index>:<bridge-name>:<statistic>", where:
<index>
Arbitrary instance number assigned by the kernel and
not necessarily retained across reboot.
<bridge-name>
Administrator-specified bridge name.
<statistic>
Name of statistic; at least the following:
learn_source     Number of sources learned
learn_expire     Number of learnt entries expired
learn_size       Current count of learnt entries
forward_direct   Directly forwarded packet count
forward_unknown  Forwarded with unknown destination
forward_mbcast   Forwarded multicast/broadcast
Each link instance will also have new kstats, where the
<statistic> names will be:
bridge_sent  Packets forwarded to the link by bridging
bridge_rcvd  Packets received from the link (and forwarded
             elsewhere) by bridging
All of these statistics are considered Volatile for now. The
existence of the statistics will be documented for users, but with
warnings that the names and definitions of the statistics may
change incompatibly. A future case for the overall RBridges
project will elevate these in stability.
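Since bridge names cannot contain a colon, a consumer can split the four-part bridge statistic name on ":" without ambiguity. A minimal sketch, with a hypothetical function and tuple name:

```python
from collections import namedtuple

BridgeKstat = namedtuple("BridgeKstat", ["index", "bridge", "statistic"])


def parse_bridge_kstat(name):
    """Split "bridge:<index>:<bridge-name>:<statistic>" into its fields."""
    parts = name.split(":")
    if len(parts) != 4 or parts[0] != "bridge":
        raise ValueError("not a bridge kstat name: %r" % name)
    # int() raises ValueError as well if the index is not numeric.
    return BridgeKstat(int(parts[1]), parts[2], parts[3])
```

Given that the statistics are Volatile, any such consumer should expect the names to change.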
2. Packet Observability
Each bridge instance will be assigned an "observability device,"
in a manner similar to the DLPI nodes created for "Clearview: IP
Observability Devices" (PSARC 2006/475). These nodes will appear
under the /dev/bridge/ directory, named by the bridge name plus a
trailing "0".
The observability node is intended for use with snoop and
wireshark. It behaves as a standard Ethernet interface, but does
not permit the transmission of packets. All transmitted packets
are silently dropped.
The user of this node will get a single unmodified copy of every
packet handled by the bridge, similar to a "monitoring" port on a
traditional bridge, and subject to the usual DLPI "promiscuous
mode" rules. The user may also filter on VLAN ID by using the
VLAN PPA hack mechanism: "/dev/bridge/my-bridge1000" selects VLAN
ID 1 on the bridge instance named "my-bridge".
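The node naming can be sketched as follows. The function name is hypothetical, and the "VLAN ID times 1000 plus instance" arithmetic is our reading of the PPA hack, chosen to be consistent with the "my-bridge1000" example; the instance number is always 0 here, matching the trailing "0".

```python
def observability_node(bridge_name, vlan_id=None):
    """Return the observability node path for a bridge.

    With vlan_id set, the PPA-hack form is used to select one VLAN.
    """
    instance = 0  # the trailing "0" appended to the bridge name
    if vlan_id is None:
        ppa = instance
    else:
        if not 1 <= vlan_id <= 4094:
            raise ValueError("VLAN ID must be from 1 to 4094")
        ppa = vlan_id * 1000 + instance  # assumed PPA-hack encoding
    return "/dev/bridge/%s%d" % (bridge_name, ppa)
```

This is also why bridge names may not end in digits: the numeric suffix has to be separable from the name.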
The observability node also forms a Project Private control node
for the kernel, allowing ioctls to a specific bridge instance, and
will be used by the STP daemon and other (future) bridging
protocols.
3. STP Daemon
Each bridge (created via "dladm create-bridge") is represented as
an identically-named SMF instance of svc:/network/bridge. Each
instance runs a copy of /usr/lib/bridged, which implements the
Spanning Tree Protocol (STP). For example, if the user runs:
# dladm create-bridge my-bridge
The system will have an SMF service named:
svc:/network/bridge:my-bridge
and (per section 2 above) an observability node named:
/dev/bridge/my-bridge0
By default, all ports run standard STP. This is done for safety
reasons: a bridge that does not run some form of bridging protocol
(such as STP) can form long-lasting forwarding loops in the
network. Because Ethernet has no hop-count or TTL on packets, any
such loops are fatal to the network.
When the administrator knows that a particular port is not
connected to another bridge (for example, a direct point-to-point
connection to a host system), STP can be disabled administratively
for that port. Even if all ports on a bridge have STP disabled,
the STP daemon still runs; this is in case new ports are added,
and because it is responsible for enabling and disabling
forwarding on the ports.
If the SMF service instance for a bridge is disabled, then bridge
forwarding stops on those ports as the STP daemon is stopped. If
the instance is restarted, STP starts from its initial state.
The bridge daemon runs as UID/GID "daemon" with
PRIV_SYS_NET_CONFIG in order to access the raw network devices,
but with most other basic privileges (e.g., PRIV_PROC_FORK and
PRIV_PROC_EXEC) removed.
3. VLANs
In general, administrators will want the VLANs they configure on
the system to be forwarded among all the ports on a bridge
instance, so this will be the default for VLANs. When the
administrator invokes Clearview's "dladm create-vlan", and the
underlying link is part of a bridge, that command will also enable
forwarding of the specified VLAN on that bridge link.
If an administrator wants to configure a VLAN on a link but not
allow forwarding to or from other links on the bridge, then
forwarding must be disabled explicitly by setting the "forward"
property to "false" with "set-linkprop".
Clearview UV provides two mechanisms for the creation of VLANs.
The primary means of configuration is the new "dladm create-vlan"
subcommand, which automatically enables the VLAN for bridging as
described above, if the underlying link is configured as part of a
bridge.
The second mechanism is a legacy feature called the "PPA hack."
This allows a user to create a VLAN simply by opening a DLPI
provider and specifying a VLAN ID number as part of the PPA. In
this case, the user may be doing nothing other than snooping on
that VLAN, so adding the VLAN to the allowed set automatically is
likely not the right answer. Thus, we will default forwarding to
"off" for PPA-hack VLANs. Administrators with legacy PPA hack
VLANs will need to reconfigure to use the new Clearview VLANs to
take full advantage of bridging, and this will be included in the
documentation.
In STP, VLANs are ignored. The bridging protocol computes just
one loop-free topology and uses that. Administrators are required
to configure any "duplicate" links such that when they're
automatically disabled by STP, the configured VLANs are not
disconnected. MSTP is somewhat similar, but allows administrators
to assign each VLAN to a small number of distinct spanning tree
"instances," and allows instances within an identically-configured
"region" to have distinct topologies. In terms of this project,
additional bridge and link properties would be required to enable
MSTP operation.
4. SMF Properties
These parameters are all Project Private. They will not be
documented, and the documented administrative interface will be
the dladm command.
4.1 STP SMF
Property Name         Type      Default
--------------------  --------  -----------------
config/priority       ushort_t  32768
config/max-age        ushort_t  5120 (20 seconds)
config/hello-time     ushort_t  512 (2 seconds)
config/forward-delay  ushort_t  3840 (15 seconds)
config/force-protocol int       3
All of these properties (and their default values and
granularities) are defined by the STP and related standards.
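The timer defaults in the table imply that the stored values are in units of 1/256 second (5120 / 256 = 20, 512 / 256 = 2, 3840 / 256 = 15), which matches the granularity STP uses for timer values in BPDUs. A minimal conversion sketch, with hypothetical function names and with rounding to the nearest unit as our assumption:

```python
TICKS_PER_SECOND = 256  # inferred from the defaults: 5120 <-> 20 seconds


def seconds_to_smf(seconds):
    """Convert seconds (as given to dladm) to the stored ushort_t value."""
    ticks = round(seconds * TICKS_PER_SECOND)
    if not 0 <= ticks <= 0xFFFF:
        raise ValueError("value does not fit in a ushort_t")
    return ticks


def smf_to_seconds(ticks):
    """Convert a stored value back to seconds."""
    return ticks / TICKS_PER_SECOND
```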
The "force-protocol" parameter is specified to allow for an
upgrade path. Users who do not want to see the use of MSTP when
it is implemented can set this parameter to 0 or 2 (as specified
in IEEE 802.1Q-2004) to select STP or RSTP as the maximum allowed
protocol. In this project, the parameter will have no effect, as
only STP is implemented.
4.2 Datalink SMF
Property Name       Type      Default
------------------  --------  -------
config/stp          boolean   true
config/forward      boolean   true
config/bridge       string    ""
config/default-tag  ushort_t  1
On a Nemo device, legacy device, or aggregation, the link
parameters are used as above. The "default-tag" parameter may be
set to 0 to disable the forwarding of untagged packets to and from
the port.
On a VLAN, "stp" and "default-tag" are ignored. The "forward"
flag enables forwarding for that VLAN, which is equivalent to
putting the VLAN into the "allowed set" for the bridge port.
Setting it to "false" causes the VLAN to be disallowed, which
means that VLAN-based I/O to the underlying link still operates,
but no bridge-based forwarding is done. The "bridge" parameter is
reserved for use with MSTP, where it will select an instance.
5. Alternatives
Alternative designs include having the set of links for a bridge
listed as part of the bridge configuration, and using non-SMF
files for storing configuration.
The former approach would work, and would have the advantage that
during start-up of the STP daemon it would be easy to find the
list of links configured for that instance. That's a benefit over
the proposed design, which must iterate over all links to get the
list needed for a single instance. However, there are
two reasons this approach wasn't chosen:
a. A link may be a member of at most one bridge. This
semantic is easy to enforce with a link property, as
there's just one instance of the property, but is hard to
enforce across multiple bridges. We end up needing to scan
all bridge instances, and configuration transactions become
more complex because two objects need to be changed at one
time.
b. We want to have all configuration parameters for a link to
be stored with the link itself. Having parameters stored
elsewhere in the system means that utilities that
manipulate links or just display system configuration may
end up needing to scan through these other locations in
order to make coherent system changes. (For this project,
we would be forced to change the existing Clearview "dladm
delete-link" functionality so that it scanned the bridge
instances and removed any links found there. Storing the
data with the link instance removes that requirement.)
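To make the trade-off concrete, here is a sketch of the chosen design using an in-memory dictionary in place of the real link configuration store. The layout, function names, and link names are hypothetical, and it assumes membership is recorded in a per-link "bridge" property.

```python
def links_for_bridge(link_props, bridge_name):
    """Scan every link to find the members of one bridge.

    This scan is the cost of the chosen design.
    """
    return sorted(name for name, props in link_props.items()
                  if props.get("bridge", "") == bridge_name)


def delete_link(link_props, name):
    """Deleting a link drops its bridge membership with it.

    No bridge object has to be updated, which is point (b) above.
    """
    del link_props[name]


# One property per link means a link cannot name two bridges at once,
# which is point (a) above.
link_props = {
    "net0": {"bridge": "my-bridge"},
    "net1": {"bridge": "my-bridge"},
    "net2": {"bridge": ""},
}
```

With this layout, links_for_bridge(link_props, "my-bridge") walks all three links to return the two members, and delete_link needs no knowledge of bridges at all.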
Using non-SMF files would also work, and we could make use of the
Clearview UV "link IDs" to avoid problems inherent with link
renaming. However, longer term, the Clearview and NWAM teams are
refactoring link configuration into SMF. Having native bridging
designed for OpenSolaris but not actually integrated with its core
administrative mechanisms seems like a poor recipe for the future.
6. Interface Summary
Interface         Stability        Comments
----------------  ---------------  ----------------------
dladm *-bridge    Committed
link properties   Committed
kstats            Volatile         Should be raised later
/dev/bridge/      Committed        Observability node
control ioctls    Project Private
/usr/lib/bridged  Project Private
/network/bridge   Committed        SMF URI
config/*          Project Private  SMF properties
bridge module     Project Private  Kernel bridging module
--
James Carlson, Solaris Networking <james.d.carlson at sun.com>
Sun Microsystems / 35 Network Drive 71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677