The following document describes the changes to dladm and SMF that
we're proposing for the RBridges project.  We'd appreciate any
comments you might have on this plan (either privately or on any of
the open lists) prior to submission for ARC review.

I'm planning to go forward with ARC review in a week or so, and you'll
have a second chance to review it then if you can't do it now.



This project adds basic Ethernet (layer two) bridging support to
OpenSolaris.  It consists of a Project Private kernel module and
daemon, some Project Private SMF properties, and Committed dladm and
SMF control interfaces.  It is targeted for a Minor release of an
OpenSolaris distribution, though we do not believe that any of the
changes here require Minor binding.

This project assumes that Clearview UV (PSARC 2006/499) will integrate
first.  The terminology and command-line design reflect that
assumption.  In particular, Clearview obsoletes the idea of "network
devices" and instead relies on "links" that may themselves be of
varying types.

The bridging protocol referred to in this document is the IEEE
802.1D-1998 "Spanning Tree Protocol," abbreviated in this document as
"STP."  The newer and far more complex "Multiple Spanning Tree
Protocol" (802.1Q-2005; MSTP) is intended to be backward compatible
with STP, and is not part of this project, but may be the subject of a
future project.

This document is large, but we believe that the changes described here
are straightforward and obvious, given the existing system design, and
they've been reviewed by the Clearview and NWAM teams, and thus the
changes are suitable for fast-track treatment.  [XXX this cross-team
review hasn't happened yet!]


1.  Administration

    All of the administration of this feature is based on dladm and
    SMF.  The SMF portion is the ability to enable and disable bridge
    instances using the instance URIs described in section 3 below.

1.1 New dladm subcommands

    These commands are patterned after the existing aggregation
    commands in dladm.

    dladm create-bridge [-t] [-R <root-dir>] [-p <priority>]
      [-m <max-age>] [-h <hello-time>] [-d <forward-delay>]
      [-f <force-protocol>] [-l <link>]... <bridge-name>

      This command creates a bridge instance and optionally assigns
      network links to the new bridge.  By default, no bridge
      instances are present, and OpenSolaris will not bridge between
      network links.  See the "add-bridge" subcommand for details on
      link assignment.

      Bridge creation and link assignment require PRIV_SYS_NET_CONFIG.

      In order to bridge between links, you must create at least one
      bridge instance.  Each instance is separate: there is
      intentionally no forwarding connection between bridges.  (Note
      that Crossbow's VNICs may in the future allow virtual inter-
      bridge connections.)

      The <bridge-name> is arbitrary and chosen by the administrator,
      but it must be a legal SMF service instance name.  For purposes
      of documentation, this is a URI component without escape
      sequences, meaning that none of the following characters may be
      present:

        ; / ? : @ & = + $ , % < > # "

      Whitespace and ASCII control characters are also prohibited.
      The name "default" is reserved, as are all names beginning with
      the string "SUNW".  Names with trailing digits are not permitted,
      in order to allow for creation of "observability devices"; see
      section 2 below.
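
      As an illustration only, here is a minimal C sketch of the naming
      rules above.  The function bridge_name_valid() is hypothetical,
      not part of dladm or libdladm, and it checks only the rules
      listed in this section; it makes no attempt to cover any further
      constraints that SMF itself may impose on instance names.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if 'name' satisfies the bridge
 * naming rules described in this proposal.  Illustrative sketch only.
 */
bool
bridge_name_valid(const char *name)
{
    /* Characters that would need escaping in a URI component. */
    static const char forbidden[] = ";/?:@&=+$,%<>#\"";
    size_t len = strlen(name);

    if (len == 0)
        return (false);

    /* "default" is reserved, as is anything starting with "SUNW". */
    if (strcmp(name, "default") == 0 || strncmp(name, "SUNW", 4) == 0)
        return (false);

    /* A trailing digit would clash with the observability node suffix. */
    if (isdigit((unsigned char)name[len - 1]))
        return (false);

    for (const char *cp = name; *cp != '\0'; cp++) {
        unsigned char c = (unsigned char)*cp;

        /* Reject ASCII control characters (0x00-0x1F and 0x7F). */
        if (c < 0x20 || c == 0x7f)
            return (false);
        /* Reject whitespace and the URI-reserved characters above. */
        if (isspace(c) || strchr(forbidden, c) != NULL)
            return (false);
    }
    return (true);
}
```

      With these rules, "my-bridge" is accepted, while "bridge1",
      "default", and "SUNWbr" are all rejected.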

      Options are:

      -t        Create a temporary bridge.

                This will create the bridge object on the running
                system, but the newly created bridge will not survive
                the next reboot.

      -R <root-dir>
                Specify an alternate root directory.

                This allows the configuration of bridge instances in
                alternate roots, as with Live Upgrade and with
                JumpStart installs.  Note that error checking for link
                type isn't possible when administering an alternate
                root.

      -p <priority>
                Specify the Bridge Priority.

                This sets the STP priority value for determining the
                root bridge node in the network.  The default value is
                (per the specification) 32768, and legal values are 0
                (highest priority) to 65535 (lowest priority).

      -m <max-age>
                Specify the maximum age for configuration information.

                This sets the STP Bridge Max Age parameter.  If this
                node is the root bridge, all bridges in the network
                discard configuration information older than this
                value.  It defaults to 20.0 seconds.  Legal values
                are from 6.0 to 40.0 seconds.  (See the "-d
                <forward-delay>" parameter for additional
                constraints.)

      -h <hello-time>
                Specify the Bridge Hello Time.

                This sets the STP Bridge Hello Time parameter.  If
                this node is the root node, it sends Configuration
                BPDUs at this interval throughout the network.  It
                defaults to 2.0 seconds.  Legal values are from 1.0 to
                10.0 seconds.  (See the "-d <forward-delay>" parameter
                for additional constraints.)

      -d <forward-delay>
                Specify the Bridge Forward Delay.

                This sets the STP Bridge Forward Delay parameter.  If
                this node is the root bridge, bridges throughout the
                network use this timer to sequence port state changes
                when a port is enabled.  It defaults to 15.0 seconds.
                Legal values are from 4.0 to 30.0 seconds.

                Bridges must obey the following two constraints:

                        2 * (forward_delay - 1.0) >= max_age

                        max_age >= 2 * (hello_time + 1.0)

                Any parameter setting that would violate those
                constraints will be treated as an error and cause the
                command to fail with a diagnostic message.

      -f <force-protocol>
                Specify the forced maximum supported protocol.

                This sets the MSTP maximum supported protocol number.
                The default is 3.  The current implementation doesn't
                support RSTP or MSTP, so this currently has no effect.
                However, an administrator who wants to prevent MSTP
                from being used once it is implemented may set the
                parameter to 0 (STP only) or 2 (allow RSTP).

      -l <link> Add a link to the newly-created bridge.

                This is equivalent to creating the bridge and then
                adding one or more links with the "add-bridge"
                subcommand below, except that if any of the links
                cannot be added, then the entire command fails, and
                the new bridge itself isn't created.
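
      The range checks and the two constraints on the STP timers can be
      expressed compactly.  The following C sketch is hypothetical (the
      function stp_timers_valid() does not exist in dladm) and assumes
      all three values have already been parsed into seconds; a real
      command would also need to report which check failed.

```c
#include <stdbool.h>

/*
 * Hypothetical check of the STP bridge timers, in seconds, against the
 * legal ranges and the two 802.1D constraints listed above.
 */
bool
stp_timers_valid(double max_age, double hello_time, double fwd_delay)
{
    /* Individual legal ranges. */
    if (max_age < 6.0 || max_age > 40.0)
        return (false);
    if (hello_time < 1.0 || hello_time > 10.0)
        return (false);
    if (fwd_delay < 4.0 || fwd_delay > 30.0)
        return (false);

    /* 2 * (forward_delay - 1.0) >= max_age */
    if (2.0 * (fwd_delay - 1.0) < max_age)
        return (false);

    /* max_age >= 2 * (hello_time + 1.0) */
    if (max_age < 2.0 * (hello_time + 1.0))
        return (false);

    return (true);
}
```

      For example, raising max-age to 30.0 while leaving forward-delay
      at its default of 15.0 fails the first constraint, because
      2 * (15.0 - 1.0) = 28.0 is less than 30.0.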

    dladm modify-bridge [-t] [-R <root-dir>] [-p <priority>]
      [-m <max-age>] [-h <hello-time>] [-d <forward-delay>]
      [-f <force-protocol>] <bridge-name>

      This subcommand modifies the operational parameters of a given
      bridge instance.  All of the options are the same as for the
      "create-bridge" subcommand above, except that the "-l" option is
      not permitted.  To add links to an existing bridge, use the
      "add-bridge" subcommand below.

      Bridge parameter modification requires PRIV_SYS_NET_CONFIG.

    dladm delete-bridge [-t] [-R <root-dir>] <bridge-name>

      This subcommand deletes a bridge instance.  Unlike the bridge
      creation subcommand, which can add links while creating, it does
      not have the option to remove links during the deletion process.
      The bridge being deleted must not have any attached links.  If
      it does, then an error is returned and no action is taken.

      Bridge deletion requires PRIV_SYS_NET_CONFIG.

      The "-t" and "-R" options are the same as for the
      "create-bridge" subcommand.

    dladm add-bridge [-t] [-R <root-dir>] -l <link> [-l <link>]...
      <bridge-name>

      This subcommand adds one or more links to a bridge instance.  If
      multiple links are specified, and adding any one of them results
      in an error, then no changes are made to the system and the
      command fails.

      Link addition to a bridge requires PRIV_SYS_NET_CONFIG.

      A link may be a member of at most one bridge.  It's an error to
      specify that a link belongs to more than one bridge.  To move a
      link from one bridge instance to another, remove it from the
      current bridge before adding it to the new one.

      The links assigned to a bridge must not themselves be VLANs or
      tunnels.  Only links that would be acceptable as part of an
      aggregation or links that are aggregations themselves may be
      assigned to a bridge.  Other link types result in an error
      message, and no action is taken.  (A future project may provide
      bridging over tunnels using GRE, and over PPP using BCP.  Those
      cases are not part of this project, but nothing this project is
      doing will preclude them in the future.)

      In this initial version, the links must also be Ethernet type.
      Bridging is well-defined over a few other media, and there are
      some dodgy ways to make it work on still others, but those cases
      are subjects for a future release.

      When links are added to a bridge, the bridging protocol in use
      (STP) will be notified, and the links will behave as though just
      created.  For STP, this means that the link will be shut down
      and then brought back up using the standard protocol.

      The "-t" and "-R" options are the same as for the
      "create-bridge" subcommand.
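
      The all-or-nothing behavior described above amounts to checking
      every named link before changing any of them.  The C sketch below
      assumes a hypothetical in-memory link table (link_t,
      link_class_t, and bridge_add_links() are illustrative names, not
      Clearview or libdladm types) and shows only that ordering; making
      the persistent configuration change atomic is a separate problem.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical link classes, for illustration only. */
typedef enum {
    LINK_PHYS_ETHER,    /* physical Ethernet link */
    LINK_AGGR_ETHER,    /* Ethernet aggregation */
    LINK_VLAN,
    LINK_TUNNEL,
    LINK_OTHER          /* non-Ethernet media */
} link_class_t;

typedef struct {
    const char      *name;
    link_class_t    lclass;
    char            bridge[64];     /* "" if not in any bridge */
} link_t;

/* Eligible: Ethernet, not a VLAN or tunnel, and not already bridged. */
static bool
link_eligible(const link_t *lp)
{
    if (lp->lclass != LINK_PHYS_ETHER && lp->lclass != LINK_AGGR_ETHER)
        return (false);
    return (lp->bridge[0] == '\0');
}

/*
 * Add all links to 'bridge', or none of them.  Returns -1 on success,
 * or the index of the first ineligible link (nothing is modified).
 */
int
bridge_add_links(link_t *links[], size_t nlinks, const char *bridge)
{
    /* Pass 1: validate every link before touching any of them. */
    for (size_t i = 0; i < nlinks; i++) {
        if (!link_eligible(links[i]))
            return ((int)i);
    }
    /* Pass 2: every link passed, so record the new membership. */
    for (size_t i = 0; i < nlinks; i++) {
        (void) snprintf(links[i]->bridge, sizeof (links[i]->bridge),
            "%s", bridge);
    }
    return (-1);
}
```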

    dladm remove-bridge [-t] [-R <root-dir>] -l <link> [-l <link>]...
      <bridge-name>

      This subcommand removes one or more links from a bridge
      instance.  If multiple links are specified, and removing any one
      of them would result in an error, then none are removed and the
      command fails.

      Link removal from a bridge requires PRIV_SYS_NET_CONFIG.

      When links are removed from a bridge, the bridging protocol
      (STP) is notified, and will likely compute a new network
      topology, unless those links were unused due to loop-pruning
      activity by the bridging protocol.

      The "-t" and "-R" options are the same as for the
      "create-bridge" subcommand.

    dladm show-bridge [-p] [-s [-i <interval>]] [<bridge-name>]

      This subcommand shows the running status of bridges.  When given
      a bridge name, it shows the status of that one bridge.  If no
      bridge name is given, then it shows summary status of all
      bridges on the system.

      Note the lack of a "-R" option here.  It is not possible to list
      bridge configuration information in an alternate root, in
      keeping with the rest of the dladm user interface.  The reason
      for this restriction is to allow the data to be represented in
      SMF, where "writing" to an alternate root is supported by way of
      copying appropriate commands to $ROOT/var/svc/profile/upgrade,
      but "reading" is not feasible because the repository on the
      alternate root may be incompatible with the running system.

1.2 New dladm Link Properties

    "stp"

        This is a boolean property.  It defaults to "true."  When set
        to "false," the link will not use Spanning Tree, and will be
        placed into forwarding mode at all times.  The "false" setting
        is appropriate for point-to-point links connected to end
        nodes.  Only non-VLAN type links have this property.

    "forward"

        This is a boolean property on all links.  It defaults to
        "true."  When set to "false," the VLAN associated with the
        link instance will not forward traffic through the bridge.
        Setting the property to "false" is equivalent to removing the
        VLAN from the "allowed set" for a traditional bridge.

    "default-tag"

        This is a numeric property with range 0 to 4094.  It defaults
        to 1.  It defines the default VLAN ID that's assumed for
        untagged packets sent to and received from this link.  Only
        non-VLAN type links have this property.

    "stp-priority"

        This is a numeric property with range 0 to 255.  It defaults
        to 128.  It corresponds to the STP Port Priority value, which
        is prepended to the port number to form the port identifier
        used to determine the preferred root port on a bridge.  Lower
        numerical values are higher priority.

    "stp-cost"

        This is a numeric property with range 1 to 65535; zero is not
        allowed.  It represents the cost for using the link, and
        defaults to (per the standard) 100 for 10Mbps, 19 for 100Mbps,
        4 for 1Gbps, and 2 for 10Gbps.

    "bridge-port"

        This is a read-only numeric property.  It shows the port
        number for the link as seen by the bridge, and is used in
        Spanning Tree messages and network management.
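
    The "stp-priority", "bridge-port", and "stp-cost" values can be
    combined as in the minimal C sketch below.  It assumes the
    802.1D-1998 port identifier layout (8-bit priority followed by an
    8-bit port number).  How to handle link speeds that fall between
    the listed rates is an assumption of this sketch, not something
    this design specifies.  The function names are hypothetical.

```c
#include <stdint.h>

/* Port identifier: priority in the high octet, port number in the low. */
uint16_t
stp_port_id(uint8_t priority, uint8_t port_number)
{
    return ((uint16_t)((priority << 8) | port_number));
}

/*
 * Default "stp-cost" for a link speed in Mbit/s.  Speeds between the
 * listed rates (and below 10Mbps) use the cost of the next slower
 * listed rate; that rounding rule is an assumption.
 */
uint16_t
stp_default_cost(uint64_t mbps)
{
    if (mbps >= 10000)
        return (2);
    if (mbps >= 1000)
        return (4);
    if (mbps >= 100)
        return (19);
    return (100);
}
```

    With the default priority of 128, the link with bridge-port 1 has
    port identifier 0x8001.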

1.3 New Kstats

    Each bridge instance will have a set of statistics, named
    "bridge:<index>:<bridge-name>:<statistic>", where:

        <index>
                Arbitrary instance number assigned by the kernel and
                not necessarily retained across reboot.

        <bridge-name>
                Administrator-specified bridge name.

        <statistic>
                Name of statistic; at least the following:

                learn_source    Number of sources learned
                learn_expire    Number of learnt entries expired
                learn_size      Current count of learnt entries
                forward_direct  Directly forwarded packet count
                forward_unknown Forwarded with unknown destination
                forward_mbcast  Forwarded multicast/broadcast

    Each link instance will also have new kstats, where the
    <statistic> names will be:

        bridge_sent     Packets forwarded to the link by bridging
        bridge_rcvd     Packets received from the link (and forwarded
                        elsewhere) by bridging

    All of these statistics are considered Volatile for now.  The
    existence of the statistics will be documented for users, but with
    warnings that the names and definitions of the statistics may
    change incompatibly.  A future case for the overall RBridges
    project will elevate these in stability.


2.  Packet Observability

    Each bridge instance will be assigned an "observability device,"
    in a manner similar to the DLPI nodes created for "Clearview: IP
    Observability Devices" (PSARC 2006/475).  These nodes will appear
    under the /dev/bridge/ directory, named by the bridge name plus a
    trailing "0".

    The observability node is intended for use with snoop and
    Wireshark.  It behaves as a standard Ethernet interface, but does
    not permit the transmission of packets.  All transmitted packets
    are silently dropped.

    The user of this node will get a single unmodified copy of every
    packet handled by the bridge, similar to a "monitoring" port on a
    traditional bridge, and subject to the usual DLPI "promiscuous
    mode" rules.  The user may also filter on VLAN ID by using the
    VLAN PPA hack mechanism: "/dev/bridge/my-bridge1000" selects VLAN
    ID 1 on the bridge instance named "my-bridge".
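
    The node name for a given VLAN follows directly from the PPA hack
    rule (VLAN ID * 1000 + instance, where the instance here is always
    0).  This C sketch uses a hypothetical helper, bridge_obs_path(),
    and treats a VLAN ID of 0 as a request for the unfiltered node.

```c
#include <stdio.h>

#define VLAN_ID_MAX 4094

/*
 * Hypothetical helper: build the observability node path for 'bridge',
 * optionally filtered to VLAN 'vid' (0 = no VLAN filter).  Returns the
 * snprintf() result, or -1 if the VLAN ID is out of range.
 */
int
bridge_obs_path(char *buf, size_t len, const char *bridge, unsigned int vid)
{
    /* Bridge observability nodes always use instance 0. */
    const unsigned int instance = 0;

    if (vid > VLAN_ID_MAX)
        return (-1);
    /* PPA hack: PPA = VLAN ID * 1000 + instance. */
    return (snprintf(buf, len, "/dev/bridge/%s%u", bridge,
        vid * 1000 + instance));
}
```

    Because bridge names may not end in a digit, the numeric suffix can
    always be split off unambiguously: "my-bridge1000" can only mean
    VLAN 1 on "my-bridge".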

    The observability node also forms a Project Private control node
    for the kernel, allowing ioctls to a specific bridge instance, and
    will be used by the STP daemon and other (future) bridging
    protocols.


3.  STP Daemon

    Each bridge (created via "dladm create-bridge") is represented as
    an identically-named SMF instance of svc:/network/bridge.  Each
    instance runs a copy of /usr/lib/bridged, which implements the
    Spanning Tree Protocol (STP).  For example, if the user runs:

        # dladm create-bridge my-bridge

    then the system will have an SMF service instance named:

        svc:/network/bridge:my-bridge

    and (per section 2 above) an observability node named:

        /dev/bridge/my-bridge0

    By default, all ports run standard STP.  This is done for safety
    reasons: a bridge that does not run some form of bridging protocol
    (such as STP) can form long-lasting forwarding loops in the
    network.  Because Ethernet has no hop-count or TTL on packets, any
    such loops are fatal to the network.

    When the administrator knows that a particular port is not
    connected to another bridge (for example, a direct point-to-point
    connection to a host system), STP can be disabled administratively
    for that port.  Even if all ports on a bridge have STP disabled,
    the STP daemon still runs; this is in case new ports are added,
    and because it is responsible for enabling and disabling
    forwarding on the ports.
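
    The effect of the per-port "stp" property on a newly enabled port
    can be sketched as below.  This is a simplified model, not the
    bridged implementation: it assumes the 802.1D-1998 port states,
    assumes that nothing in the topology changes while the timers run,
    and uses a hypothetical function, port_state_after().

```c
#include <stdbool.h>

/* Simplified 802.1D-1998 port states (Disabled omitted). */
typedef enum {
    PORT_BLOCKING,
    PORT_LISTENING,
    PORT_LEARNING,
    PORT_FORWARDING
} port_state_t;

/*
 * Hypothetical model: state of a port 'elapsed' seconds after it is
 * enabled.  'selected' says whether STP chose it as a root or
 * designated port; a port that is not selected stays blocking.
 */
port_state_t
port_state_after(bool stp_enabled, bool selected, double elapsed,
    double fwd_delay)
{
    /* "stp" set to "false": forwarding at all times, no timers. */
    if (!stp_enabled)
        return (PORT_FORWARDING);
    /* Ports pruned to break loops do not forward. */
    if (!selected)
        return (PORT_BLOCKING);
    /* One forward delay in listening, then one in learning. */
    if (elapsed < fwd_delay)
        return (PORT_LISTENING);
    if (elapsed < 2.0 * fwd_delay)
        return (PORT_LEARNING);
    return (PORT_FORWARDING);
}
```

    Under this model, with the default 15.0-second forward delay, a
    port that STP selects needs 30 seconds to reach forwarding, while a
    port with "stp" set to "false" forwards immediately.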

    If the SMF service instance for a bridge is disabled, then
    forwarding stops on all of that bridge's ports, because the STP
    daemon is stopped.  If the instance is restarted, STP starts from
    its initial state.

    The bridge daemon runs as UID/GID "daemon" with
    PRIV_SYS_NET_CONFIG in order to access the raw network devices,
    but with most other basic privileges (e.g., PRIV_PROC_FORK and
    PRIV_PROC_EXEC) removed.


3.  VLANs

    In general, administrators will want the VLANs they configure on
    the system to be forwarded among all the ports on a bridge
    instance, so this will be the default for VLANs.  When the
    administrator invokes Clearview's "dladm create-vlan", and the
    underlying link is part of a bridge, that command will also enable
    forwarding of the specified VLAN on that bridge link.

    An administrator who wants to configure a VLAN on a link but not
    allow forwarding to or from other links on the bridge must take
    specific action to do so, by setting the VLAN's "forward" link
    property to "false" with "set-linkprop".

    Clearview UV provides two mechanisms for the creation of VLANs.
    The primary means of configuration is the new "dladm create-vlan"
    subcommand, which automatically enables the VLAN for bridging as
    described above, if the underlying link is configured as part of a
    bridge.

    The second mechanism is a legacy feature called the "PPA hack."
    This allows a user to create a VLAN simply by opening a DLPI
    provider and specifying a VLAN ID number as part of the PPA.  In
    this case, the user may be doing nothing other than snooping on
    that VLAN, so adding the VLAN to the allowed set automatically is
    likely not the right answer.  Thus, we will default forwarding to
    "off" for PPA-hack VLANs.  Administrators with legacy PPA hack
    VLANs will need to reconfigure to use the new Clearview VLANs to
    take full advantage of bridging, and this will be included in the
    documentation.

    In STP, VLANs are ignored.  The bridging protocol computes just
    one loop-free topology and uses that.  Administrators are required
    to configure any "duplicate" links such that when they're
    automatically disabled by STP, the configured VLANs are not
    disconnected.  MSTP is somewhat similar, but allows administrators
    to assign each VLAN to a small number of distinct spanning tree
    "instances," and allows instances within an identically-configured
    "region" to have distinct topologies.  In terms of this project,
    additional bridge and link properties would be required to enable
    MSTP operation.


4.  SMF Properties

    These parameters are all Project Private.  They will not be
    documented, and the documented administrative interface will be
    the dladm command.

4.1 STP SMF

    Property Name               Type            Default
    --------------              ----            -------
    config/priority             ushort_t        32768
    config/max-age              ushort_t        5120    (20 seconds)
    config/hello-time           ushort_t        512     (2 seconds)
    config/forward-delay        ushort_t        3840    (15 seconds)
    config/force-protocol       int             3

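    All of these properties (and their default values and
    granularities) are defined by the STP and related standards.  The
    timer values are stored in units of 1/256 second, so the default
    max-age of 5120 corresponds to 20 seconds.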
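
    A minimal C sketch of the conversion between seconds (as given to
    dladm) and the 1/256-second units implied by the table above (5120
    for 20 seconds) follows.  The helper names are hypothetical, and
    rounding to the nearest unit is an assumption of this sketch.

```c
#include <stdint.h>

/* STP timer granularity: 256 units per second. */
#define STP_TICKS_PER_SEC 256

/* Convert non-negative seconds to 1/256-second units, rounding. */
uint16_t
stp_secs_to_ticks(double secs)
{
    return ((uint16_t)(secs * STP_TICKS_PER_SEC + 0.5));
}

/* Convert 1/256-second units back to seconds. */
double
stp_ticks_to_secs(uint16_t ticks)
{
    return ((double)ticks / STP_TICKS_PER_SEC);
}
```

    Both directions are exact for the default values: 20.0, 2.0, and
    15.0 seconds map to 5120, 512, and 3840.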

    The "force-protocol" parameter is specified to allow for an
    upgrade path.  Users who do not want to see the use of MSTP when
    it is implemented can set this parameter to 0 or 2 (as specified
    in IEEE 802.1Q-2005) to select STP or RSTP as the maximum allowed
    protocol.  In this project, the parameter will have no effect, as
    only STP is implemented.

4.2 Datalink SMF

    Property Name               Type            Default
    --------------              ----            -------
    config/stp                  boolean         true
    config/forward              boolean         true
    config/bridge               string          ""
    config/default-tag          ushort_t        1

    On a Nemo device, legacy device, or aggregation, the link
    parameters are used as above.  The "default-tag" parameter may be
    set to 0 to disable the forwarding of untagged packets to and from
    the port.

    On a VLAN, "stp" and "default-tag" are ignored.  The "forward"
    flag enables forwarding for that VLAN, which is equivalent to
    putting the VLAN into the "allowed set" for the bridge port.
    Setting it to "false" causes the VLAN to be disallowed, which
    means that VLAN-based I/O to the underlying link still operates,
    but no bridge-based forwarding is done.  The "bridge" parameter is
    reserved for use with MSTP, where it will select an instance.
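
    One way to read the "default-tag" and "forward" rules is as an
    ingress classification step.  The C sketch below is hypothetical
    (port_vlan_cfg_t and the functions are illustrative names), and it
    assumes that the default-tag VLAN is subject to the same allowed
    set as tagged VLANs, which this document does not state
    explicitly.

```c
#include <stdbool.h>
#include <stdint.h>

#define VLAN_ID_MAX 4094

/* Hypothetical per-port VLAN configuration. */
typedef struct {
    uint16_t    default_tag;                    /* 0: drop untagged */
    uint8_t     allowed[(VLAN_ID_MAX + 8) / 8]; /* bit set: forwarded */
} port_vlan_cfg_t;

/* Set or clear a VLAN in the port's allowed set ("forward" property). */
void
vlan_set_forward(port_vlan_cfg_t *cfg, uint16_t vid, bool on)
{
    if (vid == 0 || vid > VLAN_ID_MAX)
        return;
    if (on)
        cfg->allowed[vid / 8] |= (uint8_t)(1u << (vid % 8));
    else
        cfg->allowed[vid / 8] &= (uint8_t)~(1u << (vid % 8));
}

static bool
vlan_forwarded(const port_vlan_cfg_t *cfg, uint16_t vid)
{
    return (((cfg->allowed[vid / 8] >> (vid % 8)) & 1) != 0);
}

/*
 * Classify a frame received on the port.  'tag_vid' is 0 for an
 * untagged frame.  Returns the VLAN ID to bridge on, or 0 if the
 * bridge should not forward the frame.
 */
uint16_t
bridge_ingress_vlan(const port_vlan_cfg_t *cfg, uint16_t tag_vid)
{
    /* Untagged frames are assigned the port's default tag. */
    uint16_t vid = (tag_vid == 0) ? cfg->default_tag : tag_vid;

    if (vid == 0 || vid > VLAN_ID_MAX)
        return (0);
    return (vlan_forwarded(cfg, vid) ? vid : 0);
}
```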


5.  Alternatives

    Alternative designs include having the set of links for a bridge
    listed as part of the bridge configuration, and using non-SMF
    files for storing configuration.

    The former approach would work, and would have the advantage that
    during start-up of the STP daemon it would be easy to find the
    list of links configured for that instance.  That is an advantage
    over the proposed design, in which the daemon must iterate over
    all links to find those belonging to a single instance.  However,
    there are two reasons this approach wasn't chosen:

        a. A link may be a member of at most one bridge.  This
           semantic is easy to enforce with a link property, as
           there's just one instance of the property, but is hard to
           enforce across multiple bridges.  We end up needing to scan
           all bridge instances, and configuration transactions become
           more complex because two objects need to be changed at one
           time.

        b. We want all configuration parameters for a link to be
           stored with the link itself.  Having parameters stored
           elsewhere in the system means that utilities that
           manipulate links or just display system configuration may
           end up needing to scan through these other locations in
           order to make coherent system changes.  (For this project,
           we would be forced to change the existing Clearview "dladm
           delete-link" functionality so that it scanned the bridge
           instances and removed any links found there.  Storing the
           data with the link instance removes that requirement.)

    Using non-SMF files would also work, and we could make use of the
    Clearview UV "link IDs" to avoid problems inherent with link
    renaming.  However, longer term, the Clearview and NWAM teams are
    refactoring link configuration into SMF.  Having native bridging
    designed for OpenSolaris but not actually integrated with its core
    administrative mechanisms seems like a poor recipe for the future.


6.  Interface Summary

    Interface           Stability               Comments
    ---------           ---------               --------
    dladm *-bridge      Committed
    link properties     Committed
    kstats              Volatile                Should be raised later
    /dev/bridge/        Committed               Observability node
    control ioctls      Project Private
    /usr/lib/bridged    Project Private
    /network/bridge     Committed               SMF URI
    config/*            Project Private         SMF properties
    bridge module       Project Private         Kernel bridging module


-- 
James Carlson, Solaris Networking              <james.d.carlson at sun.com>
Sun Microsystems / 35 Network Drive        71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677
