OpenSM release notes: Update to 3.1.11 Please apply to ofed_1_3 and master
Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]> diff --git a/opensm/doc/opensm_release_notes-3.1.10.txt b/opensm/doc/opensm_release_notes-3.1.10.txt deleted file mode 100644 index 2b6253d..0000000 --- a/opensm/doc/opensm_release_notes-3.1.10.txt +++ /dev/null @@ -1,492 +0,0 @@ - OpenSM Release Notes 3.1.10 - ============================= - -Version: OpenFabrics Enterprise Distribution (OFED) 1.3 -Repo: git://git.openfabrics.org/~ofed_1_3/management.git (release) - git://git.openfabrics.org/~sashak/management.git (development) -Date: February 2008 - -1 Overview ----------- -This document describes the contents of the OpenSM OFED 1.3 release. -OpenSM is an InfiniBand compliant Subnet Manager and Administration, -and runs on top of OpenIB. The OpenSM version for this release -is openib-3.1.10 - -This document includes the following sections: -1 This Overview section (describing new features and software - dependencies) -2 Known Issues And Limitations -3 Unsupported IB compliance statements -4 Major Bug Fixes -5 Main Verification Flows -6 Qualified software stacks and devices - -1.1 Major New Features - -* QoS manager (experimental) - This QoS manager implementation is in accordance with IBA QoS Annex. - Highly configurable QoS Policy is parsed from OpenSM QoS policy file. - Valid QoS parameters will be reported in SA PathRecord and - MultiPathRecord. In addition simple QoS levels per ULPs configuration - is supported too. - -* Performance Manager - When enabled it collects a fabric port counters and able to log it or - to pass to external program via event plugin interface. It handles - counters overflow, supports LID/QP redirection and is able to work - when OpenSM is in master, standby, and inactive states. - -* Dimension Order routing (DOR) algorithm - DOR Unicast routing algorithm - based on the Min Hop algorithm, but - avoids port equalization except for redundant links between the - same two switches. This provides deadlock free routes for hypercubes - when the fabric is cabled as a hypercube and for meshes when cabled - as a mesh (see details in OpenSM man page). - -* Routing improvements - Speedup the current routing algorithms default MinHops, Up/Down and - LASH and lid matrix generation. Fat Tree routing engine is able to work - with not pure fat free topology. - -* Multiple IB routers support - OpenSM now able to keep configurable subnet prefix to router table. - SA will report path to this routers when SA PathRecord was issued with - non-local DGID. - -* Node map - This is possible to name nodes in this config file. Those names will be - used for logging and by QoS configuration. - -* PKey index support - Proper support for PKey index in GSI queries. - -* Incremental LFTs, PKey, SL2VL, and VLarbitration table updates - OpenSM will only fetch those tables in first heavy sweep and then - will maintain this internally. - -* Fast port and switch detector - When port and/or switch was externally reset and it was fast so sweep - doesn't find this device as disconnected OpenSM will detect this by - changed port states and handle accordingly. - -* Duplicated GUIDs/port moving detector - OpenSM will be able to detect port moving during a fabric discovery - and will not report duplicated GUIDs in this case. - -* Multicast rerouting speedup - Now OpenSM will calculate and setup multicast forwarding tables for - all altered multicast groups and not for each one. - -* Event plugin API - OpenSM allows to load dynamically various plugin modules. - -* Many generic improvements - -1.2 Minor New Features: - -* Daemon mode can be activated with -B option. - -* Support multiple scopes for IPoIB multicast groups in partition config. - -* Loopback connection handling - Loopback connection is not interpreted as duplicated GUID anymore. - -* Connect root nodes option for Up/Down routing engine. - When this option is specified Up/Down will create routing paths between - its root nodes. - -* Dump and log filenames changed from osm* to opensm*. - -* Support loopback console - Socket console with only local access. - -* Configurable config directory (the default value is /etc/opensm) and - configurable default values of OpenSM config filenames. - -* Add option for force SDR link speed - Add option to opensm.opts to force link speed. Currently, only forcing - to SDR link speed is supported. This option is not supported as a - command line option. - -* Better packaging - Building and RPM packaging were improved and simplified. - -* Handle "babbling" ports - When a babbling port (port which causes a frequent trap generation) is - detected, OpenSM will disable the port which should terminate the trap - storm. - -1.3 Library API Changes - - None - -1.4 Software Dependencies - -OpenSM depends on the installation of either OFED 1.3, OFED 1.2, OFED 1.1, -OFED 1.0, OpenIB gen2 (e.g. IBG2 distribution), OpenIB gen1 (e.g. IBGD -distribution), or Mellanox VAPI stacks. The qualified driver versions -are provided in Table 2, "Qualified IB Stacks". - -Also building of QoS manager policy file parser requires flex, and either -bison or byacc installed. - -1.5 Supported Devices Firmware - -The main task of OpenSM is to initialize InfiniBand devices. The -qualified devices and their corresponding firmware versions -are listed in Table 3. - -2 Known Issues And Limitations ------------------------------- - -* No Service / Key associations: - There is no way to manage Service access by Keys. - -* No SM to SM SMDB synchronization: - Puts the burden of re-registering services, multicast groups, and - inform-info on the client application (or IB access layer core). - -3 Unsupported IB Compliance Statements --------------------------------------- -The following section lists all the IB compliance statements which -OpenSM does not support. Please refer to the IB specification for detailed -information regarding each compliance statement. - -* C14-22 (Authentication): - M_Key M_KeyProtectBits and M_KeyLeasePeriod shall be set in one - SubnSet method. As a work-around, an OpenSM option is provided for - defining the protect bits. - -* C14-67 (Authentication): - On SubnGet(SMInfo) and SubnSet(SMInfo) - if M_Key is not zero then - the SM shall generate a SubnGetResp if the M_Key matches, or - silently drop the packet if M_Key does not match. - -* C15-0.1.23.4 (Authentication): - InformInfoRecords shall always be provided with the QPN set to 0, - except for the case of a trusted request, in which case the actual - subscriber QPN shall be returned. - -* o13-17.1.2 (Event-FWD): - If no permission to forward, the subscription should be removed and - no further forwarding should occur. - -* C14-24.1.1.5 and C14-62.1.1.22 (Initialization): - GUIDInfo - SM should enable assigning Port GUIDInfo. - -* C14-44 (Initialization): - If the SM discovers that it is missing an M_Key to update CA/RT/SW, - it should notify the higher level. - -* C14-62.1.1.12 (Initialization): - PortInfo:M_Key - Set the M_Key to a node based random value. - -* C14-62.1.1.13 (Initialization): - PortInfo:P_KeyProtectBits - set according to an optional policy. - -* C14-62.1.1.24 (Initialization): - SwitchInfo:DefaultPort - should be configured for random FDB. - -* C14-62.1.1.32 (Initialization): - RandomForwardingTable should be configured. - -* o15-0.1.12 (Multicast): - If the JoinState is SendOnlyNonMember = 1 (only), then the endport - should join as sender only. - -* o15-0.1.8 (Multicast): - If a request for creating an MCG with fields that cannot be met, - return ERR_REQ_INVALID (currently ignores SL and FlowLabelTClass). - -* C15-0.1.8.6 (SA-Query): - Respond to SubnAdmGetTraceTable - this is an optional attribute. - -* C15-0.1.13 Services: - Reject ServiceRecord create, modify or delete if the given - ServiceP_Key does not match the one included in the ServiceGID port - and the port that sent the request. - -* C15-0.1.14 (Services): - Provide means to associate service name and ServiceKeys. - -4 Major Bug Fixes ------------------ - -The following is a list of bugs that were fixed. Note that other less critical -or visible bugs were also fixed. - -* osm_ucast_ftree.c: do load-leveling of non-CN routes - -* osm_ucast_ftree.c: ignore port 0 and loopbacks on switches - -* lash: fix possible segfault in osm_get_lash_sl() - -* osm_ucast_ftree.c: fixing coredump in fat-tree routing - -* osm_sa_slvl_record: fix overflow crash - -* Break multicast rerouting requests processing when heavy sweep is - scheduled. - -* updn: report fallback properly - -* Fix incorrect identification of routing engine used - -* Don't zero base LID when invalid value is received - -* lash: fix wrong allocation size - -* Fixing broken logic in 'process world' part of LinkRecord processing - -* Fix lmc_mask bit order in osm_sa_link_record.c - -* Adding missing comparison by to_lid/from_lid in LinkRecord processing - -* Broken logic when scanning subnet for PIR request - -* No interactive games in daemon mode - -* Fixing memory leak in node description - -* Fix PortInfo update issues for switch port 0 - -* Changed method_mask type in user_mad interface in accordance with - kernel ABI - -* Use umad_get_issm_path() in osm_vendor_set_sm() - -* Report message fix - -* Uninitialized variables usage fix - -* osm_ucast_ftree.c: Possible NULL ptr seg fault - -* osm_mcast_mgr.c: Possible NULL ptr seg fault - -* TrapRepress was failing for mkey != 0 - -* IB_PR_COMPMASK was used in MPR - -* Set hop limit when creating ipoib multicast groups - -* Fix outstanding mad counters tracking on the error paths. - -* Report new ports before handover mastership - -* Fix opvls and neighbormtu when remote port invalid. - -* Bug in coding trying to set vl_arb_high_limit when PortInfo.base_lid - was still zero. - -* Protect SMInfo response against port moving issue. - -5 Main Verification Flows -------------------------- - -OpenSM verification is run using the following activities: -* osmtest - a stand-alone program -* ibmgtsim (IB management simulator) based - a set of flows that - simulate clusters, inject errors and verify OpenSM capability to - respond and bring up the network correctly. -* small cluster regression testing - where the SM is used on back to - back or single switch configurations. The regression includes - multiple OpenSM dedicated tests. -* cluster testing - when we run OpenSM to setup a large cluster, perform - hand-off, reboots and reconnects, verify routing correctness and SA - responsiveness at the ULP level (IPoIB and SDP). - -5.1 osmtest - -osmtest is an automated verification tool used for OpenSM -testing. Its verification flows are described by list below. - -* Inventory File: Obtain and verify all port info, node info, link and path - records parameters. - -* Service Record: - - Register new service - - Register another service (with a lease period) - - Register another service (with service p_key set to zero) - - Get all services by name - - Delete the first service - - Delete the third service - - Added bad flows of get/delete non valid service - - Add / Get same service with different data - - Add / Get / Delete by different component mask values (services - by Name & Key / Name & Data / Name & Id / Id only ) - -* Multicast Member Record: - - Query of existing Groups (IPoIB) - - BAD Join with insufficient comp mask (o15.0.1.3) - - Create given MGID=0 (o15.0.1.4) - - Create given MGID=0xFF12A01C,FE800000,00000000,12345678 (o15.0.1.4) - - Create BAD MGID=0xFA. (o15.0.1.6) - - Create BAD MGID=0xFF12A01B w/ link-local not set (o15.0.1.6) - - New MGID with invalid join state (o15.0.1.9) - - Retry of existing MGID - See JoinState update (o15.0.1.11) - - BAD RATE when connecting to existing MGID (o15.0.1.13) - - Partial JoinState delete request - removing FullMember (o15.0.1.14) - - Full Delete of a group (o15.0.1.14) - - Verify Delete by trying to Join deleted group (o15.0.1.14) - - BAD Delete of IPoIB membership (no prev join) (o15.0.1.15) - -* GUIDInfo Record: - - All GUIDInfoRecords in subnet are obtained - -* MultiPathRecord: - - Perform some compliant and noncompliant MultiPathRecord requests - - Validation is via status in responses and IB analyzer - -* PKeyTableRecord: - - Perform some compliant and noncompliant PKeyTableRecord queries - - Validation is via status in responses and IB analyzer - -* LinearForwardingTableRecord: - - Perform some compliant and noncompliant LinearForwardingTableRecord queries - - Validation is via status in responses and IB analyzer - -* Event Forwarding: Register for trap forwarding using reports - - Send a trap and wait for report - - Unregister non-existing - -* Trap 64/65 Flow: Register to Trap 64-65, create traps (by - disconnecting/connecting ports) and wait for report, then unregister. - -* Stress Test: send PortInfoRecord queries, both single and RMPP and - check for the rate of responses as well as their validity. - - -5.2 IB Management Simulator OpenSM Test Flows: - -The simulator provides ability to simulate the SM handling of virtual -topologies that are not limited to actual lab equipment availability. -OpenSM was simulated to bring up clusters of up to 10,000 nodes. Daily -regressions use smaller (16 and 128 nodes clusters). - -The following test flows are run on the IB management simulator: - -* Stability: - Up to 12 links from the fabric are randomly selected to drop packets - at drop rates up to 90%. The SM is required to succeed in bringing the - fabric up. The resulting routing is verified to be correct as well. - -* LID Manager: - Using LMC = 2 the fabric is initialized with LIDs. Faults such as - zero LID, Duplicated LID, non-aligned (to LMC) LIDs are - randomly assigned to various nodes and other errors are randomly - output to the guid2lid cache file. The SM sweep is run 5 times and - after each iteration a complete verification is made to ensure that all - LIDs that could possibly be maintained are kept, as well as that all nodes - were assigned a legal LID range. - -* Multicast Routing: - Nodes randomly join the 0xc000 group and eventually the - resulting routing is verified for completeness and adherence to - Up/Down routing rules. - -* osmtest: - The complete osmtest flow as described in the previous table is run on - the simulated fabrics. - -* Stress Test: - This flow merges fabric, LID and stability issues with continuous - PathRecord, ServiceRecord and Multicast Join/Leave activity to - stress the SM/SA during continuous sweeps. InformInfo Set/Delete/Get - were added to the test such both existing and non existing nodes - perform them in random order. - -5.3 OpenSM Regression - -Using a back-to-back or single switch connection, the following set of -tests is run nightly on the stacks described in table 2. The included -tests are: - -* Stress Testing: Flood the SA with queries from multiple channel - adapters to check the robustness of the entire stack up to the SA. - -* Dynamic Changes: Dynamic Topology changes, through randomly - dropping SMP packets, used to test OpenSM adaptation to an unstable - network & verify DB correctness. - -* Trap Injection: This flow injects traps to the SM and verifies that it - handles them gracefully. - -* SA Query Test: This test exhaustively checks the SA responses to all - possible single component mask. To do that the test examines the - entire set of records the SA can provide, classifies them by their - field values and then selects every field (using component mask and a - value) and verifies that the response matches the expected set of records. - A random selection using multiple component mask bits is also performed. - -5.4 Cluster testing: - -Cluster testing is usually run before a distribution release. It -involves real hardware setups of 16 to 32 nodes (or more if a beta site -is available). Each test is validated by running all-to-all ping through the IB -interface. The test procedure includes: - -* Cluster bringup - -* Hand-off between 2 or 3 SM's while performing: - - Node reboots - - Switch power cycles (disconnecting the SM's) - -* Unresponsive port detection and recovery - -* osmtest from multiple nodes - -* Trap injection and recovery - - -6 Qualification ----------------- - -Table 2 - Qualified IB Stacks -============================= - -Stack | Version ------------------------------------------|-------------------------- -OFED | 1.3 -OFED | 1.2 -OFED | 1.1 -OFED | 1.0 -OpenIB Gen2 (IBG2 distribution) | 1.0 -OpenIB Gen1 (IBGD distribution) | 1.8.0 -VAPI (Mellanox InfiniBand HCA Driver) | 3.2 and later - -Table 3 - Qualified Devices and Corresponding Firmware -====================================================== - -Mellanox -Device | FW versions -------------------------------------|------------------------------- -InfiniScale | fw-43132 5.2.000 (and later) -InfiniScale III | fw-47396 0.5.000 (and later) -InfiniHost | fw-23108 3.5.000 (and later) -InfiniHost III Lx | fw-25204 1.2.000 (and later) -InfiniHost III Ex (InfiniHost Mode) | fw-25208 4.8.200 (and later) -InfiniHost III Ex (MemFree Mode) | fw-25218 5.3.000 (and later) -ConnectX IB | fw-25408 2.3.000 (and later) - -QLogic/PathScale -Device | Note ---------|----------------------------------------------------------- -iPath | QHT6040 (PathScale InfiniPath HT-460) -iPath | QHT6140 (PathScale InfiniPath HT-465) -iPath | QLE6140 (PathScale InfiniPath PE-880) -iPath | QLE7240 -iPath | QLE7280 - -Note 1: OpenSM does not run on an IBM Galaxy (eHCA) as it does not expose -QP0 and QP1. However, it does support it as a device on the subnet. - -Note 2: QoS firmware and Mellanox devices - -HCAs: QoS supported by ConnectX. The current FW release -doesn't support QoS. QoS-enabled FW release (2_5_000) is -planned for May. If someone wishes to get QoS-enabled FW -before the official release, they should contact Mellanox FAE. - -Switches: QoS supported by InfiniScale III -Any InfiniScale III FW that is supported by OpenSM supports QoS. diff --git a/opensm/doc/opensm_release_notes-3.1.11.txt b/opensm/doc/opensm_release_notes-3.1.11.txt new file mode 100644 index 0000000..c7bbb41 --- /dev/null +++ b/opensm/doc/opensm_release_notes-3.1.11.txt @@ -0,0 +1,492 @@ + OpenSM Release Notes 3.1.11 + ============================= + +Version: OpenFabrics Enterprise Distribution (OFED) 1.3 +Repo: git://git.openfabrics.org/~ofed_1_3/management.git (release) + git://git.openfabrics.org/~sashak/management.git (development) +Date: May 2008 + +1 Overview +---------- +This document describes the contents of the OpenSM OFED 1.3 release. +OpenSM is an InfiniBand compliant Subnet Manager and Administration, +and runs on top of OpenIB. The OpenSM version for this release +is openib-3.1.11 + +This document includes the following sections: +1 This Overview section (describing new features and software + dependencies) +2 Known Issues And Limitations +3 Unsupported IB compliance statements +4 Major Bug Fixes +5 Main Verification Flows +6 Qualified software stacks and devices + +1.1 Major New Features + +* QoS manager (experimental) + This QoS manager implementation is in accordance with IBA QoS Annex. + Highly configurable QoS Policy is parsed from OpenSM QoS policy file. + Valid QoS parameters will be reported in SA PathRecord and + MultiPathRecord. In addition simple QoS levels per ULPs configuration + is supported too. + +* Performance Manager + When enabled it collects a fabric port counters and able to log it or + to pass to external program via event plugin interface. It handles + counters overflow, supports LID/QP redirection and is able to work + when OpenSM is in master, standby, and inactive states. + +* Dimension Order routing (DOR) algorithm + DOR Unicast routing algorithm - based on the Min Hop algorithm, but + avoids port equalization except for redundant links between the + same two switches. This provides deadlock free routes for hypercubes + when the fabric is cabled as a hypercube and for meshes when cabled + as a mesh (see details in OpenSM man page). + +* Routing improvements + Speedup the current routing algorithms default MinHops, Up/Down and + LASH and lid matrix generation. Fat Tree routing engine is able to work + with not pure fat free topology. + +* Multiple IB routers support + OpenSM now able to keep configurable subnet prefix to router table. + SA will report path to this routers when SA PathRecord was issued with + non-local DGID. + +* Node map + This is possible to name nodes in this config file. Those names will be + used for logging and by QoS configuration. + +* PKey index support + Proper support for PKey index in GSI queries. + +* Incremental LFTs, PKey, SL2VL, and VLarbitration table updates + OpenSM will only fetch those tables in first heavy sweep and then + will maintain this internally. + +* Fast port and switch detector + When port and/or switch was externally reset and it was fast so sweep + doesn't find this device as disconnected OpenSM will detect this by + changed port states and handle accordingly. + +* Duplicated GUIDs/port moving detector + OpenSM will be able to detect port moving during a fabric discovery + and will not report duplicated GUIDs in this case. + +* Multicast rerouting speedup + Now OpenSM will calculate and setup multicast forwarding tables for + all altered multicast groups and not for each one. + +* Event plugin API + OpenSM allows to load dynamically various plugin modules. + +* Many generic improvements + +1.2 Minor New Features: + +* Daemon mode can be activated with -B option. + +* Support multiple scopes for IPoIB multicast groups in partition config. + +* Loopback connection handling + Loopback connection is not interpreted as duplicated GUID anymore. + +* Connect root nodes option for Up/Down routing engine. + When this option is specified Up/Down will create routing paths between + its root nodes. + +* Dump and log filenames changed from osm* to opensm*. + +* Support loopback console + Socket console with only local access. + +* Configurable config directory (the default value is /etc/opensm) and + configurable default values of OpenSM config filenames. + +* Add option for force SDR link speed + Add option to opensm.opts to force link speed. Currently, only forcing + to SDR link speed is supported. This option is not supported as a + command line option. + +* Better packaging + Building and RPM packaging were improved and simplified. + +* Handle "babbling" ports + When a babbling port (port which causes a frequent trap generation) is + detected, OpenSM will disable the port which should terminate the trap + storm. + +1.3 Library API Changes + + None + +1.4 Software Dependencies + +OpenSM depends on the installation of either OFED 1.3, OFED 1.2, OFED 1.1, +OFED 1.0, OpenIB gen2 (e.g. IBG2 distribution), OpenIB gen1 (e.g. IBGD +distribution), or Mellanox VAPI stacks. The qualified driver versions +are provided in Table 2, "Qualified IB Stacks". + +Also building of QoS manager policy file parser requires flex and bison +installed. + +1.5 Supported Devices Firmware + +The main task of OpenSM is to initialize InfiniBand devices. The +qualified devices and their corresponding firmware versions +are listed in Table 3. + +2 Known Issues And Limitations +------------------------------ + +* No Service / Key associations: + There is no way to manage Service access by Keys. + +* No SM to SM SMDB synchronization: + Puts the burden of re-registering services, multicast groups, and + inform-info on the client application (or IB access layer core). + +3 Unsupported IB Compliance Statements +-------------------------------------- +The following section lists all the IB compliance statements which +OpenSM does not support. Please refer to the IB specification for detailed +information regarding each compliance statement. + +* C14-22 (Authentication): + M_Key M_KeyProtectBits and M_KeyLeasePeriod shall be set in one + SubnSet method. As a work-around, an OpenSM option is provided for + defining the protect bits. + +* C14-67 (Authentication): + On SubnGet(SMInfo) and SubnSet(SMInfo) - if M_Key is not zero then + the SM shall generate a SubnGetResp if the M_Key matches, or + silently drop the packet if M_Key does not match. + +* C15-0.1.23.4 (Authentication): + InformInfoRecords shall always be provided with the QPN set to 0, + except for the case of a trusted request, in which case the actual + subscriber QPN shall be returned. + +* o13-17.1.2 (Event-FWD): + If no permission to forward, the subscription should be removed and + no further forwarding should occur. + +* C14-24.1.1.5 and C14-62.1.1.22 (Initialization): + GUIDInfo - SM should enable assigning Port GUIDInfo. + +* C14-44 (Initialization): + If the SM discovers that it is missing an M_Key to update CA/RT/SW, + it should notify the higher level. + +* C14-62.1.1.12 (Initialization): + PortInfo:M_Key - Set the M_Key to a node based random value. + +* C14-62.1.1.13 (Initialization): + PortInfo:P_KeyProtectBits - set according to an optional policy. + +* C14-62.1.1.24 (Initialization): + SwitchInfo:DefaultPort - should be configured for random FDB. + +* C14-62.1.1.32 (Initialization): + RandomForwardingTable should be configured. + +* o15-0.1.12 (Multicast): + If the JoinState is SendOnlyNonMember = 1 (only), then the endport + should join as sender only. + +* o15-0.1.8 (Multicast): + If a request for creating an MCG with fields that cannot be met, + return ERR_REQ_INVALID (currently ignores SL and FlowLabelTClass). + +* C15-0.1.8.6 (SA-Query): + Respond to SubnAdmGetTraceTable - this is an optional attribute. + +* C15-0.1.13 Services: + Reject ServiceRecord create, modify or delete if the given + ServiceP_Key does not match the one included in the ServiceGID port + and the port that sent the request. + +* C15-0.1.14 (Services): + Provide means to associate service name and ServiceKeys. + +4 Major Bug Fixes +----------------- + +The following is a list of bugs that were fixed. Note that other less critical +or visible bugs were also fixed. + +* osm_ucast_ftree.c: do load-leveling of non-CN routes + +* osm_ucast_ftree.c: ignore port 0 and loopbacks on switches + +* lash: fix possible segfault in osm_get_lash_sl() + +* osm_ucast_ftree.c: fixing coredump in fat-tree routing + +* osm_sa_slvl_record: fix overflow crash + +* Break multicast rerouting requests processing when heavy sweep is + scheduled. + +* updn: report fallback properly + +* Fix incorrect identification of routing engine used + +* Don't zero base LID when invalid value is received + +* lash: fix wrong allocation size + +* Fixing broken logic in 'process world' part of LinkRecord processing + +* Fix lmc_mask bit order in osm_sa_link_record.c + +* Adding missing comparison by to_lid/from_lid in LinkRecord processing + +* Broken logic when scanning subnet for PIR request + +* No interactive games in daemon mode + +* Fixing memory leak in node description + +* Fix PortInfo update issues for switch port 0 + +* Changed method_mask type in user_mad interface in accordance with + kernel ABI + +* Use umad_get_issm_path() in osm_vendor_set_sm() + +* Report message fix + +* Uninitialized variables usage fix + +* osm_ucast_ftree.c: Possible NULL ptr seg fault + +* osm_mcast_mgr.c: Possible NULL ptr seg fault + +* TrapRepress was failing for mkey != 0 + +* IB_PR_COMPMASK was used in MPR + +* Set hop limit when creating ipoib multicast groups + +* Fix outstanding mad counters tracking on the error paths. + +* Report new ports before handover mastership + +* Fix opvls and neighbormtu when remote port invalid. + +* Bug in coding trying to set vl_arb_high_limit when PortInfo.base_lid + was still zero. + +* Protect SMInfo response against port moving issue. + +5 Main Verification Flows +------------------------- + +OpenSM verification is run using the following activities: +* osmtest - a stand-alone program +* ibmgtsim (IB management simulator) based - a set of flows that + simulate clusters, inject errors and verify OpenSM capability to + respond and bring up the network correctly. +* small cluster regression testing - where the SM is used on back to + back or single switch configurations. The regression includes + multiple OpenSM dedicated tests. +* cluster testing - when we run OpenSM to setup a large cluster, perform + hand-off, reboots and reconnects, verify routing correctness and SA + responsiveness at the ULP level (IPoIB and SDP). + +5.1 osmtest + +osmtest is an automated verification tool used for OpenSM +testing. Its verification flows are described by list below. + +* Inventory File: Obtain and verify all port info, node info, link and path + records parameters. + +* Service Record: + - Register new service + - Register another service (with a lease period) + - Register another service (with service p_key set to zero) + - Get all services by name + - Delete the first service + - Delete the third service + - Added bad flows of get/delete non valid service + - Add / Get same service with different data + - Add / Get / Delete by different component mask values (services + by Name & Key / Name & Data / Name & Id / Id only ) + +* Multicast Member Record: + - Query of existing Groups (IPoIB) + - BAD Join with insufficient comp mask (o15.0.1.3) + - Create given MGID=0 (o15.0.1.4) + - Create given MGID=0xFF12A01C,FE800000,00000000,12345678 (o15.0.1.4) + - Create BAD MGID=0xFA. (o15.0.1.6) + - Create BAD MGID=0xFF12A01B w/ link-local not set (o15.0.1.6) + - New MGID with invalid join state (o15.0.1.9) + - Retry of existing MGID - See JoinState update (o15.0.1.11) + - BAD RATE when connecting to existing MGID (o15.0.1.13) + - Partial JoinState delete request - removing FullMember (o15.0.1.14) + - Full Delete of a group (o15.0.1.14) + - Verify Delete by trying to Join deleted group (o15.0.1.14) + - BAD Delete of IPoIB membership (no prev join) (o15.0.1.15) + +* GUIDInfo Record: + - All GUIDInfoRecords in subnet are obtained + +* MultiPathRecord: + - Perform some compliant and noncompliant MultiPathRecord requests + - Validation is via status in responses and IB analyzer + +* PKeyTableRecord: + - Perform some compliant and noncompliant PKeyTableRecord queries + - Validation is via status in responses and IB analyzer + +* LinearForwardingTableRecord: + - Perform some compliant and noncompliant LinearForwardingTableRecord queries + - Validation is via status in responses and IB analyzer + +* Event Forwarding: Register for trap forwarding using reports + - Send a trap and wait for report + - Unregister non-existing + +* Trap 64/65 Flow: Register to Trap 64-65, create traps (by + disconnecting/connecting ports) and wait for report, then unregister. + +* Stress Test: send PortInfoRecord queries, both single and RMPP and + check for the rate of responses as well as their validity. + + +5.2 IB Management Simulator OpenSM Test Flows: + +The simulator provides ability to simulate the SM handling of virtual +topologies that are not limited to actual lab equipment availability. +OpenSM was simulated to bring up clusters of up to 10,000 nodes. Daily +regressions use smaller (16 and 128 nodes clusters). + +The following test flows are run on the IB management simulator: + +* Stability: + Up to 12 links from the fabric are randomly selected to drop packets + at drop rates up to 90%. The SM is required to succeed in bringing the + fabric up. The resulting routing is verified to be correct as well. + +* LID Manager: + Using LMC = 2 the fabric is initialized with LIDs. Faults such as + zero LID, Duplicated LID, non-aligned (to LMC) LIDs are + randomly assigned to various nodes and other errors are randomly + output to the guid2lid cache file. The SM sweep is run 5 times and + after each iteration a complete verification is made to ensure that all + LIDs that could possibly be maintained are kept, as well as that all nodes + were assigned a legal LID range. + +* Multicast Routing: + Nodes randomly join the 0xc000 group and eventually the + resulting routing is verified for completeness and adherence to + Up/Down routing rules. + +* osmtest: + The complete osmtest flow as described in the previous table is run on + the simulated fabrics. + +* Stress Test: + This flow merges fabric, LID and stability issues with continuous + PathRecord, ServiceRecord and Multicast Join/Leave activity to + stress the SM/SA during continuous sweeps. InformInfo Set/Delete/Get + were added to the test such both existing and non existing nodes + perform them in random order. + +5.3 OpenSM Regression + +Using a back-to-back or single switch connection, the following set of +tests is run nightly on the stacks described in table 2. The included +tests are: + +* Stress Testing: Flood the SA with queries from multiple channel + adapters to check the robustness of the entire stack up to the SA. + +* Dynamic Changes: Dynamic Topology changes, through randomly + dropping SMP packets, used to test OpenSM adaptation to an unstable + network & verify DB correctness. + +* Trap Injection: This flow injects traps to the SM and verifies that it + handles them gracefully. + +* SA Query Test: This test exhaustively checks the SA responses to all + possible single component mask. To do that the test examines the + entire set of records the SA can provide, classifies them by their + field values and then selects every field (using component mask and a + value) and verifies that the response matches the expected set of records. + A random selection using multiple component mask bits is also performed. + +5.4 Cluster testing: + +Cluster testing is usually run before a distribution release. It +involves real hardware setups of 16 to 32 nodes (or more if a beta site +is available). Each test is validated by running all-to-all ping through the IB +interface. The test procedure includes: + +* Cluster bringup + +* Hand-off between 2 or 3 SM's while performing: + - Node reboots + - Switch power cycles (disconnecting the SM's) + +* Unresponsive port detection and recovery + +* osmtest from multiple nodes + +* Trap injection and recovery + + +6 Qualification +---------------- + +Table 2 - Qualified IB Stacks +============================= + +Stack | Version +-----------------------------------------|-------------------------- +OFED | 1.3 +OFED | 1.2 +OFED | 1.1 +OFED | 1.0 +OpenIB Gen2 (IBG2 distribution) | 1.0 +OpenIB Gen1 (IBGD distribution) | 1.8.0 +VAPI (Mellanox InfiniBand HCA Driver) | 3.2 and later + +Table 3 - Qualified Devices and Corresponding Firmware +====================================================== + +Mellanox +Device | FW versions +------------------------------------|------------------------------- +InfiniScale | fw-43132 5.2.000 (and later) +InfiniScale III | fw-47396 0.5.000 (and later) +InfiniHost | fw-23108 3.5.000 (and later) +InfiniHost III Lx | fw-25204 1.2.000 (and later) +InfiniHost III Ex (InfiniHost Mode) | fw-25208 4.8.200 (and later) +InfiniHost III Ex (MemFree Mode) | fw-25218 5.3.000 (and later) +ConnectX IB | fw-25408 2.3.000 (and later) + +QLogic/PathScale +Device | Note +--------|----------------------------------------------------------- +iPath | QHT6040 (PathScale InfiniPath HT-460) +iPath | QHT6140 (PathScale InfiniPath HT-465) +iPath | QLE6140 (PathScale InfiniPath PE-880) +iPath | QLE7240 +iPath | QLE7280 + +Note 1: OpenSM does not run on an IBM Galaxy (eHCA) as it does not expose +QP0 and QP1. However, it does support it as a device on the subnet. + +Note 2: QoS firmware and Mellanox devices + +HCAs: QoS supported by ConnectX. The current FW release +doesn't support QoS. QoS-enabled FW release (2_5_000) is +planned for May. If someone wishes to get QoS-enabled FW +before the official release, they should contact Mellanox FAE. + +Switches: QoS supported by InfiniScale III +Any InfiniScale III FW that is supported by OpenSM supports QoS. _______________________________________________ general mailing list [email protected] http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
