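Hal Rosenstock found a way to make torus-2QoS segfault: the fabric
contains a torus dimension with radix 4, but the configuration in
torus-2QoS.conf doesn't say so.  This patch detects the result of
such a misconfiguration in link_srcsink(), logs an error asking the
user to verify torus-2QoS.conf, and returns false instead of
continuing with a switch that has no osm_switch behind it.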
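A minimal, self-contained sketch of the guard pattern described above.
It assumes simplified stand-in structs: struct f_switch, struct t_switch
and link_srcsink_sketch() are hypothetical reductions, not the real
osm_torus.c definitions, and fprintf() stands in for OSM_LOG().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the osm_torus.c types; the real structures
 * carry far more state.  Only the fields the guard touches are kept. */
struct osm_switch { int id; };
struct f_switch { struct osm_switch *osm_switch; };
struct t_switch { struct f_switch *tmp; struct osm_switch *osm_switch; };

/* Hypothetical reduction of link_srcsink(): before copying the fabric
 * switch into the torus position, check that discovery produced one.
 * On failure, report it and return false instead of dereferencing a
 * NULL fsw or propagating a NULL osm_switch. */
static bool link_srcsink_sketch(struct t_switch *tsw)
{
	struct f_switch *fsw;

	assert(tsw != NULL);	/* assumption: sketch only sees filled positions */
	fsw = tsw->tmp;
	if (!(fsw && fsw->osm_switch)) {
		fprintf(stderr, "Error: Invalid topology discovery. "
			"Verify torus-2QoS.conf contents.\n");
		return false;
	}
	tsw->osm_switch = fsw->osm_switch;
	return true;
}
```

The design choice is that a bad config becomes a logged failure the
caller can propagate, rather than a crash later in routing setup.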

Tested-by: Hal Rosenstock <[email protected]>
Signed-off-by: Jim Schutt <[email protected]>
---
 opensm/opensm/osm_torus.c |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/opensm/opensm/osm_torus.c b/opensm/opensm/osm_torus.c
index 0b7741d..12b480d 100644
--- a/opensm/opensm/osm_torus.c
+++ b/opensm/opensm/osm_torus.c
@@ -1623,6 +1623,22 @@ bool link_srcsink(struct torus *t, int i, int j, int k)
                return true;
 
        fsw = tsw->tmp;
+       /*
+        * link_srcsink is supposed to get called once for every switch in
+        * the fabric.  At this point every fsw we encounter must have a
+        * non-null osm_switch.  Otherwise something has gone horribly
+        * wrong with topology discovery; the most likely reason is that
+        * the fabric contains a radix-4 torus dimension, but the user gave
+        * a config that didn't say so, breaking all the checking in
+        * safe_x_perpendicular and friends.
+        */
+       if (!(fsw && fsw->osm_switch)) {
+               OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+                       "Error: Invalid topology discovery. "
+                       "Verify torus-2QoS.conf contents.\n");
+               return false;
+       }
+
        pg = &tsw->ptgrp[2 * TORUS_MAX_DIM];
        pg->type = SRCSINK;
        tsw->osm_switch = fsw->osm_switch;
-- 
1.6.2.2
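The comment added to link_srcsink() blames a radix-4 dimension for
confusing safe_x_perpendicular and friends.  One plausible way to see
why radix 4 is special (an illustration only, not a description of how
osm_torus.c actually reasons about topology): in a ring of 4 switches,
two hops forward and two hops backward reach the same switch, and four
single hops in one direction close a 4-cycle, the same length as a
square built from links in two different dimensions.  The helpers
ring_step(), plus2_equals_minus2() and ring_is_4cycle() are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Coordinate reached from x after moving `delta` hops (possibly
 * negative) around a ring of `radix` switches. */
static int ring_step(int x, int delta, int radix)
{
	assert(radix > 0);
	return ((x + delta) % radix + radix) % radix;
}

/* True when +2 and -2 hops from x land on the same switch, so the
 * direction of travel cannot be recovered from the switch reached. */
static bool plus2_equals_minus2(int x, int radix)
{
	return ring_step(x, 2, radix) == ring_step(x, -2, radix);
}

/* True when four hops in one direction return to the start, i.e. the
 * ring itself forms a 4-cycle. */
static bool ring_is_4cycle(int radix)
{
	return ring_step(0, 4, radix) == 0;
}
```

For radix 4 both predicates hold; for radix 5 neither does, which is
consistent with radix-4 dimensions needing to be declared explicitly
in torus-2QoS.conf.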
