Sasha Khapyorsky wrote:
On Mon, 2007-03-19 at 13:42 +0200, Yevgeny Kliteynik wrote:
Hi Hal,
[V2 of the patch]
This patch fixes a bug in the lid matrices creation:
The lid matrices were not cleared, which caused OSM routing
to crash when routing nonexisting (disconnected) lids.
Where the crash happens? Could you clarify?
In __osm_ucast_mgr_process_neighbor(), there is the following assertion:
CL_ASSERT( hops <= osm_switch_get_hop_count( p_sw, lid_ho,
port_num ) );
This assertion fails, since the hop count becomes inconsistent.
Anyway I agree that there is no point to try to generate routes to
disconnected lids, so cleanup is ok. Now about the patch itself - I
would prefer to place min hop tables cleanup in
osm_switch_prepare_path_rebuild(), just something like this:
diff --git a/osm/opensm/osm_switch.c b/osm/opensm/osm_switch.c
index 7c36a54..9273459 100644
--- a/osm/opensm/osm_switch.c
+++ b/osm/opensm/osm_switch.c
@@ -516,6 +516,9 @@ osm_switch_prepare_path_rebuild(
for ( i = 0; i < p_sw->num_ports; i++ )
osm_port_prof_construct( &p_sw->p_prof[i] );
+
+ osm_switch_clear_hops(p_sw);
+
if (!p_sw->hops)
{
hops = malloc((max_lids + 1)*sizeof(hops[0]));
No problem.
Please apply to ofed_1_2.
I'm not sure about the trunk though.
Sasha,
Can you please check that you latest improvements to the
routing don't have this problem?
With disconnecting switches should be similar behavior I guess.
Right, I checked it - same problem.
I'll issue new patch.
-- Yevgeny.
Sasha
Thanks.
-- Yevgeny
Signed-off-by: Yevgeny Kliteynik <[EMAIL PROTECTED]>
---
osm/opensm/osm_ucast_mgr.c | 20 ++++++++++++++------
1 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c
index ee6b3f9..8643754 100644
--- a/osm/opensm/osm_ucast_mgr.c
+++ b/osm/opensm/osm_ucast_mgr.c
@@ -1189,6 +1189,7 @@ ucast_mgr_setup_all_switches(osm_subn_t
{
osm_switch_t *p_sw;
uint16_t lids;
+ uint16_t i;
lids = (uint16_t)cl_ptr_vector_get_size(&p_subn->port_lid_tbl);
lids = lids ? lids - 1 : 0;
@@ -1196,12 +1197,19 @@ ucast_mgr_setup_all_switches(osm_subn_t
for (p_sw = (osm_switch_t*)cl_qmap_head(&p_subn->sw_guid_tbl);
p_sw != (osm_switch_t*)cl_qmap_end(&p_subn->sw_guid_tbl);
p_sw = (osm_switch_t*)cl_qmap_next(&p_sw->map_item))
- if (osm_switch_prepare_path_rebuild(p_sw, lids)) {
- osm_log(&p_subn->p_osm->log, OSM_LOG_ERROR,
- "ucast_mgr_setup_all_switches: ERR 3A0B: "
- "cannot setup switch 0x%016" PRIx64 "\n",
- cl_ntoh64(osm_node_get_node_guid(p_sw->p_node)));
- return -1;
+ {
+ if (osm_switch_prepare_path_rebuild(p_sw, lids)) {
+ osm_log(&p_subn->p_osm->log, OSM_LOG_ERROR,
+ "ucast_mgr_setup_all_switches: ERR 3A0B: "
+ "cannot setup switch 0x%016" PRIx64 "\n",
+ cl_ntoh64(osm_node_get_node_guid(p_sw->p_node)));
+ return -1;
+ }
+
+ /* Clear the LID matrix of the switch */
+ for ( i = 0; i < p_sw->num_hops; i++ )
+ if (p_sw->hops[i])
+ memset(p_sw->hops[i], OSM_NO_PATH, p_sw->num_ports);
}
return 0;
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general