Hi Sasha,
Sasha Khapyorsky wrote:
On 14:41 Tue 17 Feb , Yevgeny Kliteynik wrote:
This patch fixes bugzilla issue #1515:
Topology:
|---------------|
| SW2 |
|---------------|
|x |y |z |v
|----| | | |----|
| | | |
| |----| |----| |
| | | |
a| b| c| d|
|---------------| |---------------|
| SW1 | | SW3 |
|---------------| |---------------|
| |
| |
HCA with SM HCA
During the discovery:
SM sends NodeInfo request to SW1
SM sends NodeInfo request to SW2 through link a->x
SM discovers new node SW2:
- updates DR to SW2 to go through link a->x
- creates physp x
And requests SwitchInfo from SW2, and on response sends PortInfo to all
switch ports. PortInfo receiver will initialize all switch ports. Isn't
it?
Links are created only by getting NodeInfo response. W/o the
fix, when SW1 gets NodeInfo from SW2 through link b->y, it
doesn't initialize physp for y, hence the link can't be created.
So the only chance for the link to be created is when
SW2 will send NodeInfo request to SW1 through link y->b.
But this isn't happening, because DR for SW2 is updated
to contain this link, so SM doesn't probe the remote side
of y to avoid loop.
Ok, so whole story should be caused by race between SW2 SwitchInfo
receiving (using a->x) and SW2 NodeInfo (using b->y). As far as I can
see only in this case SW2 port 0 path will be altered (and PortInfo will
be requested using new path). Right?
Right.
BTW, thing happens with every other link that connects
same nodes. In the example above, link v<->d will be
missing as well.
Hmm, I was not able to reproduce this using two switch setup. But if it
is resulted by race it also should not be 100% reproducible.
Right again. Discovery shouldn't rely on the order of packets
that it receives. I guess that on real hardware the packets
are handled serially, so we need some more complex example
for higher probability of this race.
I see the problem on the simple example using the simulator
(ibmgtsim), which has several threads handling the packets,
so the chances for OOO packets are much higher.
-- Yevgeny
Basically I'm not against proposed physp initialization, but want to
understand the problem better.
Sasha
-- Yevgeny
Sasha
SM sends NodeInfo request to SW2 through link b->y
SM discovers a known node SW2
- DOES NOT create physp y
- updates DR to SW2 to go through link b->y
From now on, the DR to SW2 is going through port y, so OpenSM won't deal
with
port y any more, leaving it uninitialized (no physp object for this
port).
The fix is to create physp for the newly discovered port of the known
switch node, same way as it is done for HCAs.
I also added one log message for the case that showed the problem - when
one of the link sides is uninitialized (no valid ports check). Perhaps
this log message should be an error message instead?
Signed-off-by: Yevgeny Kliteynik <[email protected]>
---
opensm/opensm/osm_node_info_rcv.c | 24 +++++++++++++++++++++++-
1 files changed, 23 insertions(+), 1 deletions(-)
diff --git a/opensm/opensm/osm_node_info_rcv.c
b/opensm/opensm/osm_node_info_rcv.c
index c52c0d5..7da3103 100644
--- a/opensm/opensm/osm_node_info_rcv.c
+++ b/opensm/opensm/osm_node_info_rcv.c
@@ -164,8 +164,12 @@ __osm_ni_rcv_set_links(IN osm_sm_t * sm,
*/
if (!osm_node_link_has_valid_ports(p_node, port_num,
p_neighbor_node,
- p_ni_context->port_num))
+ p_ni_context->port_num)) {
+ OSM_LOG(sm->p_log, OSM_LOG_DEBUG,
+ "Link at node 0x%" PRIx64 ", port %u - no valid
ports\n",
+ cl_ntoh64(osm_node_get_node_guid(p_node)), port_num);
goto _exit;
+ }
if (osm_node_link_exists(p_node, port_num,
p_neighbor_node, p_ni_context->port_num)) {
@@ -537,8 +541,26 @@ __osm_ni_rcv_process_existing_switch(IN osm_sm_t *
sm,
IN osm_node_t * const p_node,
IN const osm_madw_t * const p_madw)
{
+
+ ib_smp_t *p_smp;
+ ib_node_info_t *p_ni;
+ uint8_t port_num;
+
OSM_LOG_ENTER(sm->p_log);
+ p_smp = osm_madw_get_smp_ptr(p_madw);
+ p_ni = (ib_node_info_t *) ib_smp_get_payload_ptr(p_smp);
+ port_num = ib_node_info_get_local_port_num(p_ni);
+
+ if (!osm_node_get_physp_ptr(p_node, port_num)) {
+ OSM_LOG(sm->p_log, OSM_LOG_DEBUG,
+ "Creating physp for node GUID:0x%"
+ PRIx64 ", port %u\n",
+ cl_ntoh64(osm_node_get_node_guid(p_node)),
+ port_num);
+ osm_node_init_physp(p_node, p_madw);
+ }
+
/*
If this switch has already been probed during this sweep,
then don't bother reprobing it.
--
1.5.1.4
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general