Hi Hal,

I have some testing info about the second patch below.

On 07/03/2013 03:23 AM, Hal Rosenstock wrote:
> Hi Jeff,
>
> On 6/26/2013 5:24 PM, Jeff Becker wrote:
>> Hi Hal. At the OFA workshop, I mentioned that I've been working on some
>> modifications to opensm that we use at NASA. Following extensive testing
>> of these applied to opensm 3.3.13 (the version we run here), I have
>> ported these to top of tree opensm, and have tested them on a small
>> cluster.
> Thanks for getting this done! For future reference, patches should be
> sent as plain text as this makes it easier to comment.

OK. So I just send the output of git-format-patch directly? It appears to be formatted properly.

>> The first patch modifies the console logflush command to take "on" or
>> "off" as an argument for toggling.
> Thanks. Applied.
>
>> The second (more extensive) patch
>> adds a command line option to specify a file in which each line contains
>> a switch GUID/port pair to be ignored by opensm. The idea is to specify
>> this file when you start opensm (it can be empty), and add ports to
>> ignore (one per line for each end of a connection) to the file. At the
>> next heavy sweep (or HUP) the sm will reprogram the forwarding tables
>> without including the ignored links. We use this for replacing cables,
>> as well as for system expansion (adding new racks).
> I'll comment on this one later.
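
To make the file layout described above concrete, here is a minimal sketch of reading such a list of ports to ignore. It is not taken from the patch. The line format (a hexadecimal switch GUID, whitespace, then a decimal port number), the "#" comment handling, the function name parse_ignore_ports, and the GUIDs in the example are all assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple


def parse_ignore_ports(lines: Iterable[str]) -> Set[Tuple[int, int]]:
    """Return the set of (switch GUID, port number) pairs listed in `lines`.

    Assumed format (hypothetical, not from the patch): one pair per line,
    written as "<hex guid> <decimal port>". Blank lines and anything after
    a '#' are skipped.
    """
    ignored: Set[Tuple[int, int]] = set()
    for lineno, raw in enumerate(lines, start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # an empty file or blank line is valid
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(
                f"line {lineno}: expected '<guid> <port>', got {raw!r}"
            )
        guid = int(fields[0], 16)  # accepts an optional 0x prefix
        port = int(fields[1], 10)
        ignored.add((guid, port))
    return ignored


# One entry per end of the connection, as described above (made-up GUIDs).
example = [
    "# cable being replaced between two switches",
    "0x0002c90200400001 12",
    "0x0002c90200400002 7",
    "",
]
ignored = parse_ignore_ports(example)
```

A set is used here because the only question asked of the list during routing is whether a given switch port is in it, and listing the same port twice should be harmless.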

Dale (cc'd) did some testing with my patch on Pleiades in preparation for a system augmentation (new racks) happening soon. He found that the SM correctly produces routes that do not use links marked to be ignored, but when you then remove or disable the links, the SM re-routes the fabric anyway and comes up with different routes than before. This rerouting causes problems with existing connections. There also appears to be a bookkeeping problem such that some of these links get added to the SM's "light sampling" list and never get removed. This ties up outstanding MAD packet slots, causing the SM to become unresponsive for several seconds every time it reviews its light sampling list.

I'm working on fixing these. I'll take care of the second problem (incorrectly getting added to the light sampling list) first. Is it possible this problem is related to the re-routing on port disable problem? Anyhow, if you have any specific comments about these issues, that would be great. Thanks, and have a great Fourth of July.

-jeff

> -- Hal
>
>> Please let me know if you have any questions/issues with these. Thanks.
>>
>> -jeff
