Hi Alex,

Thanks.  Are you still reviewing the remote_guid_sorting patch (the 2/4
patch)?  Or do you feel there is work there that needs to be done?

Al

On Mon, 2011-07-04 at 03:52 -0700, Alex Netes wrote:
> Hi Al, Jared,
> 
> Applied:
>   [PATCH 1/4] Support port shifting.
>   [PATCH 3/4] Support scatter ports.
>   [PATCH 4/4] Cleanup scatter ports patch. 
> 
> Thanks.
> 
> On 17:56 Wed 06 Apr     , Albert Chu wrote:
> > Hey Alex, Jared,
> > 
> > On Wed, 2011-04-06 at 11:14 -0700, Albert Chu wrote:
> > > Hey Alex,
> > > 
> > > On Wed, 2011-04-06 at 07:09 -0700, Alex Netes wrote:
> > > > Hi Al, Jared,
> > > > 
> > > > On 14:31 Wed 23 Mar     , Albert Chu wrote:
> > > > > > 
> > > > > > 1) Port Shifting
> > > > > > 
> > > > > > > This is similar to what was done with some of the LMC > 0 code.
> > > > > > > Congestion would occur due to "alignment" of routes w/ common
> > > > > > > traffic patterns.  However, we found that it was also necessary
> > > > > > > for LMC=0 and only for used-ports.  For example, let's say there
> > > > > > > are 4 ports (called A, B, C, D) and we are routing lids 1-9
> > > > > > > through them.  Suppose only routing through A, B, and C will
> > > > > > > reach lids 1-9.
> > > > > > 
> > > > > > The LFT would normally be:
> > > > > > 
> > > > > > A: 1 4 7
> > > > > > B: 2 5 8
> > > > > > C: 3 6 9
> > > > > > D:
> > > > > > 
> > > > > > The Port Shifting option would make this:
> > > > > > 
> > > > > > A: 1 6 8
> > > > > > B: 2 4 9
> > > > > > C: 3 5 7
> > > > > > D:
> > > > > > 
> > > > > > > This option by itself improved the mpiGraph average send/recv
> > > > > > > bandwidth from 420 MB/s and 508 MB/s to 991 MB/s and 1172 MB/s.
> > > > > > 
> > > > 
> > > > After thinking about this a little more and reviewing Jared Carr's
> > > > Scatter ports patch, I think we should combine these efforts into one
> > > > framework as Al suggested.
> > 
> > As I was beginning to integrate Jared's patch with mine, it ends up that
> > algorithmically/architecturally, it isn't as easy (or similar) as I had
> > originally thought.  In particular, it has issues with LMC > 0.
> > Normally you want to route through a port that is least forwarded
> > through or goes through systems it hasn't seen yet.  This sort of
> > conflicts with the idea of selecting a port randomly.
> > 
> > I'm going to throw out the following patch series as a starting point
> > for discussion on scatter ports.  My original two patches have been
> > updated with new log messages and some minor tweaks.
> > 
> > My attempt at integrating Jared's scatter patch is included.  It has
> > a variety of cleanup (b/c of conflicts w/ my patches), 1 or 2 gotchas I
> > caught, and various tweaks for code consistency with my patches/other
> > OpenSM code.  Jared's original code algorithm is largely unchanged, but
> > I did modify it to deal with LMC > 0 better (by basically ignoring LMC).
> > 
> > Jared, LMK what you think and if it'll work for you.
> > 
> > Al
> > 
> > P.S.  Jared, I made you author on the 3rd patch naturally.
> > 
> > > > Moreover, isn't "port_shifting" too fabric-oriented?  Will general
> > > > OpenSM users find this useful?  Moreover, how can a user identify
> > > > that port_shifting may improve performance for him?
> > > 
> > > I will admit, I'm unsure of how much non-HPC users would benefit from
> > > this option, be hurt by it, or if they would even care.  I can't speak
> > > for all users, but here at LLNL and at most of the lab HPC sites, people
> > > play with the options and experiment to find the best routing algorithm
> > > + settings that support their environment.  I would imagine the
> > > port_shifting option would just be another option for people to
> > > experiment with.
> > > 
> > > I think adding Jared's Scatter Ports would be easy to merge into my line
> > > of patches.  Let me see if I can integrate his patch into my line
> > > easily.
> > > 
> > > > Will providing a shift factor (more than the suggested 1) help to make
> > > > it suitable for a general case?
> > > 
> > > That seems like a good idea, we certainly could support an arbitrary
> > > shift, allowing users to experiment if there is a better one for their
> > > particular environment.
> > > 
> > > > > > 2) Remote Guid Sorting
> > > > > > 
> > > > > > > Most core/spine switches we've seen thus far have had line boards
> > > > > > > connected to spine boards in a consistent pattern.  However, we
> > > > > > > recently got some Qlogic switches that connect from line/leaf
> > > > > > > boards to spine boards in a (to the casual observer) random
> > > > > > > pattern.  I'm sure there was a good electrical/board reason for
> > > > > > > this design, but it does hurt routing b/c updn doesn't account
> > > > > > > for this.  Here's an output from iblinkinfo as an example.
> > > > > > 
> > > > 
> > > > Why can't this problem be addressed by the guid_routing_order_file option?
> > > 
> > > The problem we encountered in our fabric is predominantly a
> > > switch-to-switch routing issue with a spine switch.  The
> > > guid_routing_order_file wouldn't be able to solve this, since its input
> > > is just end ports.
> > > 
> > > Or another way to say it, this option directly affects the routing
> > > decisions made.  The guid_routing_order_file does not, it only affects
> > > the order in which routes are chosen (which can have consequences, but
> > > the routing algorithm itself is unchanged).
> > > 
> > > Al
> > > 
> > > > 
> > > > --Alex
> > -- 
> > Albert Chu
> > [email protected]
> > Computer Scientist
> > High Performance Systems Division
> > Lawrence Livermore National Laboratory
> 
> 
-- 
Albert Chu
[email protected]
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html