On Fri, 5 Feb 2010 07:27:05 -0500
Hal Rosenstock <[email protected]> wrote:

> >
> >
> > Note that 2 does not give much speedup, whereas 4 does.  Obviously this could
> > be because there were 2 bad nodes (so if you had 100s of unresponsive nodes,
> > a higher value might be worth using)
> 
> It depends on whether the number of unresponsive nodes is the same as or
> higher than the number of outstanding/parallel SMPs. In a sense, the number
> of outstanding SMPs is a measure of how many unresponsive nodes one is
> willing to tolerate before slowing down/waiting for timeouts. In some
> environments, unresponsive nodes are a normal case.

Agreed, but where should we set the default?  I don't think 4 is a bad default.
I don't think it makes the diags overly aggressive compared with OpenSM.
Sasha, I guess this is your call.

Just tell me where to set it and I will make the patch.  With the user option,
it can always be changed on a run-by-run basis.
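
To illustrate the trade-off Hal describes above, here is a minimal sketch of a
toy model, not the libibnetdisc algorithm. It assumes every query is known up
front (real discovery learns nodes as replies arrive), that a responsive node
answers in 1 ms, and that an unresponsive node costs a single 1000 ms timeout
with no retries. The names discovery_time and fabric, and all the numbers, are
hypothetical.

```python
import heapq


def discovery_time(durations, window):
    """Return the total time (ms) to finish all queries when at most
    `window` queries may be outstanding at once (toy model)."""
    slots = [0] * window            # time at which each slot becomes free
    heapq.heapify(slots)
    finish = 0
    for d in durations:
        start = heapq.heappop(slots)    # reuse the earliest free slot
        end = start + d
        heapq.heappush(slots, end)
        finish = max(finish, end)
    return finish


def fabric(n_good, n_bad, rtt_ms=1, timeout_ms=1000):
    """Query durations for a hypothetical fabric. Bad nodes are queried
    first, so their timeouts occupy slots from the start."""
    return [timeout_ms] * n_bad + [rtt_ms] * n_good


if __name__ == "__main__":
    for window in (1, 2, 4):
        print(window, discovery_time(fabric(1000, 2), window))
```

In this model, with 2 unresponsive nodes, a window of 2 leaves no free slot
until the timeouts expire (1500 ms total), while a window of 4 lets the
responsive nodes finish behind the two stalled queries (1000 ms total, the
timeout itself).  Once the number of unresponsive nodes reaches the window
size, the extra slots stop helping.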

Ira

> 
> -- Hal
> 
> > but as a
> > default compromise I think 4 is good.
> >
> > Ira
> >
> >> > >
> >> > > Also, I think you are correct that we should increase OpenSM's default
> >> > > from 4 to 8.  For the same reason as above.  Some of our clusters have
> >> > > worked better with 8 when we are having issues.  But right now we are
> >> > > still running with 4.
> >> >
> >> > I'm concerned about just increasing ibnetdiscover to 4 rather than 2.
> >> > I've seen a number of clusters with SMPs being dropped with the current
> >> > lower defaults.
> >>
> >> So OpenSM is seeing dropped packets?  With 4 SMPs on the wire?  I do see
> >> some VL15Dropped errors (maybe 2-3 a day) but I did not think that would
> >> be an issue.  What kind of rate are you seeing?
> >>
> >> The other question is: do people regularly run the tools which use
> >> libibnetdisc (ibqueryerrors, iblinkinfo, ibnetdiscover)?  We do.  If others
> >> do not, then I would say this change would have less impact, as they would
> >> want the diags to have some priority for debugging.  The other option is to
> >> change the patch to default to 2 and allow the user to change it depending
> >> on what they are trying to do.  If you think that is best I will change the
> >> patch.
> >>
> >> Ira
> >>
> >> >
> >> > -- Hal
> >> >
> >> > > Ira
> >> > >
> >> > >>
> >> > >> -- Hal
> >> > >>
> >> > >> >
> >> > >> > The first patch converts the algorithm and the second adds the
> >> > >> > ibnd_set_max_smps_on_wire call.
> >> > >> >
> >> > >> > Let me know what you think.  Because the algorithm changed so much,
> >> > >> > testing this is a bit difficult: the order of node discovery is
> >> > >> > different.  However, I have done some extensive diffing of the
> >> > >> > output of ibnetdiscover and things look good.
> >> > >> >
> >> > >> > Ira
> >> > >> >
> >> > >> > --
> >> > >> > Ira Weiny
> >> > >> > Math Programmer/Computer Scientist
> >> > >> > Lawrence Livermore National Lab
> >> > >> > 925-423-8008
> >> > >> > [email protected]
> >> > >> > --
> >> > >> > To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> >> > >> > in the body of a message to [email protected]
> >> > >> > More majordomo info at  http://***vger.kernel.org/majordomo-info.html
> >> > >> >
> >> > >>
> >> > >
> >> > >
> >> > >
> >> >
> >>
> >>
> >
> >
> >
> 


-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
[email protected]
