On Fri, 5 Feb 2010 07:27:05 -0500 Hal Rosenstock <[email protected]> wrote:
> > Note that 2 does not give much speed up, where 4 does.  Obviously this could
> > have to do with the fact there were 2 nodes which were bad (so if you had
> > 100's of nodes unresponsive a higher value might be worth using)
>
> It depends on the number of unresponsive nodes being same or higher
> than number of outstanding/parallel SMPs. In a sense, the number of
> outstanding SMPs is a measure of how many unresponsive nodes one is
> willing to tolerate before slowing down/waiting for timeouts. In some
> environments, unresponsive nodes are a normal case.

Agreed, but where should we set the default?  I don't think 4 is a bad
default.  I don't think it makes the diags overly aggressive compared with
OpenSM.

Sasha, I guess this is your call.  Just tell me where to set it and I will
make the patch.  With the user option it can always be changed on a
run-by-run basis.

Ira

>
> -- Hal
>
> > but as a
> > default compromise I think 4 is good.
> >
> > Ira
> >
> >> > >
> >> > > Also, I think you are correct that we should increase OpenSM's default
> >> > > from 4 to 8. For the same reason as above. Some of our clusters have
> >> > > worked better with 8 when we are having issues. But right now we are
> >> > > still running with 4.
> >> >
> >> > I'm concerned about just increasing ibnetdiscover to 4 rather than 2.
> >> > I've seen a number of clusters with SMP dropping with the current
> >> > lower defaults.
> >>
> >> So OpenSM is seeing dropped packets?  With 4 SMP's on the wire?  I do see
> >> some VL15Dropped errors (maybe 2-3 a day) but I did not think that would
> >> be an issue.  What kind of rate are you seeing?
> >>
> >> The other question is; do people regularly run the tools which are using
> >> libibnetdisc (ibqueryerrors, iblinkinfo, ibnetdiscover)?  We do.  If others
> >> are not then I would say this change would have less impact as they would
> >> want the diags to have some priority for debugging.  The other option is to
> >> change the patch to be a default of 2 and allow user to change it depending
> >> on what they are trying to do.  If you think that is best I will change the
> >> patch.
> >>
> >> Ira
> >>
> >> >
> >> > -- Hal
> >> >
> >> > > Ira
> >> > >
> >> > >>
> >> > >> -- Hal
> >> > >>
> >> > >> >
> >> > >> > The first patch converts the algorithm and the second adds the
> >> > >> > ibnd_set_max_smps_on_wire call.
> >> > >> >
> >> > >> > Let me know what you think.  Because the algorithm changed so much
> >> > >> > testing this is a bit difficult because the order of the node
> >> > >> > discovery is different.  However, I have done some extensive
> >> > >> > diffing of the output of ibnetdiscover and things look good.
> >> > >> >
> >> > >> > Ira

-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
[email protected]
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
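
To make Hal's point about outstanding SMPs and unresponsive nodes concrete,
here is a toy model of a sweep with a bounded number of queries in flight.
It is only a sketch and not the libibnetdisc algorithm: the function name
sweep_time, the fabric of 100 nodes, and the latencies (1 time unit for a
responsive node, 3000 for a node that goes through its timeouts) are all
made-up values chosen for illustration.

```python
import heapq


def sweep_time(latencies, max_outstanding):
    """Time to finish all queries with at most max_outstanding in flight.

    latencies[i] is the (hypothetical) time query i takes to complete,
    whether by a reply or by exhausting its timeouts.
    """
    in_flight = []  # min-heap of completion times of outstanding queries
    now = 0
    for latency in latencies:
        if len(in_flight) == max_outstanding:
            # Window is full: wait until the earliest outstanding query ends.
            now = heapq.heappop(in_flight)
        heapq.heappush(in_flight, now + latency)
    # The sweep ends when the last outstanding query completes.
    return max(in_flight) if in_flight else 0


if __name__ == "__main__":
    # Assumed fabric: 100 nodes, two adjacent unresponsive ones.
    latencies = [1] * 100
    latencies[10] = latencies[11] = 3000
    for window in (1, 2, 4):
        print(window, sweep_time(latencies, window))
```

With these made-up numbers the sweep takes 6098, 3049 and 3002 time units
for windows of 1, 2 and 4.  With a window of 2 both slots are held by the two
bad nodes, so every remaining query waits for their timeouts; with a window
of 4 the responsive nodes are finished long before the timeouts expire.  The
totals say nothing about real fabrics, where the result depends on the actual
timeout values and on where the bad nodes sit in the discovery order.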
