mmnetverify is a new tool that aims to make it easier to identify network problems.
Regarding bandwidth commands, in 4.2.2 there are two options:

mmnetverify bandwidth-node [1 to 1] - communicates from the local node (or one or more nodes specified with the -N option) to one or more target nodes. The bandwidth tests are executed serially from the nodes in the node list to each target, iterating through the target nodes one by one. The serially calculated bandwidth with each node is reported.

mmnetverify bandwidth-cluster [1 to many] - measures parallel communication from the local node (or one or more nodes specified with the -N option) to all of the other nodes in the cluster. The concurrent bandwidth with each target node in the cluster is reported.

In both of these tests, we establish a socket connection, pass a fixed number of bytes over the connection, and calculate bandwidth based on how long that transmission took.

For 4.2.3, there is a new bandwidth test called gnr-bandwidth. It is similar to bandwidth-cluster [1 to many], except that it uses the following steps:
1. establish a connection from the node to all other target nodes in the cluster
2. start sending data to each target for a ramp-up period
3. after the ramp-up period, continue sending data for the test period
4. calculate bandwidth based on the bytes transmitted during the test period

The bandwidth to each node is summed to return a total bandwidth from the command node to the other nodes in the cluster. In future releases, we may modify the bandwidth-node and bandwidth-cluster tests to use the gnr-bandwidth methodology (and deprecate gnr-bandwidth).

Your feedback on how to improve mmnetverify is appreciated.

Regarding:
> We found some weird looking numbers that I don't quite understand, and not in the places we might expect.
> For example between hosts on the same switch, traffic flowing to another switch and traffic flowing to
> nodes in another data centre where it's several switch hops. Some nodes over there were significantly
> faster than switch-local nodes.
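As an aside, the two measurement styles described above can be sketched roughly as follows. This is a hypothetical illustration over a loopback socket, not mmnetverify's actual implementation; the function names, buffer size, and use of a local discard server are all assumptions made for the sketch.

```python
# Hypothetical sketch of the two bandwidth-measurement styles described
# above (fixed byte count vs. ramp-up + timed window). Not mmnetverify code.
import socket
import threading
import time

def start_sink():
    """Start a discard server on an ephemeral loopback port; return the port.
    It accepts one connection and drains everything sent to it."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)

    def drain():
        conn, _ = srv.accept()
        while conn.recv(65536):   # recv returns b"" when the sender closes
            pass
        conn.close()
        srv.close()

    threading.Thread(target=drain, daemon=True).start()
    return srv.getsockname()[1]

def fixed_bytes_bandwidth(host, port, total_bytes):
    """Style 1 (bandwidth-node / bandwidth-cluster): send a fixed number of
    bytes and divide by the elapsed time."""
    buf = b"\0" * 65536
    sock = socket.create_connection((host, port))
    start = time.monotonic()
    sent = 0
    while sent < total_bytes:
        sent += sock.send(buf[: total_bytes - sent])
    elapsed = time.monotonic() - start
    sock.close()
    return sent / max(elapsed, 1e-9)  # bytes/sec; guard against a zero reading

def timed_window_bandwidth(host, port, ramp_s, test_s):
    """Style 2 (gnr-bandwidth): send during a ramp-up period without counting,
    then count only the bytes transmitted during the test period."""
    buf = b"\0" * 65536
    sock = socket.create_connection((host, port))
    end_ramp = time.monotonic() + ramp_s
    while time.monotonic() < end_ramp:   # ramp-up: send, don't count
        sock.send(buf)
    counted = 0
    start = time.monotonic()
    end_test = start + test_s
    while time.monotonic() < end_test:   # test period: count actual bytes sent
        counted += sock.send(buf)
    elapsed = time.monotonic() - start
    sock.close()
    return counted / max(elapsed, 1e-9)
```

The timed-window style avoids charging connection setup and TCP slow-start against the measurement, which is one reason the two styles can report different numbers for the same link.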
Note that system load can impact the test results. Is it possible that the slow nodes on the local switch were heavily loaded? Or is it possible they are using an interface with lower bandwidth? (Sorry, I had to ask that one to be sure...)

Regards,
Bill Owen
[email protected]
Spectrum Scale Development
520-799-4829

From: "Simon Thompson (Research Computing - IT Services)" <[email protected]>
To: gpfsug main discussion list <[email protected]>
Date: 03/17/2017 01:13 PM
Subject: Re: [gpfsug-discuss] mmnetverify
Sent by: [email protected]

It looks to run sequential tests to each node one at a time, and it isn't using the NSD protocol but an echo server.

We found some weird looking numbers that I don't quite understand, and not in the places we might expect. For example between hosts on the same switch, traffic flowing to another switch, and traffic flowing to nodes in another data centre where it's several switch hops. Some nodes over there were significantly faster than switch-local nodes.

I think it was only added in 4.2.2 and is listed as "not yet a replacement for nsdperf". I get that it is different, as it's using the NSD protocol, but I was struggling a bit with what mmnetverify might be doing.

Simon

From: [email protected] [[email protected]] on behalf of Sanchez, Paul [[email protected]]
Sent: 17 March 2017 19:43
To: [email protected]
Subject: Re: [gpfsug-discuss] mmnetverify

Sven will tell you: "RPC isn't streaming", and that may account for the discrepancy. If the tests are doing any "fan-in", where multiple nodes are sending to a single node, then it's also possible that you are exhausting switch buffer memory in a way that a 1:1 iperf wouldn't. For our internal benchmarking we've used /usr/lpp/mmfs/samples/net/nsdperf to more closely estimate the real performance. I haven't played with mmnetverify yet though.
-Paul

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Simon Thompson (Research Computing - IT Services)
Sent: Friday, March 17, 2017 2:50 PM
To: [email protected]
Subject: [gpfsug-discuss] mmnetverify

Hi all,

Just wondering if anyone has used the mmnetverify tool at all? Having made some changes to our internal L3 routing this week, I was interested to see what it claimed.

As a side note, it picked up some DNS resolution issues, though I'm not clear why it was claiming this, as doing a "dig" on the node, the name resolved fine (but adding the NSD servers to the hosts file cleared the error).

It's actually the bandwidth tests that I'm interested in hearing other people's experience with, as the numbers that come out of it are very different (lower) than if we use iperf to test performance between two nodes.

Anyone any thoughts at all on this?

Thanks

Simon

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
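[Editorial aside on the DNS point above: one likely reason "dig resolved fine" while adding entries to the hosts file fixed the error is that dig queries DNS servers directly and ignores /etc/hosts, whereas getent follows the nsswitch.conf resolution order, including /etc/hosts, so the two can disagree. A quick hedged sketch of the comparison, demonstrated with localhost, which is normally defined in /etc/hosts rather than DNS; substitute an NSD server name when diagnosing for real:]

```shell
# getent follows nsswitch.conf (including /etc/hosts); dig queries DNS
# directly and skips /etc/hosts entirely, so the two can disagree.
# Compare the two for a name (here localhost, usually a hosts-file entry):
#   dig +short <nsd-server-name>     # DNS-only view (dig may need bind-utils)
getent hosts localhost               # resolver-stack view, incl. /etc/hosts
```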
