Hi Bob,

I'll try the second approach, i.e., collecting "mmfsadm dump waiters" locally on each node and then summing the values up, since that can be done without the overhead of ssh.
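A minimal sketch of the per-node collection step could look like the following. It assumes that each waiter appears in the dump as one line containing "waiting <seconds> seconds"; that output format is an assumption and should be checked against a real dump. The function name summarize_waiters is hypothetical, and the step that sends the result to the database is left out.

```shell
#!/bin/sh
# Summarize waiter lines read from stdin.
# Prints "<count> <longest_wait_seconds>" for the local node.
# Assumption: each waiter is one line containing "waiting <number> seconds".
summarize_waiters() {
    awk '
        {
            # Look for the word "waiting" followed by a numeric field.
            for (i = 1; i < NF; i++) {
                if ($i == "waiting" && $(i + 1) ~ /^[0-9.]+$/) {
                    count++
                    w = $(i + 1) + 0
                    if (w > max) max = w
                    break
                }
            }
        }
        END { printf "%d %.3f\n", count, max }
    '
}

# Intended use on each node (adjust to the local setup):
#   mmfsadm dump waiters | summarize_waiters
```

Because the function only reads stdin, it can be tried against a saved dump before it is wired into the monitoring job.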
You mentioned that mmlsnode starts all these ssh commands, and that made me look into the file itself. I then noticed that most of the mm commands are actually scripts. This helps a lot with regard to my original question: mmdsh seems to do what I need.

Thanks,

Roland

> This command is just using ssh to all the nodes and dumping the waiter
> information and collecting it. That means if the node is down, slow to
> respond, or there are a large number of nodes, it could take a while to
> return. In my 400-500 node clusters this command usually takes less than 10
> seconds. I do prefix the command with a timeout value in case a node is
> hung up and ssh never returns (which it sometimes does, and that’s not the
> fault of GPFS). Something like this:
>
> timeout 45s /usr/lpp/mmfs/bin/mmlsnode -N waiters -L
>
> This means I get incomplete information, but if you don’t, you end up piling
> up a lot of hung commands. I would check over your cluster carefully to
> see if there are other issues that might cause ssh to hang, which could
> impact other GPFS commands that distribute via ssh.
>
> Another approach would be to dump the waiters locally on each node, send
> node-specific information to the database, and then sum it up using the
> graphing software.
>
> Bob Oesterlin
> Sr Storage Engineer, Nuance HPC Grid
>
> From: <gpfsug-discuss-bounces@spectrumscale.org> on behalf of
> Roland Pabel <[email protected]>
> Organization: RRZK Uni Köln
> Reply-To: gpfsug main discussion list <[email protected]>
> Date: Friday, April 15, 2016 at 10:50 AM
> To: gpfsug main discussion list <[email protected]>
> Subject: Re: [gpfsug-discuss] Executing Callbacks on other Nodes
>
> Hi,
>
> In our cluster, mmlsnode -N waiters -L takes about 25 seconds to run. So
> running it every 30 seconds is a bit close. I'll try running it once a
> minute and then incorporating this into our graphing.
>
> Maybe the command is so slow for me because a few nodes are down?
> Is there a parameter to mmlsnode to configure the timeout?
>
>

-- 
Dr. Roland Pabel
Regionales Rechenzentrum der Universität zu Köln (RRZK)
Weyertal 121, Raum 3.07
D-50931 Köln

Tel.: +49 (221) 470-89589
E-Mail: [email protected]
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
