Hi Bob,

thanks for your remarks. I understand now that these "deadlocks" are more like timeouts than tangled-up balls of code. I was not (yet) planning on changing the whole routine; I would just like to get a notice when something unexpected happens in the cluster. So, to start, I just want to write these notices to a file and email it once it reaches a certain size.
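A minimal sketch of such a notice-and-mail callback. The log path, size threshold and recipient are placeholders, not GPFS defaults, and it assumes a working `mail` command on the node:

```shell
#!/bin/sh
# Hypothetical deadlockDetected callback: append a notice to a local file
# and mail the file off once it passes a size threshold. All paths, the
# threshold and the recipient below are placeholders.
LOG="${LOG:-/tmp/gpfs-deadlock-notices.log}"
MAXSIZE=10240    # bytes collected before we mail and truncate
MAILTO=root

echo "$(date '+%F %T') deadlockDetected on $(hostname)" >> "$LOG"

size=$(wc -c < "$LOG")
if [ "$size" -ge "$MAXSIZE" ]; then
    # send the collected notices, then truncate only if the mail went out
    mail -s "GPFS deadlock notices from $(hostname)" "$MAILTO" < "$LOG" \
        && : > "$LOG"
fi
```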
From what you are saying, it sounds like it is worth upgrading to 4.1.1.x. We are planning a maintenance next month; I'll try to get this onto the to-do list. Upgrading beyond this is going to require a longer preparation, unless the prerequisite of "RHEL 6.4 or later" as stated in the IBM FAQ is irrelevant. Our clients still run RHEL 6.3.

Best regards,
Roland

> Some general thoughts on "deadlocks" and automated deadlock detection.
>
> I personally don't like the term "deadlock", as it implies a condition that
> won't ever resolve itself. In GPFS terms, a deadlock is really a "long RPC
> waiter" over a certain threshold. RPCs that wait on certain events can and
> do occur, and they can take some time to complete. This is not necessarily
> a condition that is a problem, but you should be looking into them.
>
> GPFS does have automated deadlock detection and collection, but in the
> early releases it was... well, it's not very "robust". With later releases
> (4.2) it's MUCH better. I personally don't rely on it because in larger
> clusters it can be too aggressive, and depending on what's really going on
> it can make things worse. This statement is my opinion, and it doesn't mean
> it's not a good thing to have. :-)
>
> On the point of what commands to execute and what to collect: be careful
> about long-running callback scripts and executing commands on other nodes.
> Depending on what the issue is, you could end up causing a deadlock or
> making it worse. Some basic data collection, local to the node with the
> long RPC waiter, is a good thing. Test your scripts well before deploying
> them, and make sure that you don't conflict with the automated collections
> (which you might consider turning off).
>
> For my larger clusters, I dump the cluster waiters on a regular basis (once
> a minute: mmlsnode -N waiters -L), count the types and dump them into a
> database for graphing via Grafana.
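The counting step of the approach above might be sketched like this. The sample waiter lines are purely illustrative (the exact `mmlsnode -N waiters -L` output format varies between releases); the idea is just to bucket waiters by their trailing reason text before shipping the counts off to a database:

```shell
#!/bin/sh
# Tally waiter types from a cluster-wide waiter dump. Real input would come
# from `mmlsnode -N waiters -L` run out of cron; the here-doc sample below
# is hypothetical and only illustrates the shape of such lines.
count_waiters() {
    # keep the reason text after the final comma, then tally by type
    sed -n 's/.*, *//p' | sort | uniq -c | sort -rn
}

# Illustrative sample (hypothetical node names, durations and reasons):
count_waiters <<'EOF'
nsd1:  Waiting 12.0341 sec since 10:20:31, for I/O completion
c0421: Waiting  5.1203 sec since 10:20:38, for I/O completion
c0102: Waiting  2.0007 sec since 10:20:41, waiting for exclusive use of connection
EOF
```

From here, each "count type" pair could be inserted into a time-series database for Grafana to graph.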
> This doesn't help me with true deadlock alerting, but it does give me
> insight into overall cluster behavior. If I see large numbers of long
> waiters, I will (usually) go and investigate them on a case-by-case basis.
> If you have large numbers of long RPC waiters on an ongoing basis, it's an
> indication of a larger problem that should be investigated. A few here and
> there is not a cause for real alarm in my experience.
>
> Last: if you have a chance to upgrade to 4.1.1 or 4.2, I would encourage
> you to do so, as the deadlock detection has improved quite a bit.
>
> Bob Oesterlin
> Sr Storage Engineer, Nuance HPC Grid
> [email protected]
>
> From: <[email protected]> on behalf of Roland Pabel
> <[email protected]>
> Organization: RRZK Uni Köln
> Reply-To: gpfsug main discussion list <[email protected]>
> Date: Tuesday, April 12, 2016 at 3:03 AM
> To: gpfsug main discussion list <[email protected]>
> Subject: [gpfsug-discuss] Executing Callbacks on other Nodes
>
> Hi everyone,
>
> we are using GPFS 4.1.0.8 with 4 servers and 850 clients. Our GPFS setup
> is fairly new; we are still in the testing phase. A few days ago, we had
> some problems in the cluster which seemed to have started with deadlocks
> on a small number of nodes. To be better prepared for this scenario, I
> would like to install a callback for the event deadlockDetected. But this
> is a local event, and the callback is executed on the client nodes, from
> which I cannot even send an email.
>
> Is it possible using mm-commands to instead delegate the callback to the
> servers (node class nsdNodes)?
>
> I guess it would be possible to use a callback of the form "ssh nsd0
> /root/bin/deadlock-callback.sh", but then it is contingent upon server
> nsd0 being available.
> The mm-command style "-N nsdNodes" would be more reliable in my opinion,
> because it would be run on all servers. On the servers, I can then check
> to actually only execute the script on the cluster manager.
>
> Thanks
>
> Roland
>
> --
> Dr. Roland Pabel
> Regionales Rechenzentrum der Universität zu Köln (RRZK)
> Weyertal 121, Raum 3.07
> D-50931 Köln
>
> Tel.: +49 (221) 470-89589
> E-Mail: [email protected]
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

--
Dr. Roland Pabel
Regionales Rechenzentrum der Universität zu Köln (RRZK)
Weyertal 121, Raum 3.07
D-50931 Köln

Tel.: +49 (221) 470-89589
E-Mail: [email protected]
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
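The "-N nsdNodes plus manager-only guard" pattern discussed in the thread might be sketched as follows. `mmaddcallback`, `mmlsmgr` and the deadlockDetected event are real GPFS names, but the callback name, script path and parameter details are placeholders, and whether a local event can be delegated this way depends on the release; check your documentation before relying on it:

```shell
#!/bin/sh
# Sketch: register one callback on all NSD servers, and have the script
# fall through everywhere except on the current cluster manager, so the
# notice is raised exactly once. Names and paths below are hypothetical.
#
# One-time registration (run once, as root):
#   mmaddcallback deadlockNotify \
#       --command /root/bin/deadlock-callback.sh \
#       --event deadlockDetected -N nsdNodes --parms "%eventNode"
#
# Inside /root/bin/deadlock-callback.sh:
am_cluster_manager() {
    # mmlsmgr -c prints the current cluster manager; the exact output
    # format varies, so just look for our own short hostname in it.
    mmlsmgr -c 2>/dev/null | grep -qw "$(hostname -s)"
}

if am_cluster_manager; then
    # Only the cluster manager sends the notice; the other NSD servers
    # exit silently. $1 would carry the event parameter (e.g. the node).
    echo "deadlock reported for node $1" | mail -s "GPFS deadlock notice" root
fi
```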
