Craig, good day.

On Mon, Jun 23, 2008 at 01:30:52PM +0100, Craig Macdonald wrote:
> I have experienced these pauses before.

The 15-minute one, where Maui blocked on read()?

> This was resolved by using nscd on the master node.

In my case, the strace of pbs_server clearly shows that it receives many descriptors with data to read via the select() call. However, it then fails to contact two cluster nodes, each with a 5-second timeout, and Maui times out 1 second before its request gets handled. So my problem seems to be unrelated to NSCD (and to LDAP; I assume you mean that you use LDAP for authentication and NSS).

I had very bad luck with NSCD and LDAP in the past (with RHEL 3.x), so I am not very eager to test it again: back then nscd simply got stuck at some point during its operation, and the nodes became almost completely unresponsive to external logins.

> However a workaround in the code is probably desirable.

Maybe my case is not related to yours. Will you be able to test the patches?

Thank you!
--
Eygene Ryabinkin, Russian Research Centre "Kurchatov Institute"
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
