Craig, good day.

Mon, Jun 23, 2008 at 01:30:52PM +0100, Craig Macdonald wrote:
> I have experienced these pauses before.

Were those 15-minute pauses, with Maui blocked in read()?

> This was resolved by using nscd on the master node.

In my case the strace of pbs_server clearly shows that select()
returns many descriptors that are ready for reading.  But pbs_server
then fails to contact two cluster nodes, each attempt with a 5-second
timeout, and Maui times out 1 second before its request gets handled.
So my problem seems to be unrelated to nscd (and LDAP; I assume you
mean that you use LDAP authentication and NSS).  I had very bad luck
with nscd and LDAP in the past (with RHEL 3.x), so I am not very eager
to test it again: back then nscd simply got stuck at some point
during its operation, and the nodes became almost completely
unresponsive to external logins.
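
To make the timing concrete, here is a minimal Python sketch of a
single-threaded loop that serves ready requests strictly one after
another.  It is only an illustration, not pbs_server code: the function
names are hypothetical, and the 9-second Maui timeout is my assumption,
inferred from "two 5-second timeouts, and Maui gives up 1 second
earlier".

```python
def service_start_times(blocking_costs):
    """Time at which each queued request starts being handled when a
    single-threaded loop serves the requests strictly in order."""
    starts = []
    elapsed = 0.0
    for cost in blocking_costs:
        starts.append(elapsed)
        elapsed += cost  # the loop is blocked for the whole cost
    return starts


def timed_out_clients(blocking_costs, client_timeout):
    """Indices of requests whose client gives up before being served."""
    starts = service_start_times(blocking_costs)
    return [i for i, start in enumerate(starts) if start > client_timeout]


NODE_TIMEOUT = 5.0    # per-node timeout seen in the strace
CLIENT_TIMEOUT = 9.0  # ASSUMPTION: inferred, not a measured Maui setting

# Two unreachable nodes are handled first, then the scheduler request
# (0.1 s is an arbitrary placeholder for its own handling time).
queue = [NODE_TIMEOUT, NODE_TIMEOUT, 0.1]

print(service_start_times(queue))                # [0.0, 5.0, 10.0]
print(timed_out_clients(queue, CLIENT_TIMEOUT))  # [2]
```

Under these assumptions the third request is reached only after 10
seconds, so its client has already given up, and no name-service
lookup is involved at all.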

> However a workaround in the code is probably desirable.

Maybe my case is not related to yours.  Will you be able to test
the patches?

Thank you!
-- 
Eygene Ryabinkin, Russian Research Centre "Kurchatov Institute"
