Hi,

On 03/11/2015 19:09, Sylvain Beucler - Inria wrote:
After debugging stalled processes from our test suite and production, I strongly suspect that the timeouts come from nss/nscd (see attached backtrace with debugging symbols):

- GDB shows they are stuck in a libnss-pgsql2 deadlock, as described in:
  http://lists.fusionforge.org/pipermail/fusionforge-general/2014-March/002631.html
  However, since nscd is running, the process shouldn't even enter libnss-pgsql, so the timeouts must happen during sporadic nscd failures.

- GDB shows libpq looks up the requesting UID *to locate the .pgpass file* (not to authenticate the username, since our nss-pgsql.conf specifies it explicitly). Fortunately this can be bypassed as follows:
# service unscd stop
# su admin -c id
<stalls...>
# PGPASSFILE= su admin -c id
uid=20102(admin) gid=100(users) groupes=100(users),10006(tmpl),10007(projecta),1.


So short of debugging unscd, and short of modifying libpq so it stops using getpw* when used from nss, we can set PGPASSFILE in various daemons (apache scm config at least, possibly ssh/shell too).

What do you think?

Updates:

- If the server is remote, getpw* is also called to look for SSL-related files in the user directory (???).
  Additional workaround: set sslmode=disable in nss-pgsql.conf

- To apply PGPASSFILE='' in Apache, SetEnv/SetEnvIf have no effect.
  On Debian, using /etc/apache2/envvars works (i.e. no need for unscd).
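
For the envvars route, here is a minimal sketch of what I have in mind. Where the line goes in the file is my own assumption; any spot that is evaluated before apache2 starts should do, since Debian's apache2ctl sources that file as a shell script:

```shell
# Sketch of a line for /etc/apache2/envvars (Debian layout assumed).
# The file is sourced as shell, so a plain export is inherited by apache2.
# The empty value mirrors the `PGPASSFILE= su admin -c id` test quoted above.
export PGPASSFILE=''
```

After restarting Apache, the variable should be visible (set but empty) in the environment of its processes, e.g. by inspecting /proc/<pid>/environ for one of them.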

I'm considering setting PGPASSFILE in the test suite's Apache instances to see whether that helps.

Cheers!
Sylvain
