Hi,
On 03/11/2015 19:09, Sylvain Beucler - Inria wrote:
After debugging stalled processes from our testsuite and production, I
strongly suspect that the timeouts come from nss/nscd (see attached
backtrace with debugging symbols):
- GDB shows they are stuck in a libnss-pgsql2 deadlock, as described in:
http://lists.fusionforge.org/pipermail/fusionforge-general/2014-March/002631.html
However, since nscd is running, the process shouldn't even enter
libnss-pgsql, so the timeouts must happen whenever nscd randomly fails.
- GDB shows libpq looks up the requestor's UID *to locate the .pgpass
file* (not to authenticate the username, since our nss-pgsql.conf
specifies it explicitly). Fortunately this can be bypassed by setting
PGPASSFILE to an empty value:
# service unscd stop
# su admin -c id
<stalls...>
# PGPASSFILE= su admin -c id
uid=20102(admin) gid=100(users)
groupes=100(users),10006(tmpl),10007(projecta),1.
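To check for the stall without leaving a shell blocked, the same lookup can
be run under a time limit. This is only a sketch: check_lookup is a made-up
helper name, and it assumes coreutils' timeout is available (it exits with
status 124 when the limit expires).

```shell
# Hypothetical helper: run a command with a time limit (in seconds)
# and report whether it completed, failed, or stalled.
check_lookup() {
    limit=$1
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    status=$?
    case $status in
        0)   echo "ok" ;;
        124) echo "stalled" ;;   # coreutils timeout exits 124 on expiry
        *)   echo "failed ($status)" ;;
    esac
}
```

With unscd stopped, it could be used like this to compare both cases:
# check_lookup 10 su admin -c id
# check_lookup 10 env PGPASSFILE= su admin -c id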
So short of debugging unscd, and short of modifying libpq so it stops
using getpw* when used from nss, we can set PGPASSFILE in various
daemons (apache scm config at least, possibly ssh/shell too).
What do you think?
Updates:
- If the server is remote, getpw* is also called to look for SSL-related
files in the user's home directory (???).
Additional workaround: set sslmode=disable in nss-pgsql.conf.
- To apply PGPASSFILE='' in Apache, SetEnv/SetEnvIf have no effect.
On Debian, using /etc/apache2/envvars works (i.e. no need for unscd).
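For reference, a minimal sketch of the envvars change, assuming the stock
Debian layout where apache2ctl sources /etc/apache2/envvars before starting
the server. The variable has to be exported, and set to an empty value
rather than left unset:

```shell
# Sketch of a line to append to /etc/apache2/envvars (Debian).
# Set-but-empty is what matters here; leaving the variable unset
# gives libpq its default ~/.pgpass lookup back.
export PGPASSFILE=
```

A full restart of Apache (not just a reload) is probably needed so that the
new environment reaches the server processes.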
I'm considering setting PGPASSFILE in the testsuite's Apache instances to
see if that helps.
Cheers!
Sylvain