Hi all,
any news about this issue? Anything else that I can do to help you? Meanwhile there are 4 connections in the same state. (I did not do the whole investigation on all 4, but since they all do not respond on a SIGINT I assume that they all have the same problem.) It may also be interesting that the 4 processes belong to the same two users that already caused this problem earlier. Maybe it really has something to do with their unstable internet connection? (I mentioned this in an earlier mail.) I have also noticed that one of these two users has many open connections which are all idle. I guess that those connections are in fact dead, but the database did not close them for any reason. The pg_stat_activity entries for the corresponding backend processes are as follows:

procpid | usename | current_query | waiting | query_start | backend_start
---------+---------+---------------+---------+-------------------------------+-------------------------------
26033 | dpyrek | <IDLE> | f | 2007-11-08 10:21:01.555232+01 | 2007-11-08 09:55:01.59932+01 18331 | dpyrek | <IDLE> | f | 2007-11-07 11:34:24.968852+01 | 2007-11-07 11:08:29.043762+01 18940 | dpyrek | <IDLE> | f | 2007-11-07 14:29:52.987176+01 | 2007-11-07 13:14:48.609031+01 25868 | dpyrek | <IDLE> | f | 2007-11-08 09:47:46.938991+01 | 2007-11-08 09:13:34.101351+01 6719 | dpyrek | <IDLE> | f | 2007-11-06 12:06:14.875588+01 | 2007-11-06 11:10:00.566644+01 17987 | dpyrek | <IDLE> | f | 2007-11-07 10:31:50.517275+01 | 2007-11-07 10:11:07.310338+01 31808 | dpyrek | <IDLE> | f | 2007-11-08 22:55:03.931727+01 | 2007-11-08 22:55:03.766638+01 25484 | dpyrek | <IDLE> | f | 2007-11-08 08:32:57.265377+01 | 2007-11-08 07:44:30.845967+01 5972 | dpyrek | <IDLE> | f | 2007-11-06 08:51:54.57437+01 | 2007-11-06 08:14:03.560602+01 6241 | dpyrek | <IDLE> | f | 2007-11-06 09:59:02.018452+01 | 2007-11-06 09:20:49.092246+01 6136 | dpyrek | <IDLE> | f | 2007-11-06 09:14:40.729837+01 | 2007-11-06 08:57:29.55187+01 12645 | dpyrek | <IDLE> | f | 2007-11-02 10:08:24.856929+01 | 2007-11-02 09:35:37.640976+01 25254 | dpyrek | <IDLE> | f | 2007-11-08 07:29:04.547081+01 | 2007-11-08 06:33:47.707759+01 20275 | dpyrek | <IDLE> | f | 2007-11-03 09:14:12.73829+01 | 2007-11-03 08:57:05.555972+01 20216 | dpyrek | <IDLE> | f | 2007-11-03 08:46:40.555354+01 | 2007-11-03 08:40:31.756993+01 12435 | dpyrek | <IDLE> | f | 2007-11-02 09:28:53.361365+01 | 2007-11-02 08:48:11.589485+01 19633 | dpyrek | <IDLE> | f | 2007-11-03 08:34:16.263487+01 | 2007-11-03 05:46:16.811069+01 12156 | dpyrek | <IDLE> | f | 2007-11-02 08:10:11.558427+01 | 2007-11-02 07:49:03.442489+01 4899 | dpyrek | <IDLE> | f | 2007-11-01 12:42:30.880391+01 | 2007-11-01 10:56:18.513398+01 11988 | dpyrek | <IDLE> | f | 2007-11-02 07:38:10.315758+01 | 2007-11-02 07:02:52.438251+01 4490 | dpyrek | <IDLE> | f | 2007-11-01 09:51:42.216745+01 | 2007-11-01 09:34:18.63771+01

A ptrace of one of these connections yields the following result:

db2:/home/pgsql/data # strace -p 4899
Process 4899 attached - interrupt to quit
futex(0x994000, FUTEX_WAIT, 2, NULL

This looks identical (even with the same uaddr parameter) for the other processes.

In the log file I find many lines like this:

<2007-11-08 22:56:19 CET - dpyrek> LOG: could not receive data from client: Die Wartezeit für die Verbindung ist abgelaufen
<2007-11-08 22:56:19 CET - dpyrek> LOG:  unexpected EOF on client connection

I'm not sure that these lines correspond to the dead connections, but at least it's the same user.

Does this additional information help you in any way? I'm a little bit afraid that eventually the maximum number of connections will be exceeded. I don't want to "kill -9" the processes because the last time I did this the database was in recovery mode for a substantial amount of time.

Any help is still highly appreciated!

Regards,
   Christian

P.S.: If nothing else helps I could also grant one of you guys root access to the database machine.

--
Deriva GmbH                         Tel.: +49 551 489500-42
Financial IT and Consulting         Fax:  +49 551 489500-91
Hans-Böckler-Straße 2                  http://www.deriva.de
D-37079 Göttingen

Deriva CA Certificate: http://www.deriva.de/deriva-ca.cer



---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to [EMAIL PROTECTED] so that your
      message can get through to the mailing list cleanly

Reply via email to