Hello,
We encountered a rare, hard-to-investigate problem on Windows that one of our customers reported. The attached patch fixes it. I'll add it to the next CF.
PROBLEM
==============================
PostgreSQL sometimes crashes with the following messages. The crash is rare in
general, but frequent for this customer: it occurred about 10 times in the past
5 months.
LOG: server process (PID 2712) was terminated by exception 0xC0000005
HINT: See C include file "ntstatus.h" for a description of the hexadecimal
value.
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited abnormally
and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat
your command.
LOG: all server processes terminated; reinitializing
"server process" shows that a client backend crashed. The above messages
indicate that the process was not running an SQL command, since there is no
"Failed process was running" DETAIL line.
PostgreSQL runs as a Windows service.
No crash dump was produced anywhere, even though:
- <PGDATA>/crashdumps folder exists and is writable by the PostgreSQL user
account (which is the user postgres.exe runs as)
- The Windows registry configuration allows crash dumps to be written
CAUSE
==============================
We believe WSAStartup() in main.c failed. The only conceivable error is:
WSAEPROCLIM
10067
Too many processes.
A Windows Sockets implementation may have a limit on the number of applications
that can use it simultaneously. WSAStartup may fail with this error if the
limit has been reached.
But I couldn't find what the limit is and whether we can tune it. We couldn't
reproduce the problem.
When I simulated a WSAStartup() failure while a client backend was starting up,
I saw the same symptoms as the customer. The problem only occurs when
PostgreSQL runs as a Windows service.
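The bug is in write_eventlog(). It calls pgwin32_message_to_utf16(), which in
turn calls palloc(). palloc() requires the memory management system to be set
up (CurrentMemoryContext != NULL), which is not yet the case when the
WSAStartup() failure is reported. The result is the access violation
(exception 0xC0000005) shown above.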
FIX
==============================
Add the check "CurrentMemoryContext != NULL" in write_eventlog() as in
write_console().
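
To illustrate the shape of the check, here is a minimal, portable sketch. It is
not the attached patch and does not use the real PostgreSQL or Windows APIs:
every sketch_* name and the SketchContext type are hypothetical stand-ins for
CurrentMemoryContext, palloc(), pgwin32_message_to_utf16() and
write_eventlog(), and the "conversion" is just a copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a memory context; not PostgreSQL's type. */
typedef struct SketchContext
{
    size_t bytes_allocated;
} SketchContext;

/* Stand-in for CurrentMemoryContext: NULL until memory management is set up. */
static SketchContext *sketch_current_context = NULL;

/* Stand-in for palloc(): it needs a valid current context to work. */
static void *
sketch_alloc(size_t size)
{
    assert(sketch_current_context != NULL);
    sketch_current_context->bytes_allocated += size;
    return malloc(size);
}

/* Stand-in for pgwin32_message_to_utf16(): allocates the converted message. */
static char *
sketch_convert(const char *msg)
{
    size_t len = strlen(msg) + 1;
    char  *copy = sketch_alloc(len);

    if (copy != NULL)
        memcpy(copy, msg, len);
    return copy;
}

/*
 * Stand-in for write_eventlog(). The conversion is attempted only when the
 * current context exists; otherwise the raw message is written as is.
 * Returns 1 if the converted path was used, 0 if the raw fallback was used.
 */
static int
sketch_write_eventlog(const char *msg, char *out, size_t outlen)
{
    if (sketch_current_context != NULL)
    {
        char *converted = sketch_convert(msg);

        if (converted != NULL)
        {
            snprintf(out, outlen, "converted: %s", converted);
            free(converted);
            return 1;
        }
    }

    /* Too early in startup (or conversion failed): skip the allocation. */
    snprintf(out, outlen, "raw: %s", msg);
    return 0;
}
```

Calling sketch_write_eventlog() while sketch_current_context is still NULL
takes the raw path instead of reaching the allocator, which is the behavior
the added check is meant to give write_eventlog() during early startup.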
NOTE
==============================
The crash dump was not written for two reasons: a) the crash occurred before
the Windows exception handler was installed (the
pgwin32_install_crashdump_handler() call), and b) the effect of the following
call in postmaster is inherited by the child process.
/* In case of general protection fault, don't show GUI popup box */
SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX);
But I'm not sure in what order we should call
pgwin32_install_crashdump_handler(), startup_hacks() (and the steps therein),
and MemoryContextInit(). I think that belongs in a separate patch.
Regards
Takayuki Tsunakawa
write_eventlog_crash.patch
