On 2020/11/17 21:18, Bharath Rupireddy wrote:
Thanks Craig!

On Fri, Oct 23, 2020 at 9:37 AM Craig Ringer
<craig.rin...@enterprisedb.com> wrote:

src/test/modules/test_shm_mq/worker.c appears to do the right thing the wrong 
way - it has its own custom handler instead of using die() or 
SignalHandlerForShutdownRequest().


handle_sigterm() and die() are similar except that die() has extra
handling(below) for exiting immediately when waiting for input/command
from the client.
     /*
      * If we're in single user mode, we want to quit immediately - we can't
      * rely on latches as they wouldn't work when stdin/stdout is a file.
      * Rather ugly, but it's unlikely to be worthwhile to invest much more
      * effort just for the benefit of single user mode.
      */
     if (DoingCommandRead && whereToSendOutput != DestRemote)
         ProcessInterrupts();

Having this extra handling is correct for normal backends as they can
connect directly to clients for reading input commands, the above if
condition may become true and ProcessInterrupts() may be called to
handle the SIGTERM immediately.

Note that DoingCommandRead can never be true in bg workers as it's
being set to true only in normal backend PostgresMain(). If we use
die()(instead of custom SIGTERM handlers such as handle_sigterm()) in
a bg worker/non-backend process, there are no chances that the above
part of the code gets hit and the interrupts are processed
immediately. And also here are the bg worker process that use die()
instead of their own custom handlers: parallel workers, autoprewarm
worker, autovacuum worker, logical replication launcher and apply
worker, wal sender process

I think we can also use die() instead of handle_sigterm() in
test_shm_mq/worker.c.

I attached a patch for this change.

Thanks for the patch! It looks good to me.

BTW, I tried to find the past discussion about why die() was not used,
but I could not yet.




In contrast  src/test/modules/worker_spi/worker_spi.c looks plain wrong. 
Especially since it's quoted as an example of how to do things right. It won't 
respond to SIGTERM at all while it's executing a query from its queue, no 
matter how long that query takes or whether it blocks. It can inhibit even 
postmaster shutdown as a result.


Postmaster sends SIGTERM to all children(backends and bgworkers) for
both smart shutdown(wait for children to end their work, then shut
down.) and fast shutdown(rollback active transactions and shutdown
when they are gone.) For immediate shutdown SIGQUIT is sent to
children.(see pmdie()).

For each bg worker(so is for worker_spi.c),
SignalHandlerForCrashExit() is set as SIGQUIT handler in
InitPostmasterChild(), which does nothing but exits immediately. We
can not use quickdie() here because as a bg worker we don't have
to/can not send anything to client. We are good with
SignalHandlerForCrashExit() as with all other bg workers.

Yes, if having worker_spi_sigterm/SignalHandlerForShutdownRequest,
sometimes(as explained above) the worker_spi worker may not respond to
SIGTERM. I think we should be having die() as SIGTERM handler here (as
with normal backend and parallel workers).

Attaching another patch that has replaced custom SIGTERM handler
worker_spi_sigterm() with die() and custom SIGHUP handler
worker_spi_sighup() with standard SignalHandlerForConfigReload().

This patch also looks good to me.




I was going to lob off a quick patch to fix this by making both use quickdie() 
for SIGQUIT and die() for SIGTERM, but after reading for a bit I'm no longer 
sure what the right choice even is. I'd welcome some opinions.


We can not use quickdie() here because as a bg worker we don't have
to/can not send anything to client. We are good with
SignalHandlerForCrashExit() as with all other bg workers.

Thoughts?

I think you're right.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


Reply via email to