happy sunday

Yes, but only if ‘pid’ hasn’t been cached before, which I think would
mean that not a single line was logged before stopping the service.

'pid' doesn't get cached if no output is read while the service is 'running.
fork+exec-command: if the PID file doesn't show up immediately, there is an
entire 1-second sleep. The logger can easily read the output while the service
is still in 'starting.
Also: if the service doesn't flush stdout, we don't get its output until it
dies ('stopping).
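
As a possible workaround on the service side (it does nothing about the deadlock in Shepherd itself), a program can ask stdio to line-buffer stdout, so that each line reaches the logger as soon as it is written even when stdout is a pipe. A minimal sketch; the helper name use_line_buffered_stdout is made up for illustration:

```c
#include <stdio.h>

/* Hypothetical helper: switch stdout to line buffering so every
   newline-terminated line is handed to the pipe right away.
   Call it before any other output on stdout.
   Returns 0 on success, nonzero on failure (setvbuf's own result). */
int use_line_buffered_stdout(void)
{
	return setvbuf(stdout, NULL, _IOLBF, BUFSIZ);
}
```

Called as the first statement of main(), before anything is printed, this should have the same effect on the test program as its commented-out fflush(stdout) line, since every puts() ends in a newline.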

Could you explain exactly how that happens (sequence of actions leading
to the deadlock) and share the relevant /var/log/messages excerpt?

./shepherd --socket /tmp/s2/mysocket --config <path>

GNU Shepherd 1.0.3 (Guile 3.0.9, x86_64-unknown-linux-gnu)
Starting service root...
Service root started.
Service root running with value #<<process> id: 18114 command: #f>.
Service root has been started.
starting services...
Configuration successfully loaded from '<path>'.
Starting service myservice...
Service myservice has been started.
Service myservice started.
Successfully started 1 service in the background.
Service myservice running with value #<<process> id: 18132 command: ("/tmp/a.out")>.

in other terminal:
./herd -s /tmp/s2/mysocket status myservice
<status is 'running and doesn't show any "Recent messages">

./herd -s /tmp/s2/mysocket stop myservice
works fine

more shepherd output:
Stopping service myservice...
Service myservice stopped.
Service myservice is now stopped.

in other terminal, all of these hang:
./herd -s /tmp/s2/mysocket status myservice
./herd -s /tmp/s2/mysocket stop myservice
./herd -s /tmp/s2/mysocket start myservice
./herd -s /tmp/s2/mysocket status
./herd -s /tmp/s2/mysocket stop root

does not hang:
./herd -s /tmp/s2/mysocket status aaaaa
herd: error: service 'aaaaa' could not be found

I have to kill -9 shepherd.

C source code for the test program is attached.
I mentioned two possibilities above, and this is scenario #2: stdout not
flushed. I also had what is probably scenario #1 with a different program.
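
A rough way to see why the loop count in the attached program matters: each puts("test0") adds 6 bytes (5 characters plus the newline) to stdout's buffer. The sketch below only does that arithmetic. The function names are made up, and the buffer size is a parameter because the actual size used on a pipe depends on the C library; 4096 bytes is an assumption here, not something measured.

```c
#include <stddef.h>
#include <string.h>

/* Bytes that `count` calls to puts(line) append to stdout:
   the string itself plus the newline that puts() adds. */
size_t bytes_from_puts(const char *line, size_t count)
{
	return count * (strlen(line) + 1);
}

/* Nonzero if that much output is larger than a fully buffered
   stream's buffer of `bufsize` bytes, so at least one write to the
   pipe must happen before the program finishes printing.
   `bufsize` is an assumption supplied by the caller. */
int forces_write(const char *line, size_t count, size_t bufsize)
{
	return bytes_from_puts(line, count) > bufsize;
}
```

Under that assumption, 100 lines come to 600 bytes and stay in the buffer, while 1000 lines come to 6000 bytes and force at least one write while the service is still running.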


On 3/30/25 2:44 PM, Ludovic Courtès wrote:
Hi nathan,

nathan <[email protected]> skribis:

I definitely have a deadlock problem with Shepherd and I do believe I've found 
it.
shepherd 1.0.3

Could you explain exactly how that happens (sequence of actions leading
to the deadlock) and share the relevant /var/log/messages excerpt?

This is in service-controller when the service has been stopped:
(when logger
   (put-message logger 'terminate))
But in service-builtin-logger, this is called every time a line is read:
(or pid
     (and service
          (eq? 'running (service-status service))
          (match (service-running-value service)
            ((? process? process)
             (process-id process))
            (value
             value))))

service-status -> service-control-message -> put-message to the service
The fibers documentation says put-message is blocking. Surely this is a 
deadlock.

Yes, but only if ‘pid’ hasn’t been cached before, which I think would
mean that not a single line was logged before stopping the service.

I’ll take a closer look.

Thanks for reporting it and for investigating!

Ludo’.
#include <stdio.h>
#include <signal.h>
#include <unistd.h>

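// set by the SIGTERM handler; sig_atomic_t is safe to write from a handler
volatile sig_atomic_t done=0;
void handle(int a){
	(void)a; // signal number is unused
	done=1;
}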
int main(int argc, char **argv){
	printf("%d\n",(int)getpid());
	struct sigaction a={0};
	a.sa_handler=&handle;
	sigaction(SIGTERM,&a,NULL);
	// change this to i<1000 to fix (presumably enough output to fill
	// the stdio buffer and force a write)
	for (int i=0;i<100;i++){
		puts("test0");
	}
	// or add this line to fix
	// fflush(stdout);
	while(!done){
		usleep(10);
	}
	for (int i=0;i<100;i++){
		puts("test1");
	}
}
