On Wed, Nov 19, 2008 at 01:30:04PM +0000, Gerrit Pape wrote:
> ...Having written that, I see your service is down when removing it
> > servo:~ 0$ sudo svstat /etc/service/cereal.foo
> > /etc/service/cereal.foo: down 51 seconds
> 
> Hmm.  If you can reproduce the issue reliably, can you make the service
> directory available, so that I can try to reproduce on my systems?

Hey, Gerrit.  Thanks so much for the response.

I can definitely reproduce the issue reliably, but interestingly, only
with cereal sessions.

I realize now that the problem does not occur only when the service
is down.  Here is an example of an attempt to remove a running cereal
service on one of my servers:

rukh:~ 0# ps -eFH | grep [c]ereal.hydra1
root      2349  2337  0    27    24   0 Nov13 ?        00:00:00     runsv cereal.hydra1
1000      6423  2349  0  5929  1624   0 Nov13 ?        00:00:03       /usr/bin/SCREEN -D -m -L -c /etc/cereal/screenrc -s /bin/false -S cereal:hydra1 -t hydra1 /dev/ttyS8 115200
rukh:~ 0# update-service --remove /var/lib/cereal/sessions/hydra1 cereal.hydra1
Service cereal.hydra1 removed, the service daemon received the TERM and CONT signals.
rukh:~ 0# ps -eFH | grep [c]ereal.hydra1
root      2349  2337  0    27    28   0 Nov13 ?        00:00:00     runsv cereal.hydra1
rukh:~ 0# kill 2349
rukh:~ 0# ps -eFH | grep [c]ereal.hydra1
root      2349  2337  0    27    28   0 Nov13 ?        00:00:00     runsv cereal.hydra1
rukh:~ 0# kill -9 2349
rukh:~ 0# ps -eFH | grep [c]ereal.hydra1
rukh:~ 1# 

Note that the SCREEN process (which was exec'd by the service run
script) is running at first.  When the service is removed, the SCREEN
process stops, but the runsv process does *not* stop until I send it
a KILL signal.
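
One way to check whether the runsv process is blocking or ignoring
TERM, rather than guessing, might be to read its signal masks from
/proc.  This is only a minimal sketch, assuming the Linux
/proc/PID/status format (the SigBlk, SigIgn and SigCgt lines); the
function names are made up for illustration and are not part of runit
or cereal:

```shell
#!/bin/bash
# Succeed if signal number $2 is set in the hex mask $1, as found in
# the SigBlk/SigIgn/SigCgt lines of /proc/PID/status.
# Bit (signo - 1) of the mask corresponds to signal number signo.
mask_has_signal() {
    [ $(( (0x$1 >> ($2 - 1)) & 1 )) -eq 1 ]
}

# Print, for process $1, whether TERM (signal 15) is blocked, ignored,
# or caught.  Hypothetical helper; Linux only.
term_disposition() {
    for field in SigBlk SigIgn SigCgt; do
        mask=$(awk -v f="$field:" '$1 == f { print $2 }' "/proc/$1/status")
        if mask_has_signal "$mask" 15; then
            echo "$field: TERM set"
        else
            echo "$field: TERM not set"
        fi
    done
}
```

Running "term_disposition 2349" against the stuck runsv would show
whether TERM is being caught at all.  If it is caught and not blocked,
runsv is presumably receiving the signal and waiting on something
before it exits.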

But here is the really interesting test.  First, I can create a very
simple dummy service that does stop properly:

rukh:/tmp/cdtemp.BKnOUC 0# cat <<EOF >foo/run
> #!/bin/bash
> while true; do
> sleep 1
> done
> EOF
rukh:/tmp/cdtemp.BKnOUC 0# chmod 755 foo/run 
rukh:/tmp/cdtemp.BKnOUC 0# ps -eFH | grep [t]est.foo
rukh:/tmp/cdtemp.BKnOUC 1# update-service --add /tmp/cdtemp.BKnOUC/foo test.foo
Service test.foo added.
rukh:/tmp/cdtemp.BKnOUC 0# ps -eFH | grep [t]est.foo
root     31859  2337  0    27    24   0 11:42 ?        00:00:00     runsv test.foo
rukh:/tmp/cdtemp.BKnOUC 0# update-service --remove /tmp/cdtemp.BKnOUC/foo test.foo
Service test.foo removed, the service daemon received the TERM and CONT signals.
rukh:/tmp/cdtemp.BKnOUC 0# ps -eFH | grep [t]est.foo
rukh:/tmp/cdtemp.BKnOUC 1# 

Now, I create a new cereal session, copy its service directory to a
temporary location, replace its run script with the same dummy run
script as above, and then try to add and remove it:

rukh:/tmp/cdtemp.BKnOUC 0# cereal-admin create hydra1 /dev/ttyS8 115200 gecoadmin adm
Created session 'hydra1':
--f hydra1 /dev/ttyS8 115200 gecoadmin adm
Service cereal.hydra1 added.
rukh:/tmp/cdtemp.BKnOUC 0# cp -a /var/lib/cereal/sessions/hydra1 .
rukh:/tmp/cdtemp.BKnOUC 0# cp foo/run hydra1/run
cp: overwrite `hydra1/run'? y
rukh:/tmp/cdtemp.BKnOUC 0# ps -eFH | grep [t]est.hydra1
rukh:/tmp/cdtemp.BKnOUC 1# update-service --add /tmp/cdtemp.BKnOUC/hydra1 test.hydra1
Service test.hydra1 added.
rukh:/tmp/cdtemp.BKnOUC 0# ps -eFH | grep [t]est.hydra1
root     32008  2337  0    27    24   0 11:47 ?        00:00:00     runsv test.hydra1
rukh:/tmp/cdtemp.BKnOUC 0# update-service --remove /tmp/cdtemp.BKnOUC/hydra1 test.hydra1
Service test.hydra1 removed, the service daemon received the TERM and CONT signals.
rukh:/tmp/cdtemp.BKnOUC 0# ps -eFH | grep [t]est.hydra1
root     32008  2337  0    27    24   0 11:47 ?        00:00:00     runsv test.hydra1
rukh:/tmp/cdtemp.BKnOUC 0# kill 32008
rukh:/tmp/cdtemp.BKnOUC 0# ps -eFH | grep [t]est.hydra1
root     32008  2337  0    27    24   0 11:47 ?        00:00:00     runsv test.hydra1
rukh:/tmp/cdtemp.BKnOUC 0# kill -9 32008
rukh:/tmp/cdtemp.BKnOUC 0# ps -eFH | grep [t]est.hydra1
rukh:/tmp/cdtemp.BKnOUC 1# 

Notice that the runsv process does *NOT* terminate until it is sent a
KILL signal, even though the run script is the same dummy script.

I now believe that this problem must have something to do with the
contents of the service directory.  Could there be something in the
service directory that would prevent the runsv process from acting on
the TERM?  For instance, there is a 'down' file in the cereal service
directory.  That shouldn't affect this, but is it possible that it
does?  Could there be something else that we're doing in cereal that
is causing this problem?
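
To narrow this down, the working dummy directory and the copied cereal
directory could be compared entry by entry.  Here is a rough sketch of
how I might do that, assuming GNU find (for -printf) as shipped on
Debian; the function names are only illustrative:

```shell
#!/bin/bash
# List every entry under service directory $1 as "type mode path",
# sorted by path.  Assumes GNU find for -printf.
list_service_dir() {
    ( cd "$1" && find . -mindepth 1 -printf '%y %m %P\n' | sort -k3 )
}

# Show what differs between service directories $1 and $2.
# Returns diff's exit status (0 if the listings are identical).
compare_service_dirs() {
    tmp=$(mktemp) || return 1
    list_service_dir "$2" > "$tmp"
    list_service_dir "$1" | diff - "$tmp"
    status=$?
    rm -f "$tmp"
    return $status
}
```

For the test above, that would be
"compare_service_dirs /tmp/cdtemp.BKnOUC/foo /tmp/cdtemp.BKnOUC/hydra1".
Any entry that appears only on the hydra1 side (the 'down' file, or
anything else that came along with the cp -a) would be a candidate to
remove one at a time until runsv stops properly.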

Since I'm not sure exactly how to pass you one of these cereal service
directories, is it possible for you to create a service directory from
the cereal package itself?  If not, please let me know the best way
for me to get a service directory to you.  Maybe I can just tar it and
send it to you via email.

Thanks again for your help with this issue.  Please let me know what
else I should do.

jamie.
