On Wed, Nov 19, 2008 at 01:30:04PM +0000, Gerrit Pape wrote:
> ...Having written that, I see your service is down when removing it
> > servo:~ 0$ sudo svstat /etc/service/cereal.foo
> > /etc/service/cereal.foo: down 51 seconds
>
> Hmm. If you can reproduce the issue reliably, can you make the service
> directory available, so that I can try to reproduce on my systems?

Hey, Gerrit. Thanks so much for the response. I can definitely reproduce the issue reliably, but interestingly, only with cereal sessions. I realize now that the problem does not occur only when the service is down. Here is an example of an attempt to remove a running cereal service on one of my servers:

rukh:~ 0# ps -eFH | grep [c]ereal.hydra1
root 2349 2337 0 27 24 0 Nov13 ? 00:00:00 runsv cereal.hydra1
1000 6423 2349 0 5929 1624 0 Nov13 ? 00:00:03 /usr/bin/SCREEN -D -m -L -c /etc/cereal/screenrc -s /bin/false -S cereal:hydra1 -t hydra1 /dev/ttyS8 115200
rukh:~ 0# update-service --remove /var/lib/cereal/sessions/hydra1 cereal.hydra1
Service cereal.hydra1 removed, the service daemon received the TERM and CONT signals.
rukh:~ 0# ps -eFH | grep [c]ereal.hydra1
root 2349 2337 0 27 28 0 Nov13 ? 00:00:00 runsv cereal.hydra1
rukh:~ 0# kill 2349
rukh:~ 0# ps -eFH | grep [c]ereal.hydra1
root 2349 2337 0 27 28 0 Nov13 ? 00:00:00 runsv cereal.hydra1
rukh:~ 0# kill -9 2349
rukh:~ 0# ps -eFH | grep [c]ereal.hydra1
rukh:~ 1#

Note that the SCREEN process (which was exec'd by the service run script) is running at first. When the service is removed, the SCREEN process stops, but the runsv process does *not* stop until I send it a KILL signal.

But here is the really interesting test. First, I can create a very simple dummy service that does stop properly:

rukh:/tmp/cdtemp.BKnOUC 0# cat <<EOF >foo/run
> #!/bin/bash
> while true; do
> sleep 1
> done
> EOF
rukh:/tmp/cdtemp.BKnOUC 0# chmod 755 foo/run
rukh:/tmp/cdtemp.BKnOUC 0# ps -eFH | grep [t]est.foo
rukh:/tmp/cdtemp.BKnOUC 1# update-service --add /tmp/cdtemp.BKnOUC/foo test.foo
Service test.foo added.
rukh:/tmp/cdtemp.BKnOUC 0# ps -eFH | grep [t]est.foo
root 31859 2337 0 27 24 0 11:42 ? 00:00:00 runsv test.foo
rukh:/tmp/cdtemp.BKnOUC 0# update-service --remove /tmp/cdtemp.BKnOUC/foo test.foo
Service test.foo removed, the service daemon received the TERM and CONT signals.
rukh:/tmp/cdtemp.BKnOUC 0# ps -eFH | grep [t]est.foo
rukh:/tmp/cdtemp.BKnOUC 1#

Now, I create a new cereal session, copy its service directory to a temporary location, replace its run script with the same dummy run script as above, and then try to add and remove it:

rukh:/tmp/cdtemp.BKnOUC 0# cereal-admin create hydra1 /dev/ttyS8 115200 gecoadmin adm
Created session 'hydra1':
--f hydra1 /dev/ttyS8 115200 gecoadmin adm
Service cereal.hydra1 added.
rukh:/tmp/cdtemp.BKnOUC 0# cp -a /var/lib/cereal/sessions/hydra1 .
rukh:/tmp/cdtemp.BKnOUC 0# cp foo/run hydra1/run
cp: overwrite `hydra1/run'? y
rukh:/tmp/cdtemp.BKnOUC 0# ps -eFH | grep [t]est.hydra1
rukh:/tmp/cdtemp.BKnOUC 1# update-service --add /tmp/cdtemp.BKnOUC/hydra1 test.hydra1
Service test.hydra1 added.
rukh:/tmp/cdtemp.BKnOUC 0# ps -eFH | grep [t]est.hydra1
root 32008 2337 0 27 24 0 11:47 ? 00:00:00 runsv test.hydra1
rukh:/tmp/cdtemp.BKnOUC 0# update-service --remove /tmp/cdtemp.BKnOUC/hydra1 test.hydra1
Service test.hydra1 removed, the service daemon received the TERM and CONT signals.
rukh:/tmp/cdtemp.BKnOUC 0# ps -eFH | grep [t]est.hydra1
root 32008 2337 0 27 24 0 11:47 ? 00:00:00 runsv test.hydra1
rukh:/tmp/cdtemp.BKnOUC 0# kill 32008
rukh:/tmp/cdtemp.BKnOUC 0# ps -eFH | grep [t]est.hydra1
root 32008 2337 0 27 24 0 11:47 ? 00:00:00 runsv test.hydra1
rukh:/tmp/cdtemp.BKnOUC 0# kill -9 32008
rukh:/tmp/cdtemp.BKnOUC 0# ps -eFH | grep [t]est.hydra1
rukh:/tmp/cdtemp.BKnOUC 1#

Notice that the runsv process does *NOT* terminate, even though the run script is now identical to the one in the dummy service that stopped cleanly. I now believe that this problem must have something to do with the contents of the service directory. Could there be something in the service directory that would prevent the runsv process from acting on the TERM? For instance, there is a 'down' file in the cereal service directory. That shouldn't affect this, but is it possible that it does? Could there be something else that we're doing in cereal that is causing this problem?

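Since the two test directories above now share the same run script, one way to narrow this down might be to compare everything else in them. What follows is only a minimal sketch: the function names list_service_dir and compare_service_dirs are made up for illustration and are not part of runit or cereal, and it assumes nothing about which entries actually matter to runsv.

```shell
#!/bin/sh
# Hypothetical helpers (not part of runit or cereal): list what is inside
# a service directory so that two directories can be diffed.

# Print one line per entry under the directory "$1": a coarse type tag
# followed by the path relative to that directory.
list_service_dir() {
    (
        cd "$1" || exit 1
        find . | sort | while IFS= read -r entry; do
            if [ -L "$entry" ]; then kind=link
            elif [ -d "$entry" ]; then kind=dir
            elif [ -p "$entry" ]; then kind=fifo
            elif [ -x "$entry" ]; then kind=exec
            else kind=file
            fi
            printf '%s %s\n' "$kind" "$entry"
        done
    )
}

# Diff the listings of two service directories "$1" and "$2".
# Returns diff's exit status: 0 if the layouts match, 1 if they differ.
compare_service_dirs() {
    a=$(mktemp) || return 2
    b=$(mktemp) || return 2
    list_service_dir "$1" >"$a"
    list_service_dir "$2" >"$b"
    diff "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

With these defined, running compare_service_dirs /tmp/cdtemp.BKnOUC/foo /tmp/cdtemp.BKnOUC/hydra1 should print only the entries that exist in one directory but not the other (the 'down' file, for example), which would at least give a short list of candidates to remove one at a time.
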
Since I'm not sure exactly how to pass you one of these cereal service directories, is it possible for you to create a service directory from the cereal package itself? If not, please let me know the best way for me to pass you a service directory. Maybe I can just tar it and send it to you via email.

Thanks again for your help with this issue. Please let me know what else I should do.

jamie.