Re: [systemd-devel] Newbie question - Requires doesn't work properly
On 22.11.2013 03:04, salil GK wrote:
> Thanks a lot David
>
> On 22 November 2013 06:44, David Timothy Strauss da...@davidstrauss.net wrote:
>> On Thu, Nov 21, 2013 at 4:57 PM, salil GK gksa...@gmail.com wrote:
>>> What happens is - my process may be busy with some other activity, during which time it will fail to send the periodic message to systemd. After a while it will come out of its loop and be ready to serve. But by this time systemd would have already marked the process as failed.
>>
>> Then you need to either use another thread, refactor to make a tighter event loop, or increase the watchdog time. Drifting in and out of tolerance with watchdog is not a safe strategy.

The problem I see with using another thread is that this thread can happily work and send its keep-alive, but at the end that does not mean the service itself is working OK and responsive, because both are running isolated from each other.

In the case of network services it would be pretty cool if the systemd watchdog could be configured to connect to the service every n seconds and restart it if there is no response, because this would monitor the real service without needing external tools.

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Newbie question - Requires doesn't work properly
It is the responsibility of whatever sends the watchdog to ensure everything's healthy, however necessary. It would be silly to spawn a thread and have it blindly report health to watchdog. The point is for that thread to do proper checks but ensure reports go in at the right intervals.

On Nov 22, 2013 7:50 PM, Reindl Harald h.rei...@thelounge.net wrote:
> The problem I see with using another thread is that this thread can happily work and send its keep-alive, but at the end that does not mean the service itself is working OK and responsive, because both are running isolated from each other.
> [...]
Re: [systemd-devel] Newbie question - Requires doesn't work properly
On 22.11.2013 12:49, David Timothy Strauss wrote:
> It is the responsibility of whatever sends the watchdog to ensure everything's healthy, however necessary. It would be silly to spawn a thread and have it blindly report health to watchdog. The point is for that thread to do proper checks but ensure reports go in at the right intervals.

I know that, but the *how* is the question.

You can check whatever you like internally, but at the end of the day that does not mean the service responds correctly to a client connection over the network unless you go through the same stack, which means making a network connection.

I spent hundreds of hours on upstream debugging of dbmail to find spinlocks and other problems that only happen in rare situations. One of them took 16 hours of stress tests with debug logging enabled before it happened, while on the real server it took a few minutes to be triggered by a random client action. That's the difference between theory and real workload.

Your internal checks are mostly theory, because in case of a bug you have undefined behavior, and what you want to achieve with the watchdog is to catch this undefined behavior and restart the service. When in doubt, this will not work in the rare cases where the watchdog should restart the service unless you go through the complete code path of a client. In the case of an IMAP server you can enter the spin loop anywhere from accepting the connection to listing folders or receiving a message, and it may depend on a buffer overflow under high concurrency while different threads are touching each other in an unexpected way.

Been there, nearly died debugging it and capturing data for upstream.
Re: [systemd-devel] Newbie question - Requires doesn't work properly
On Fri, Nov 22, 2013 at 10:24 PM, Reindl Harald h.rei...@thelounge.net wrote:
> Your internal checks are mostly theory, because in case of a bug you have undefined behavior, and what you want to achieve with the watchdog is to catch this undefined behavior and restart the service. When in doubt, this will not work in the rare cases where the watchdog should restart the service unless you go through the complete code path of a client. In the case of an IMAP server you can enter the spin loop anywhere from accepting the connection to listing folders or receiving a message, and it may depend on a buffer overflow under high concurrency while different threads are touching each other in an unexpected way.

You're not hearing what I'm saying. Check what's relevant; it's not systemd's concern what you check. If you need to do a transactional check of an IMAP server from a separate process or thread, then do that. Only report back to watchdog if you've verified what you consider to be a healthy state.
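As a rough illustration of the approach described above (a separate thread that performs a real check and reports to the watchdog only when that check passes), here is a minimal Python sketch. It is not code from this thread. It assumes the service is started by systemd with WatchdogSec= set, so that NOTIFY_SOCKET and (on systemd versions that export it) WATCHDOG_USEC are present in the environment. The function names and the plain TCP connect used as the health check are hypothetical stand-ins; a real IMAP check would speak the protocol.

```python
import os
import socket
import threading


def sd_notify(message: str) -> bool:
    """Send one datagram to the socket named in $NOTIFY_SOCKET.

    Returns False when the variable is unset or sending fails.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    # A leading "@" denotes a Linux abstract-namespace socket.
    address = "\0" + path[1:] if path.startswith("@") else path
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
            sock.sendto(message.encode(), address)
    except OSError:
        return False
    return True


def service_is_healthy(host: str, port: int, timeout: float = 2.0) -> bool:
    """Hypothetical end-to-end check: can a client open a TCP connection?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watchdog_interval(default: float = 5.0) -> float:
    """Ping at half of WATCHDOG_USEC if it is set, else use the default."""
    usec = os.environ.get("WATCHDOG_USEC")
    return int(usec) / 2_000_000 if usec else default


def watchdog_loop(host: str, port: int, stop: threading.Event) -> None:
    """Report WATCHDOG=1 only after a successful check; stay silent otherwise."""
    interval = watchdog_interval()
    while not stop.wait(interval):
        if service_is_healthy(host, port):
            sd_notify("WATCHDOG=1")
```

A service would start this with something like `threading.Thread(target=watchdog_loop, args=("127.0.0.1", 143, stop_event), daemon=True).start()`, where the address and port are placeholders. When the check fails, the thread simply stops reporting and systemd's watchdog timeout takes over.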
[systemd-devel] Newbie question - Requires doesn't work properly
Hello

I am pretty new to systemd. I am trying to write two dependent services, Myservice and MyserviceTwo.

Myservice.service

    [Unit]
    Description=This is a test service

    [Service]
    PIDFile=/var/run/Myservice.pid
    ExecStartPre=/bin/rm -f /tmp/log.log
    #ExecStartPre=/usr/bin/systemctl stop Myservice
    ExecStart=/tmp/one.sh
    Restart=on-abort
    NotifyAccess=all
    WatchdogSec=20

    [Install]
    Alias=myservice.services

MyserviceTwo.service

    [Unit]
    Description=This is a TWO test service
    Requires=Myservice.service
    After=Myservice.service

    [Service]
    PIDFile=/var/run/MyserviceTwo.pid
    #ExecStartPre=/bin/rm -f /tmp/log.log
    ExecStart=/tmp/two.sh
    Restart=on-abort
    NotifyAccess=all
    WatchdogSec=10

    [Install]
    Alias=Salil2.services

When I run systemctl start MyserviceTwo, Myservice also gets started. I have put a systemd-notify command in my scripts:

    systemd-notify WATCHDOG=1

I deliberately made one.sh fail so that Myservice will fail. What I expected is that when Myservice fails, MyserviceTwo also fails. But that didn't happen.

Following is the output of the status command:

    [root@localhost system]# systemctl status Myservice
    Myservice.service - This is a test service
       Loaded: loaded (/usr/lib/systemd/system/Myservice.service; disabled)
       Active: failed (Result: watchdog) since Thu 2013-11-21 00:21:19 IST; 10min ago
      Process: 3143 ExecStartPre=/bin/rm -f /tmp/log.log (code=exited, status=0/SUCCESS)
     Main PID: 3145
       CGroup: name=systemd:/system/Myservice.service
               ├─3145 /bin/bash /tmp/one.sh
               └─4157 sleep 5

    Nov 21 00:20:59 localhost.localdomain systemd[1]: Starting This is a test service...
    Nov 21 00:20:59 localhost.localdomain systemd[1]: Started This is a test service.
    Nov 21 00:21:19 localhost.localdomain systemd[1]: Myservice.service watchdog timeout!
    Nov 21 00:21:19 localhost.localdomain systemd[1]: Unit Myservice.service entered failed state.

    [root@localhost system]# systemctl status MyserviceTwo
    MyserviceTwo.service - This is a TWO test service
       Loaded: loaded (/usr/lib/systemd/system/MyserviceTwo.service; disabled)
       Active: active (running) since Thu 2013-11-21 00:20:59 IST; 11min ago
     Main PID: 3146 (two.sh)
       CGroup: name=systemd:/system/MyserviceTwo.service
               ├─3146 /bin/bash /tmp/two.sh
               └─4220 sleep 5

    Nov 21 00:20:59 localhost.localdomain systemd[1]: Starting This is a TWO test service...
    Nov 21 00:20:59 localhost.localdomain systemd[1]: Started This is a TWO test service.

Any pointers on how to debug the issue?

Thanks
~S
Re: [systemd-devel] Newbie question - Requires doesn't work properly
On Wed, Nov 20, 2013 at 07:09:13PM +0530, salil GK wrote:
> MyserviceTwo.service
>
> [Unit]
> Description=This is a TWO test service
> Requires=Myservice.service
> After=Myservice.service
>
> [Service]
> PIDFile=/var/run/MyserviceTwo.pid
> #ExecStartPre=/bin/rm -f /tmp/log.log

That's not nice.

> [...]
> I deliberately made the one.sh fail so that Myservice will fail. What I expected is - when Myservice fails, MyserviceTwo also fail. But that didn't happen.

Requires= is only relevant at service start time. I think BindsTo= should work for you.

Zbyszek
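For illustration only, the suggestion above would amount to something like the following change to the [Unit] section of MyserviceTwo.service. This is a sketch based on the unit file posted in this thread, not a tested configuration; the rest of the file is assumed to stay as posted.

```ini
# MyserviceTwo.service (sketch: only the [Unit] section is shown)
[Unit]
Description=This is a TWO test service
# BindsTo= replaces Requires= so that this unit follows Myservice down
BindsTo=Myservice.service
After=Myservice.service
```

After editing a unit file, `systemctl daemon-reload` is needed before the change takes effect.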
Re: [systemd-devel] Newbie question - Requires doesn't work properly
The service configuration is strange. Normally, this is how they work with dependencies:

* Type=simple considers the service started immediately on exec() with no respect for PIDFiles or sd_notify. This can cause dependent services to come up too early.
* Type=forking considers the service started when either the file specified in PIDFile= appears or when the service completes a double fork.
* Type=notify is like Type=simple, except that it relies on sd_notify to indicate final startup.
* Type=bus is like Type=simple, except that it waits for the dbus listener to indicate final startup.

You have PIDFile= specified, which is for Type=forking. I think PIDFile= just gets ignored for Type=simple (the default). So, for one, I'd pick a more coherent startup-detection configuration.

More interesting is how Myservice gets marked failed but isn't forcibly stopped. That may be why systemd isn't bringing down dependent services. Most services only get marked failed after stopping (because of a non-zero exit code, for example).
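One possible reading of "a more coherent startup-detection configuration" for the posted Myservice.service is sketched below. It is an assumption, not something proposed in the thread: it drops PIDFile= and switches to Type=notify, which in turn means /tmp/one.sh would have to announce readiness itself (for example with `systemd-notify --ready`) before systemd considers the service started.

```ini
# Myservice.service (sketch: only the [Service] section is shown)
[Service]
# Startup is signalled by the service itself, so no PIDFile= is needed
Type=notify
ExecStart=/tmp/one.sh
Restart=on-abort
NotifyAccess=all
WatchdogSec=20
```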
Re: [systemd-devel] Newbie question - Requires doesn't work properly
Thanks David and Zbyszek

Yes, with the BindsTo parameter it works.

David, I was trying out the options in the unit file, hence the PID parameter came in. I forgot to remove that later :-(

Some more questions in the same thread:

1. How do I trace the heartbeat messages? Or rather, can I capture the systemd-notify messages being sent by the process?

2. Is there any way I can see whether systemd captured the notification or missed it? The reason I am asking is that my process runs for some time and all the while it sends a WATCHDOG notification every 5 seconds, but systemd marked the process as failed, saying watchdog timeout.

3. One more use case I can think of: the process fails to send the heartbeat message (WATCHDOG) for some time and later starts sending it again. While the WATCHDOG notification is missing the process can be marked as failed, but the moment notifications start coming again, can it be marked as active (running)?

4. Is there a way systemd can notify me when a watchdog timeout happens for a service, for example by calling some program or writing to some socket? Basically, if any service fails because of a watchdog timeout, I would like to know asynchronously. Is there any way I can configure this?

Thanks
Salil

On 21 November 2013 09:02, David Timothy Strauss da...@davidstrauss.net wrote:
> The service configuration is strange. Normally, this is how they work with dependencies:
> [...]
Re: [systemd-devel] Newbie question - Requires doesn't work properly
On Thu, Nov 21, 2013 at 4:33 PM, salil GK gksa...@gmail.com wrote:
> 3. One more use case I can think of is - if the process fail to send heartbeat message ( WATCHDOG ) for some time and later it starts sending - because of some time. So during the time WATCHDOG notification is missing process can be marked as failed and the moment notification start coming, can it be marked as active-running ?

I'm not sure, but this would be easy to test. You could also restart on failed watchdog if you want systemd to react.

> 4. Is there way systemd can notify me in case a watchdog timeout happens for a service - like systemd calls some program or write to some socket etc. So basically in case any service fails because of watchdog timeout, I would like to know asynchronously. Is there any way I can configure this.

We typically poll systemctl --failed periodically as part of our monitoring.
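The polling approach mentioned above could look roughly like this in Python. This is a hedged sketch, not the monitoring setup the poster actually uses: the function names are made up for illustration, and it assumes a systemctl that accepts `--no-legend` and `--plain` alongside `--failed`.

```python
import subprocess
from typing import List


def parse_failed_units(output: str) -> List[str]:
    """Extract unit names from `systemctl --failed --no-legend` output."""
    units = []
    for line in output.splitlines():
        fields = line.split()
        # Some systemd versions prefix each row with a bullet character.
        if fields and fields[0] in ("●", "*", "×"):
            fields = fields[1:]
        if fields:
            units.append(fields[0])
    return units


def failed_units() -> List[str]:
    """Ask systemctl for failed units; returns an empty list if none."""
    result = subprocess.run(
        ["systemctl", "--failed", "--no-legend", "--plain"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_failed_units(result.stdout)
```

A monitoring job would call `failed_units()` on a timer and raise an alert whenever the returned list is non-empty. Splitting the parsing from the subprocess call keeps the parser testable without a running systemd.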
Re: [systemd-devel] Newbie question - Requires doesn't work properly
Thanks David

The trick you suggested for point 3 may not work in my case. What happens is that my process may be busy with some other activity, during which time it will fail to send the periodic message to systemd. After a while it will come out of its loop and be ready to serve. But by this time systemd would have already marked the process as failed. As the process has come back to its regular working state, I don't want to kill or restart it. As far as the clients which depend on the service are concerned, the service is in good shape, but for systemd it is in the failed state. So I would like to change the state of the service to active (running) at this time so that my service management framework is also in good shape.

Thanks
Salil

On 21 November 2013 12:06, David Timothy Strauss da...@davidstrauss.net wrote:
> I'm not sure, but this would be easy to test. You could also restart on failed watchdog if you want systemd to react.
> [...]