Re: [systemd-devel] Newbie question - Requires doesn't work properly

2013-11-22 Thread Reindl Harald

On 22.11.2013 03:04, salil GK wrote:
> Thanks a lot David
>
> On 22 November 2013 06:44, David Timothy Strauss da...@davidstrauss.net wrote:
>> On Thu, Nov 21, 2013 at 4:57 PM, salil GK gksa...@gmail.com wrote:
>>> What happens is - my process may be busy with some other activity during
>>> which time it will fail to send periodic message to systemd. After a while
>>> it will come out of its loop and ready to serve. But during this time
>>> system would have already marked the process as failed.
>>
>> Then you need to either use another thread, refactor to make a tighter
>> event loop, or increase the watchdog time. Drifting in and out of
>> tolerance with watchdog is not a safe strategy.

The problem I see with using another thread is that this thread can happily
work and send its keep-alive, but that does not mean that the service itself
is working OK and responsive, because the two run in isolation.

In the case of network services it would be pretty cool if the systemd
watchdog could be configured to connect to the service every n seconds and
restart it if there is no response, because this would monitor the real
service without needing external tools.



___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Newbie question - Requires doesn't work properly

2013-11-22 Thread David Timothy Strauss
It is the responsibility of whatever sends the watchdog to ensure
everything's healthy, however necessary. It would be silly to spawn a
thread and have it blindly report health to watchdog. The point is for that
thread to do proper checks but ensure reports go in at the right intervals.
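
For what it's worth, here is a minimal Python sketch of the "right intervals" part. It assumes systemd exports the WatchdogSec= timeout to the service as WATCHDOG_USEC (and, on newer versions, WATCHDOG_PID), as described in sd_watchdog_enabled(3). The function name watchdog_interval is made up for illustration. Pinging at half the timeout is the commonly recommended safety margin.

```python
import os


def watchdog_interval(environ=None):
    """Return a suggested ping interval in seconds, or None if no watchdog applies."""
    env = os.environ if environ is None else environ
    usec = env.get("WATCHDOG_USEC")
    if not usec:
        return None  # WatchdogSec= not set, or not started by systemd
    pid = env.get("WATCHDOG_PID")
    if pid and int(pid) != os.getpid():
        return None  # the watchdog was requested for a different process
    # Ping at half the configured timeout so a slightly late ping is tolerated.
    return int(usec) / 1_000_000 / 2
```

With WatchdogSec=20 this suggests a ping roughly every 10 seconds. The health checks themselves still have to finish well inside that window.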


Re: [systemd-devel] Newbie question - Requires doesn't work properly

2013-11-22 Thread Reindl Harald


On 22.11.2013 12:49, David Timothy Strauss wrote:
> It is the responsibility of whatever sends the watchdog to ensure
> everything's healthy, however necessary. It would be silly to spawn a
> thread and have it blindly report health to watchdog. The point is for that
> thread to do proper checks but ensure reports go in at the right intervals.

I know that, but the *how* is the question.

You can check whatever you like internally, but that does not mean the
service responds correctly to a client connection over the network unless
you go through the same stack, meaning you make a real network connection.

I spent hundreds of hours on upstream debugging of dbmail to find spinlocks
and other problems that only happen in rare situations. One of them took
16 hours of stress tests with debug logging enabled before it happened,
while on the real server it took a few minutes to be triggered by a random
client action.

That's the difference between theory and real workload.

Your internal checks are mostly theory, because in case of a bug you have
undefined behavior, and what you want to achieve with the watchdog is to
catch this undefined behavior and restart the service. In doubt this will
not work in the rare cases where the watchdog should restart, unless you
walk the complete code path of a client. In the case of an IMAP server you
can enter the spin loop anywhere from accepting the connection to folder
listing or receiving a message, and it may depend on a buffer overflow under
high concurrency while different threads are touching each other in an
unexpected way.

Been there, nearly died debugging it and capturing data for upstream.



Re: [systemd-devel] Newbie question - Requires doesn't work properly

2013-11-22 Thread David Timothy Strauss
On Fri, Nov 22, 2013 at 10:24 PM, Reindl Harald h.rei...@thelounge.net wrote:
> your internal checks are mostly theory because in case of
> a bug you have undefined behavior and what you want to achieve
> with the watchdog is catch this undefined behavior and restart
> the service - in doubt this will not work in the rare cases
> the watchdog should restart until you went the complete
> code-path of a client, in case of a IMAP server you can
> enter the spin-loop everywhere from accept the connection
> to folder listing or receive a message and it may depend
> on a buffer overflow while high concurrency and different
> threads are touching each other in a unexpected way

You're not hearing what I'm saying. Check what's relevant; it's not
systemd's concern what you check. If you need to do a transactional
check of an IMAP server from a separate process or thread, then do
that. Only report back to watchdog if you've verified what you
consider to be a healthy state.
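
To make that concrete, here is a rough Python sketch of a checker that goes through the network stack like a client and only pings the watchdog when the check passes. Everything here is an assumption for illustration: the helper names (imap_greets, notify_cli, watchdog_tick), the choice of the IMAP "* OK" greeting as the health criterion, and the use of the systemd-notify command, which as far as I understand needs NotifyAccess=all when run from a helper process.

```python
import socket
import subprocess


def imap_greets(host, port, timeout=3.0):
    """Connect like a real client and check for an IMAP "* OK" greeting."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            with conn.makefile("rb") as stream:
                greeting = stream.readline(256)
    except OSError:
        return False  # refused, timed out, or reset: not healthy
    return greeting.startswith(b"* OK")


def notify_cli(state):
    """Send a notification through the systemd-notify command."""
    subprocess.run(["systemd-notify", state], check=False)


def watchdog_tick(check, notify=notify_cli):
    """Report WATCHDOG=1 only when the check passed; otherwise stay silent."""
    healthy = check()
    if healthy:
        notify("WATCHDOG=1")
    return healthy
```

A checker thread or process would call something like watchdog_tick(lambda: imap_greets("127.0.0.1", 143)) in a loop, at an interval comfortably shorter than WatchdogSec=. When the check fails it sends nothing, so systemd's timeout expires and its normal failure handling takes over.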


[systemd-devel] Newbie question - Requires doesn't work properly

2013-11-20 Thread salil GK
Hello

  I am pretty new to systemd.
  I am trying to write two dependent services Myservice and MyserviceTwo.

Myservice.service

[Unit]
Description=This is a test service

[Service]
PIDFile=/var/run/Myservice.pid
ExecStartPre=/bin/rm -f /tmp/log.log
#ExecStartPre=/usr/bin/systemctl stop Myservice
ExecStart=/tmp/one.sh
Restart=on-abort
NotifyAccess=all
WatchdogSec=20

[Install]
Alias=myservice.services


MyserviceTwo.service

[Unit]
Description=This is a TWO test service
Requires=Myservice.service
After=Myservice.service

[Service]
PIDFile=/var/run/MyserviceTwo.pid
#ExecStartPre=/bin/rm -f /tmp/log.log
ExecStart=/tmp/two.sh
Restart=on-abort
NotifyAccess=all
WatchdogSec=10

[Install]
Alias=Salil2.services


   When I run systemctl start MyserviceTwo, Myservice also gets started.

   I have put a systemd-notify command in my scripts:

systemd-notify WATCHDOG=1

   I deliberately made one.sh fail so that Myservice will fail.

   What I expected is - when Myservice fails, MyserviceTwo would also fail. But
that didn't happen. The following is the output of the status command:

[root@localhost system]# systemctl status Myservice
Myservice.service - This is a test service
   Loaded: loaded (/usr/lib/systemd/system/Myservice.service; disabled)
   Active: failed (Result: watchdog) since Thu 2013-11-21 00:21:19 IST; 10min ago
  Process: 3143 ExecStartPre=/bin/rm -f /tmp/log.log (code=exited, status=0/SUCCESS)
 Main PID: 3145
   CGroup: name=systemd:/system/Myservice.service
           ├─3145 /bin/bash /tmp/one.sh
           └─4157 sleep 5

Nov 21 00:20:59 localhost.localdomain systemd[1]: Starting This is a test service...
Nov 21 00:20:59 localhost.localdomain systemd[1]: Started This is a test service.
Nov 21 00:21:19 localhost.localdomain systemd[1]: Myservice.service watchdog timeout!
Nov 21 00:21:19 localhost.localdomain systemd[1]: Unit Myservice.service entered failed state.

[root@localhost system]# systemctl status MyserviceTwo
MyserviceTwo.service - This is a TWO test service
   Loaded: loaded (/usr/lib/systemd/system/MyserviceTwo.service; disabled)
   Active: active (running) since Thu 2013-11-21 00:20:59 IST; 11min ago
 Main PID: 3146 (two.sh)
   CGroup: name=systemd:/system/MyserviceTwo.service
           ├─3146 /bin/bash /tmp/two.sh
           └─4220 sleep 5

Nov 21 00:20:59 localhost.localdomain systemd[1]: Starting This is a TWO test service...
Nov 21 00:20:59 localhost.localdomain systemd[1]: Started This is a TWO test service.

Any pointers on how to debug the issue?

Thanks
~S


Re: [systemd-devel] Newbie question - Requires doesn't work properly

2013-11-20 Thread Zbigniew Jędrzejewski-Szmek
On Wed, Nov 20, 2013 at 07:09:13PM +0530, salil GK wrote:
> Hello
>
>   I am pretty new to systemd.
>   I am trying to write two dependent services Myservice and MyserviceTwo.
>
> Myservice.service
>
> [Unit]
> Description=This is a test service
>
> [Service]
> PIDFile=/var/run/Myservice.pid
> ExecStartPre=/bin/rm -f /tmp/log.log
> #ExecStartPre=/usr/bin/systemctl stop Myservice
> ExecStart=/tmp/one.sh
> Restart=on-abort
> NotifyAccess=all
> WatchdogSec=20
>
> [Install]
> Alias=myservice.services
>
>
> MyserviceTwo.service
>
> [Unit]
> Description=This is a TWO test service
> Requires=Myservice.service
> After=Myservice.service
>
> [Service]
> PIDFile=/var/run/MyserviceTwo.pid
> #ExecStartPre=/bin/rm -f /tmp/log.log
That's not nice.

> ExecStart=/tmp/two.sh
> Restart=on-abort
> NotifyAccess=all
> WatchdogSec=10
>
> [Install]
> Alias=Salil2.services
>
>
>    When I run systemctl start MyserviceTwo, Myservice also gets started.
>
>    I have put a systemd-notify command in my scripts
>
> systemd-notify WATCHDOG=1
>
>    I deliberately made the one.sh fail so that Myservice will fail.
>
>    What I expected is - when Myservice fails, MyserviceTwo also fail. But
> that didn't happen. following is the output of status command

Requires= is only relevant at service start time. I think BindsTo=
should work for you.

Zbyszek


Re: [systemd-devel] Newbie question - Requires doesn't work properly

2013-11-20 Thread David Timothy Strauss
The service configuration is strange. Normally, this is how they work
with dependencies:

 * Type=simple considers the service started immediately on exec()
with no respect for PIDFiles or sd_notify. This can cause dependent
services to come up too early.
 * Type=forking considers the service started when either the file
specified in PIDFile= appears or when the service completes a double
fork.
 * Type=notify is like Type=simple, except that it relies on sd_notify
to indicate final startup.
 * Type=bus is like Type=simple, except that it waits for the dbus
listener to indicate final startup.
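
As an illustration of the Type=notify case, here is a minimal Python sketch of what "relies on sd_notify" amounts to on the service side. It assumes the documented sd_notify protocol: systemd passes a datagram socket path in $NOTIFY_SOCKET and the service writes READY=1 to it once startup is complete. The function names and the environ parameter are hypothetical conveniences; C programs would normally call sd_notify() from libsystemd instead.

```python
import os
import socket


def sd_notify(state, environ=None):
    """Send `state` to the datagram socket named in $NOTIFY_SOCKET.

    Returns False when the variable is unset, e.g. when not run by systemd.
    """
    env = os.environ if environ is None else environ
    path = env.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        path = "\0" + path[1:]  # "@" prefix means a Linux abstract socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode("utf-8"), path)
    return True


def start_service(setup, environ=None):
    """Run the slow initialisation first, then tell systemd we are ready."""
    setup()
    return sd_notify("READY=1", environ)
```

Units ordered After= a Type=notify service should then wait until READY=1 has been sent, rather than starting right after exec().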

You have PIDFile= specified, which is for Type=forking. I think
PIDFile= just gets ignored for Type=simple (the default). So, for one,
I'd pick a more coherent startup-detection configuration.

More interesting is how Myservice gets marked failed but isn't
forcibly stopped. That may be why systemd isn't bringing down
dependent services. Most services only get marked failed after
stopping (because of a non-zero exit code, for example).


Re: [systemd-devel] Newbie question - Requires doesn't work properly

2013-11-20 Thread salil GK
Thanks David and Zbyszek

Yes, with the BindsTo parameter it works.

David

   I was trying out the options in the unit file, hence the PIDFile parameter
came in. I forgot to remove that later :-(


Some more questions in the same thread -

1. How do I trace the heartbeat message - or rather, can I capture the
systemd-notify messages being sent by the process?
2. Is there any way I can see whether systemd captured the notification
or missed it? The reason I am asking is - my process runs for some
time and all the while it sends a WATCHDOG notification every 5 seconds.
But systemd marked the process as failed, saying watchdog timeout.
3. One more use case I can think of is - the process fails to send the
heartbeat message ( WATCHDOG ) for some time and later starts sending
again. During the time the WATCHDOG notification is missing the
process can be marked as failed, but the moment notifications start coming
again, can it be marked as active (running)?
4. Is there a way systemd can notify me in case a watchdog timeout
happens for a service - like systemd calling some program or writing to some
socket etc.? Basically, in case any service fails because of a watchdog
timeout, I would like to know asynchronously. Is there any way I can
configure this?

Thanks
Salil




Re: [systemd-devel] Newbie question - Requires doesn't work properly

2013-11-20 Thread David Timothy Strauss
On Thu, Nov 21, 2013 at 4:33 PM, salil GK gksa...@gmail.com wrote:
> 3. One more use case I can think of is - if the process fail to send
> heartbeat message ( WATCHDOG ) for some time and later it starts sending -
> because of some time. So during the time WATCHDOG notification is missing
> process can be marked as failed and the moment notification start coming,
> can it be marked as active-running ?

I'm not sure, but this would be easy to test. You could also restart
on failed watchdog if you want systemd to react.

> 4. Is there way systemd can notify me in case a watchdog timeout happens
> for a service - like systemd calls some program or write to some socket etc.
> So basically in case any service fails because of watchdog timeout, I would
> like to know asynchronously. Is there any way I can configure this.

We typically poll systemctl --failed periodically as part of our monitoring.
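
A rough sketch of that kind of polling in Python, assuming a systemctl that accepts --no-legend and --plain (older versions may not). The names parse_failed_units and failed_units are illustrative, not part of systemd, and this is not the monitoring setup referred to above.

```python
import subprocess
from typing import List


def parse_failed_units(listing: str) -> List[str]:
    """Extract unit names from `systemctl --failed` style output."""
    units = []
    for line in listing.splitlines():
        fields = line.split()
        if fields and fields[0] in ("●", "*"):
            fields = fields[1:]  # some versions prefix failed units with a bullet
        if fields and "." in fields[0]:
            units.append(fields[0])  # first column is the unit name
    return units


def failed_units() -> List[str]:
    """Ask systemctl which units are currently in the failed state."""
    result = subprocess.run(
        ["systemctl", "--failed", "--no-legend", "--plain"],
        capture_output=True, text=True, check=False,
    )
    return parse_failed_units(result.stdout)
```

A monitoring job could call failed_units() every minute or so and alert when the list changes. This is polling, so it only approximates the asynchronous notification asked about in question 4.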


Re: [systemd-devel] Newbie question - Requires doesn't work properly

2013-11-20 Thread salil GK
Thanks David

The trick you suggested for point 3 may not work in my case. What happens
is - my process may be busy with some other activity, during which time it
will fail to send the periodic message to systemd. After a while it will come
out of its loop and be ready to serve. But by this time systemd would have
already marked the process as failed. As the process has come back to its
regular working state, I don't want to kill or restart it. As far as the
clients which depend on the service are concerned, the service is
in good shape, but for systemd it is in failed state. So I would like to
change the state of the service to active (running) at this time so that my
service management framework would also be in good shape.

Thanks
Salil

