Re: [PATCH 0/4] Add info on why process is down to statusfile

2015-01-18 Thread Laurent Bercot


 Hi Olivier !
 Thanks for the patches. A bit of discussion below.



So I wanted to have the return code/signal that occured when a process went down
available on the statusfile, as I believe this is useful information to have for
all services.


 Yes, that makes sense.

 

Looking into it, I noticed that s6 already did gather that info from wait(), and
sends it to ./finish -- that was a nice surprise, but (unless I missed it?)
totally undocumented.


 I was certain I had documented it somewhere, but I must have been wrong, 
because
I couldn't find it. So, thanks for the report, and I'll apply that patch or
update the doc somehow.



First patch fixes the first argument to ./finish, since I assume it was meant to
be able to know whether there was a return code or not, but 255 actually is a
valid return code. Second patch adds a mention of those arguments to the doc.


 Yes, I didn't think too much about that, but in hindsight, since it's possible
to use a code outside of the valid range, it's the right thing to do indeed.
What bothers me, though, is that it breaks existing scripts, so it requires a
major (as in second number in my 4-number versioning system) number change,
loud horns and blinking lights. It can't be the sneaky patch. Totally my
fault, but compatibility breaks are never fun.



Then third patch does add that info to the statusfile


 And here I'm slowing down. I agree it makes sense for the info to be there.
However, wstat is specified as an int, and the mapping of the bits inside
a wstat is totally unspecified, even if most implementations are the same
in practice. That means that wstat cannot be stored as an uint32 in the
status file; if anything, it should be stored as an uint64, to accommodate
archs where int is 64 bits. Adding 8 bytes to the status file is heavy
(all things are relative...) I'll probably end up doing that, but I need to
think about it a bit more first.



and finally as a separate
patch I've also added the signal name to s6-svstat as I think it's a nice thing
to have.


 I'm reluctant to add a number-to-signal-name conversion to a single program.
It really should be a libc function. If anything, I'll add it to skalibs, so
other programs can use the conversion too.
 More importantly, the s6-svstat output is meant to be parsable, and changing
the numbers into signal names makes parsing harder. So there should be at
least an option to deactivate the conversion.

 Next time, please express wishes before sending patches - I like design
discussions and agreement on what to do, I kinda feel trapped or forced when
I get patches, as in I worked hard on this so you better apply them!.
I have to say your patches are pretty much on-point, though, and almost
applicable as is, so thank you for that.

--
 Laurent



[PATCH] svstat: Account for file ready

2015-01-18 Thread Olivier Brunel
Signed-off-by: Olivier Brunel j...@jjacky.com
---
 src/supervision/s6-svstat.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/src/supervision/s6-svstat.c b/src/supervision/s6-svstat.c
index c986b8d..bc27c32 100644
--- a/src/supervision/s6-svstat.c
+++ b/src/supervision/s6-svstat.c
@@ -44,8 +44,20 @@ int main (int argc, char const *const *argv)
   isup = status.pid  !status.flagfinishing ;
   if (isup)
   {
+struct stat st ;
+unsigned int dirlen = str_len(*argv) ;
+char fn[dirlen + 1 + sizeof(S6_SUPERVISE_READY_FILENAME)] ;
+int ready ;
+byte_copy(fn, dirlen, *argv) ;
+byte_copy(fn + dirlen, 1, /) ;
+byte_copy(fn + dirlen + 1, sizeof(S6_SUPERVISE_READY_FILENAME), 
S6_SUPERVISE_READY_FILENAME) ;
+if (stat(fn, st) == -1)
+  if (errno != ENOENT) strerr_diefu2sys(111, stat , fn) ;
+  else ready = 0 ;
+else ready = 1 ;
 buffer_putnoflush(buffer_1small,up (pid , 8) ;
 buffer_putnoflush(buffer_1small, fmt, uint_fmt(fmt, status.pid)) ;
+if (ready) buffer_putnoflush(buffer_1small,; ready, 7) ;
 buffer_putnoflush(buffer_1small, ) , 2) ;
   }
   else buffer_putnoflush(buffer_1small, down , 5) ;
-- 
2.2.2



[PATCH 3/4] Add info on why process is down to statusfile

2015-01-18 Thread Olivier Brunel
That is, add the wstat allowing to know the return code/signal that
occured when process went down.

Because in addition to those info being provided to the optional finish
script, it is useful to have them always for all services.

Signed-off-by: Olivier Brunel j...@jjacky.com
---
 src/include/s6/s6-supervise.h  |  6 --
 src/libs6/s6_svstatus_pack.c   |  3 ++-
 src/libs6/s6_svstatus_unpack.c |  4 
 src/supervision/s6-supervise.c | 13 +++--
 src/supervision/s6-svstat.c| 20 +++-
 5 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/src/include/s6/s6-supervise.h b/src/include/s6/s6-supervise.h
index c8d72d7..f2b64d8 100644
--- a/src/include/s6/s6-supervise.h
+++ b/src/include/s6/s6-supervise.h
@@ -10,7 +10,7 @@
 #define S6_SVSCAN_CTLDIR .s6-svscan
 #define S6_SVSTATUS_FILENAME S6_SUPERVISE_CTLDIR /status
 #define S6_SUPERVISE_READY_FILENAME S6_SUPERVISE_CTLDIR /ready
-#define S6_SVSTATUS_SIZE 18
+#define S6_SVSTATUS_SIZE 22
 
 extern int s6_svc_write (char const *, char const *, unsigned int) ;
 extern int s6_svc_main (int, char const *const *, char const *, char const *, 
char const *) ;
@@ -24,9 +24,11 @@ struct s6_svstatus_s
   unsigned int flagwantup : 1 ;
   unsigned int flagpaused : 1 ;
   unsigned int flagfinishing : 1 ;
+  unsigned int felldown : 1 ;
+  unsigned int wstat ;
 } ;
 
-#define S6_SVSTATUS_ZERO { .stamp = TAIN_ZERO, .pid = 0, .flagwant = 0, 
.flagwantup = 0, .flagpaused = 0, .flagfinishing = 0 }
+#define S6_SVSTATUS_ZERO { .stamp = TAIN_ZERO, .pid = 0, .flagwant = 0, 
.flagwantup = 0, .flagpaused = 0, .flagfinishing = 0, .felldown = 0, .wstat = 0 
}
 
 
 extern void s6_svstatus_pack (char *, s6_svstatus_t const *) ;
diff --git a/src/libs6/s6_svstatus_pack.c b/src/libs6/s6_svstatus_pack.c
index 2d5baf6..bebbcf9 100644
--- a/src/libs6/s6_svstatus_pack.c
+++ b/src/libs6/s6_svstatus_pack.c
@@ -8,6 +8,7 @@ void s6_svstatus_pack (char *pack, s6_svstatus_t const *sv)
 {
   tain_pack(pack, sv-stamp) ;
   uint32_pack(pack + 12, (uint32)sv-pid) ;
-  pack[16] = sv-flagpaused | (sv-flagfinishing  1) ;
+  pack[16] = sv-flagpaused | (sv-flagfinishing  1) | (sv-felldown  2) ;
   pack[17] = sv-flagwant ? sv-flagwantup ? 'u' : 'd' : 0 ;
+  uint32_pack(pack + 18, (uint32)sv-wstat) ;
 }
diff --git a/src/libs6/s6_svstatus_unpack.c b/src/libs6/s6_svstatus_unpack.c
index cce6989..d7aac26 100644
--- a/src/libs6/s6_svstatus_unpack.c
+++ b/src/libs6/s6_svstatus_unpack.c
@@ -7,11 +7,13 @@
 void s6_svstatus_unpack (char const *pack, s6_svstatus_t_ref sv)
 {
   uint32 pid ;
+  uint32 wstat ;
   tain_unpack(pack, sv-stamp) ;
   uint32_unpack(pack + 12, pid) ;
   sv-pid = (int)pid ;
   sv-flagpaused = pack[16]  1 ;
   sv-flagfinishing = (pack[16]  1)  1 ;
+  sv-felldown = (pack[16]  2)  1 ;
   switch (pack[17])
   {
 case 'u' :
@@ -26,4 +28,6 @@ void s6_svstatus_unpack (char const *pack, s6_svstatus_t_ref 
sv)
   sv-flagwant = 0 ;
   sv-flagwantup = 0 ;
   }
+  uint32_unpack(pack + 18, wstat) ;
+  sv-wstat = (int)wstat ;
 }
diff --git a/src/supervision/s6-supervise.c b/src/supervision/s6-supervise.c
index 1ea45d0..3776be6 100644
--- a/src/supervision/s6-supervise.c
+++ b/src/supervision/s6-supervise.c
@@ -192,6 +192,7 @@ static void trystart (void)
   settimeout_infinite() ;
   state = UP ;
   status.pid = pid ;
+  status.felldown = 0 ;
   tain_copynow(status.stamp) ;
   announce() ;
   ftrigw_notify(S6_SUPERVISE_EVENTDIR, 'u') ;
@@ -230,7 +231,7 @@ static void down_d (void)
   announce() ;
 }
 
-static void tryfinish (int wstat, int islast)
+static void tryfinish (int islast)
 {
   register pid_t pid = fork() ;
   if (pid  0)
@@ -248,8 +249,8 @@ static void tryfinish (int wstat, int islast)
 char fmt1[UINT_FMT] ;
 char *cargv[4] = { finish, fmt0, fmt1, 0 } ;
 selfpipe_finish() ;
-fmt0[uint_fmt(fmt0, WIFSIGNALED(wstat) ? 256 : WEXITSTATUS(wstat))] = 0 ;
-fmt1[uint_fmt(fmt1, WTERMSIG(wstat))] = 0 ;
+fmt0[uint_fmt(fmt0, WIFSIGNALED(status.wstat) ? 256 : 
WEXITSTATUS(status.wstat))] = 0 ;
+fmt1[uint_fmt(fmt1, WTERMSIG(status.wstat))] = 0 ;
 if (flagsetsid) setsid() ;
 execve(./finish, cargv, (char *const *)environ) ;
 _exit(111) ;
@@ -268,14 +269,14 @@ static void uptimeout (void)
 
 static void uplastup_z (int islast)
 {
-  int wstat = status.pid ;
   status.pid = 0 ;
+  status.felldown = 1 ;
   tain_copynow(status.stamp) ;
   announce() ;
   ftrigw_notify(S6_SUPERVISE_EVENTDIR, 'd') ;
   if (unlink(S6_SUPERVISE_READY_FILENAME)  0  errno != ENOENT)
 strerr_warnwu1sys(unlink  S6_SUPERVISE_READY_FILENAME) ;
-  tryfinish(wstat, islast) ;
+  tryfinish(islast) ;
 }
 
 static void up_z (void)
@@ -401,7 +402,7 @@ static void handle_signals (void)
 if (errno != ECHILD) strerr_diefu1sys(111, wait_pid_nohang) ;
 else break ;
   else if (!r) break ;
-  status.pid = wstat ;
+  if (!status.flagfinishing) status.wstat = wstat ;
   (*actions[state][V_CHLD])() ;
 }
 

[PATCH 4/4] svstat: Show signal name as well

2015-01-18 Thread Olivier Brunel
Signed-off-by: Olivier Brunel j...@jjacky.com
---
 src/supervision/s6-svstat.c | 93 -
 1 file changed, 92 insertions(+), 1 deletion(-)

diff --git a/src/supervision/s6-svstat.c b/src/supervision/s6-svstat.c
index b98e69e..766d04c 100644
--- a/src/supervision/s6-svstat.c
+++ b/src/supervision/s6-svstat.c
@@ -68,8 +68,99 @@ int main (int argc, char const *const *argv)
   buffer_putnoflush(buffer_1small, (, 1) ;
   if (WIFSIGNALED(status.wstat))
   {
+struct {
+  const char *name ;
+  int num ;
+} signals[] = {
+{ HUP,SIGHUP },
+{ INT,SIGINT },
+{ QUIT,   SIGQUIT },
+{ ILL,SIGILL },
+#ifdef SIGTRAP
+{ TRAP,   SIGTRAP },
+#endif
+{ ABRT,   SIGABRT },
+#ifdef SIGIOT
+{ IOT,SIGIOT },
+#endif
+#ifdef SIGEMT
+{ EMT,SIGEMT },
+#endif
+#ifdef SIGBUS
+{ BUS,SIGBUS },
+#endif
+{ FPE,SIGFPE },
+{ KILL,   SIGKILL },
+{ USR1,   SIGUSR1 },
+{ SEGV,   SIGSEGV },
+{ USR2,   SIGUSR2 },
+{ PIPE,   SIGPIPE },
+{ ALRM,   SIGALRM },
+{ TERM,   SIGTERM },
+#ifdef SIGSTKFLT
+{ STKFLT, SIGSTKFLT },
+#endif
+{ CHLD,   SIGCHLD },
+#ifdef SIGCLD
+{ CLD,SIGCLD },
+#endif
+{ CONT,   SIGCONT },
+{ STOP,   SIGSTOP },
+{ TSTP,   SIGTSTP },
+{ TTIN,   SIGTTIN },
+{ TTOU,   SIGTTOU },
+#ifdef SIGURG
+{ URG,SIGURG },
+#endif
+#ifdef SIGXCPU
+{ XCPU,   SIGXCPU },
+#endif
+#ifdef SIGXFSZ
+{ XFSZ,   SIGXFSZ },
+#endif
+#ifdef SIGVTALRM
+{ VTALRM, SIGVTALRM },
+#endif
+#ifdef SIGPROF
+{ PROF,   SIGPROF },
+#endif
+#ifdef SIGWINCH
+{ WINCH,  SIGWINCH },
+#endif
+#ifdef SIGIO
+{ IO, SIGIO },
+#endif
+#ifdef SIGPOLL
+{ POLL,   SIGPOLL },
+#endif
+#ifdef SIGINFO
+{ INFO,   SIGINFO },
+#endif
+#ifdef SIGLOST
+{ LOST,   SIGLOST },
+#endif
+#ifdef SIGPWR
+{ PWR,SIGPWR },
+#endif
+#ifdef SIGUNUSED
+{ UNUSED, SIGUNUSED },
+#endif
+#ifdef SIGSYS
+{ SYS,SIGSYS },
+#endif
+} ;
+int i ;
+int l = sizeof(signals) / sizeof(signals[0]) ;
+int sig = WTERMSIG(status.wstat) ;
+
 buffer_putnoflush(buffer_1small, signal=, 7) ;
-buffer_putnoflush(buffer_1small, fmt, uint_fmt(fmt, 
WTERMSIG(status.wstat))) ;
+buffer_putnoflush(buffer_1small, fmt, uint_fmt(fmt, sig)) ;
+for (i = 0; i  l; i++) if (signals[i].num == sig) break ;
+if (i  l)
+{
+  buffer_putnoflush(buffer_1small, :SIG, 4) ;
+  buffer_putnoflush(buffer_1small, signals[i].name, 
str_len(signals[i].name)) ;
+}
   }
   else
   {
-- 
2.2.2



[PATCH 2/4] supervise: Document arguments to ./finish

2015-01-18 Thread Olivier Brunel
Signed-off-by: Olivier Brunel j...@jjacky.com
---
 doc/s6-supervise.html | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/doc/s6-supervise.html b/doc/s6-supervise.html
index b080626..dca437c 100644
--- a/doc/s6-supervise.html
+++ b/doc/s6-supervise.html
@@ -45,6 +45,8 @@ If it already exists, it uses it as is, without modifying the 
subscription right
 successfully spawns tt./run/tt. /li
  li When tt./run/tt dies, s6-supervise sends a tt'd'/tt event to 
tt./event/tt. /li
  li When tt./run/tt dies, s6-supervise spawns tt./finish/tt if it 
exists. /li
+ li tt./finish/tt will have the return code as first argument, or 256 if
+tt./run/tt was signaled, and the signal number as second argument. /li
  li tt./finish/tt must exit in less than 5 seconds. If it takes more 
than that,
 s6-supervise kills it. /li
  li When tt./finish/tt dies, s6-supervise restarts tt./run/tt unless 
it has been
-- 
2.2.2