Re: Kannel quits without reporting fatal error

2003-07-30 Thread Alan McNatty
Thanks Alex - have done now. 

On Wed, 2003-07-30 at 21:26, Alexander Malysh wrote:
 Hi Alan,
 
 Please, please file a bug report for your patches and attach your patch to it,
 so your patches will not be lost, or at least start a new mail thread with
 [PATCH] in the subject...
 
 Thanks in advance...
 
 On Wednesday 30 July 2003 00:16, Alan McNatty wrote:
  Patch to run_kannel_box to log to syslog.
 
-- 
Alan McNatty [EMAIL PROTECTED]



Re: Kannel quits without reporting fatal error

2003-07-29 Thread Alan McNatty
Patch to run_kannel_box to log to syslog. 

-- 
Alan McNatty [EMAIL PROTECTED]
Index: utils/run_kannel_box.c
===
RCS file: /home/cvs/gateway/utils/run_kannel_box.c,v
retrieving revision 1.7
diff -u -r1.7 run_kannel_box.c
--- utils/run_kannel_box.c	26 Mar 2001 18:48:37 -	1.7
+++ utils/run_kannel_box.c	29 Jul 2003 01:14:54 -
@@ -1,3 +1,14 @@
+/*
+ *  run_kannel_box.c - Kannel box wrapper
+ *
+ *  run_kannel_box starts up a process (either bearerbox, wapbox 
+ *  or smsbox) and make's sure that if it fails, it gets restarted.
+ *  
+ *  Logs to syslog (if possible) with INFO message if a restart
+ *  is required. Logs a ERR message if restart fails.
+ *
+ */
+
 #include <stdio.h>
 #include <stddef.h>
 #include <stdlib.h>
@@ -14,6 +25,24 @@
 #include <fcntl.h>
 #include <signal.h>
 
+/* require config to get HAVE_SYSLOG_H */
+#include "config.h"
+
+#if HAVE_SYSLOG_H
+#include <syslog.h>
+#else
+
+/*
+ * If we don't have syslog.h, then we'll use the following dummy definitions
+ * to avoid writing #if HAVE_SYSLOG_H everywhere.
+ */
+
+enum { LOG_PID, LOG_DAEMON, LOG_ERR, LOG_INFO };
+static void openlog(const char *ident, int option, int facility) { }
+static void syslog(int translog, const char *buf) { }
+
+#endif
+
 static char *progname;  /* The name of this program (for error messages) */
 static char **box_arglist;
 static int min_restart_delay = 60; /* in seconds */
@@ -205,11 +234,18 @@
  * every time it dies. */
 static int main_loop(char *boxfile)
 {
+	/** A few variables for syslog-ing */
+	int size = 100; /* 100 bytes of log should be fine */
+	char message[size];
+
+	/** Open syslog for internal logging */
+	openlog(progname, LOG_PID, LOG_DAEMON);
+
 	time_t next_fork = 0;
 
-	/* We can't report any errors here, because we are running
-	 * as a daemon and we have no logfile of our own.  So we
-	 * exit with errno as the exit code, to offer a minimal clue. */
+	/* We are running as a daemon and we have no logfile of our 
+	 * own. So we log to syslog if we can and exit with errno 
+	 * as the exit code, to offer a minimal clue. */
 
 	for (;;) {
 
@@ -224,10 +260,18 @@
 
 		child_box = fork();
 		if (child_box < 0) {
+			snprintf(message, size, 
+	[ %s ] failed to fork child process - 

Re: Kannel quits without reporting fatal error

2003-07-08 Thread Stipe Tolj
Alan McNatty wrote:
 
 Can anyone think of a reason why Kannel might give up on a bad
 connection after a time (is there something in io_thread in smpp module
 that might cause a shutdown without reporting - I can only see, if fail
 continue, etc - all looks good).

It may be memory exhaustion. Either bearerbox (by looping in the
re-connection state) or other processes have been consuming too much
memory, and the OS would decide which process to drop silently. Just
an idea.
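
One way to check the memory-exhaustion idea on Linux is to look for
out-of-memory killer messages in the kernel log. The following is a minimal
shell sketch, not something that ships with Kannel: the function names are
hypothetical, and the exact message wording differs between kernel versions,
so the case-insensitive match on 'out of memory' is an assumption.

```shell
# Hypothetical helpers for scanning a saved kernel log (e.g. output of dmesg).

# oom_lines FILE: print every line that looks like OOM-killer activity.
oom_lines() {
    # Case-insensitive, because the message text varies between kernels.
    grep -i 'out of memory' "$1"
}

# oom_lines_for FILE NAME: restrict the result to lines mentioning NAME,
# for example the process name bearerbox.
oom_lines_for() {
    oom_lines "$1" | grep -F "$2"
}
```

For example, after saving the kernel log with dmesg > /tmp/kernel.log,
running oom_lines_for /tmp/kernel.log bearerbox would show whether bearerbox
was ever named in such a message.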

 Alternatively, can anyone suggest where to add debug printouts, or
 some external monitoring of processes/threads that I can do, that might
 help find the problem? The obvious thing to do is set up a test connection
 and simply drop the interface to the SMSC and leave it running, trying to
 connect - just wondering about supplementary logging that might help.

We have a so-called 'safe_wrapper' bash script around the bearerbox
that acts in much the same way as 'safe_mysqld' does for mysqld. Whenever
bearerbox fails by crashing, the script creates a failure log
directory named <timestamp> and does a 'tail -2000' of all logs into that
failure log directory, so admins can at least try to see what has happened.
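
The log-capturing step of such a wrapper could look roughly like the sketch
below. This is not the Wapme script, which is not shown in this thread; the
function name capture_failure_logs, the *.log file pattern, and the log
directory paths in the usage comment are all assumptions.

```shell
# capture_failure_logs LOG_DIR FAILURE_ROOT
# Copies the last 2000 lines of every *.log file in LOG_DIR into a new
# timestamped directory under FAILURE_ROOT and prints that directory's path.
capture_failure_logs() {
    log_dir=$1
    failure_root=$2
    stamp=$(date +%Y%m%d-%H%M%S)
    dest="$failure_root/$stamp"
    mkdir -p "$dest" || return 1
    for f in "$log_dir"/*.log; do
        [ -f "$f" ] || continue      # the glob matched nothing
        tail -n 2000 "$f" > "$dest/$(basename "$f")"
    done
    echo "$dest"
}

# Example wrapper loop (log paths are hypothetical), restarting bearerbox
# after every exit and saving the log tails each time:
#   while :; do
#       /usr/local/sbin/bearerbox /etc/kannel.conf
#       capture_failure_logs /var/log/kannel /var/log/kannel/failures
#       sleep 1
#   done
```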

Usually it is a PANIC in bearerbox that caused it to stop. SEGFAULTs or
other severe failures have been very rare here at Wapme, at least.

Stipe

[EMAIL PROTECTED]
---
Wapme Systems AG

Vogelsanger Weg 80
40470 Düsseldorf

Tel: +49-211-74845-0
Fax: +49-211-74845-299

E-Mail: [EMAIL PROTECTED]
Internet: http://www.wapme-systems.de
---
wapme.net - wherever you are



Re: Kannel quits without reporting fatal error

2003-07-08 Thread Andreas Fink

On Tuesday, July 8, 2003, at 09:00, Stipe Tolj wrote:

Alan McNatty wrote:
Can anyone think of a reason why Kannel might give up on a bad
connection after a time (is there something in io_thread in smpp module
that might cause a shutdown without reporting - I can only see, if fail
continue, etc - all looks good).

It may be memory exhaustion. Either bearerbox (by looping in the
re-connection state) or other processes have been consuming too much
memory, and the OS would decide which process to drop silently. Just
an idea.

Alternatively, can anyone suggest where to add debug printouts, or
some external monitoring of processes/threads that I can do, that might
help find the problem? The obvious thing to do is set up a test connection
and simply drop the interface to the SMSC and leave it running, trying to
connect - just wondering about supplementary logging that might help.

We have a so-called 'safe_wrapper' bash script around the bearerbox
that acts in much the same way as 'safe_mysqld' does for mysqld. Whenever
bearerbox fails by crashing, the script creates a failure log
directory named <timestamp> and does a 'tail -2000' of all logs into that
failure log directory, so admins can at least try to see what has happened.


Ehmm.. why do you use a bash script for that? Kannel has a nice built-in feature called

run_kannel_box

which does exactly that.


Andreas Fink
Global Networks Switzerland AG

--
Tel: +41-61-333  Fax: +41-61-334   Mobile: +41-79-2457333
Global Networks, Inc. Clarastrasse 3, 4058 Basel, Switzerland
Web: http://www.global-networks.ch/  [EMAIL PROTECTED]
--



Re: Kannel quits without reporting fatal error

2003-07-08 Thread Alan McNatty
Hello,

run_kannel_box is a strange beast (at least to me). Its purpose seems
unclear (or more appropriately 'undocumented'). I understand it's part
of the utils directory (outside the main source) but it is part of the
standard build (e.g. Debian packages).

Andreas, could you explain a bit about what its function is? The man
page is brief. I have yet to review the source code in detail, but if it's
supposed to restart failed services I would be interested in doing
some work/testing with it (potentially adding syslog support to begin
with).

On a brief look at the code, I did notice the following in the
run_kannel_box source (around the for (;;) in main_loop):

/* We can't report any errors here, because we are running
 * as a daemon and we have no logfile of our own.  So we
 * exit with errno as the exit code, to offer a minimal clue. */

for (;;) {

    /* Make sure we don't fork in an endless loop if something
     * is drastically wrong.  This code limits it to one
     * per minute (or whatever min_restart_delay is set to). */
    time_t this_time = time(NULL);
    if (this_time <= next_fork) {
        sleep(next_fork - this_time);
    }
    next_fork = this_time + min_restart_delay;

    child_box = fork();
    if (child_box < 0) {
        return errno;
    }
    if (child_box == 0) {
        /* child.  exec the box */
        execvp(boxfile, box_arglist);
        exit(127);
    }

    while (waitpid(child_box, (int *)NULL, 0) != child_box) {
        if (errno == ECHILD) {
            /* Something went wrong... we don't know what,
             * but we do know that our child does not
             * exist.  So restart it. */
            break;
        }
        if (errno == EINTR) {
            continue;
        }
        /* Something weird happened. */
        return errno;
    }
}

This seems to suggest potential for 'vanishing' boxes (see 'Something weird
happened' - Stipe, does this tie in with your previous comments?). When I
did a process check, there was also no run_kannel_box running.
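
If a box is started in the foreground by a wrapper script, the wrapper can at
least record how the process ended, which helps with the 'no trace' problem.
A minimal sketch, assuming a POSIX shell: describe_exit is a hypothetical
helper, and the 128+N convention for signals belongs to the shell, not to
Kannel.

```shell
# describe_exit STATUS
# Turns a shell exit status ($?) into a short human-readable description.
describe_exit() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "exited cleanly"
    elif [ "$status" -gt 128 ]; then
        # Shells report death by signal N as an exit status of 128 + N.
        echo "killed by signal $((status - 128))"
    else
        # A plain exit code; for a program that exits with errno as its
        # exit code, this would be that errno value.
        echo "exited with code $status"
    fi
}

# Possible use in a wrapper (the syslog tag kannel-wrapper is hypothetical):
#   /usr/local/sbin/bearerbox /etc/kannel.conf
#   status=$?
#   logger -t kannel-wrapper -p daemon.err "bearerbox $(describe_exit "$status")"
```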

Cheers,
Alan

On Wed, 2003-07-09 at 01:21, Andreas Fink wrote:

 Ehmm.. why do you use a bash script for that? Kannel has a nice built-in
 feature called
 
 
   run_kannel_box
 
 
 which does exactly that.



Re: Kannel quits without reporting fatal error

2003-07-08 Thread Andreas Fink

On Tuesday, July 8, 2003, at 11:13, Alan McNatty wrote:

Hello,

run_kannel_box is a strange beast (at least to me). Its purpose seems
unclear (or more appropriately 'undocumented'). I understand it's part
of the utils directory (outside the main source) but it is part of the
standard build (e.g. Debian packages).

Andreas, could you explain a bit about what its function is? The man
page is brief. I have yet to review the source code in detail, but if it's
supposed to restart failed services I would be interested in doing
some work/testing with it (potentially adding syslog support to begin
with).

run_kannel_box starts up a process (either bearerbox, wapbox or smsbox) and makes sure that if it fails, it gets restarted.
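
The restart-with-minimum-delay idea can be approximated in shell roughly as
follows. This is only an illustrative sketch, not the actual run_kannel_box
implementation (which is written in C): the function name supervise and the
MAX_RUNS bound are hypothetical, the bound is there only so the loop
terminates when tried out, and date +%s is assumed to be available.

```shell
# supervise MIN_DELAY MAX_RUNS CMD [ARGS...]
# Runs CMD repeatedly, never starting it more often than once every
# MIN_DELAY seconds, and stops after MAX_RUNS runs.
supervise() {
    min_delay=$1
    max_runs=$2
    shift 2
    runs=0
    next_start=0
    while [ "$runs" -lt "$max_runs" ]; do
        now=$(date +%s)
        if [ "$now" -lt "$next_start" ]; then
            # Throttle: wait until the minimum delay has passed.
            sleep $((next_start - now))
        fi
        next_start=$(( $(date +%s) + min_delay ))
        if "$@"; then
            status=0
        else
            status=$?
        fi
        runs=$((runs + 1))
        echo "run $runs exited with status $status"
    done
}
```

Calling supervise 1 3 /usr/local/sbin/bearerbox /etc/kannel.conf would start
bearerbox up to three times, at least one second apart.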

Here's how I start kannel on my computer from /etc/rc.d/init.d/kannel. I let the processes run as user kannel at the same time:


RUN=/var/run/kannel
BIN=/usr/local/sbin
DELAY=1
CONF=/etc/kannel.conf

RUNBOX=${BIN}/run_kannel_box
BEARERBOX=${BIN}/bearerbox
SMSBOX=${BIN}/smsbox
WAPBOX=${BIN}/wapbox

PID_SMSBOX=$RUN/smsbox.pid
PID_BEARERBOX=$RUN/bearerbox.pid
PID_WAPBOX=$RUN/wapbox.pid

echo Starting bearerbox
su -c "${RUNBOX} --pidfile ${PID_BEARERBOX} --min-delay ${DELAY} ${BEARERBOX} ${CONF}" kannel
sleep 10

echo Starting smsbox
su -c "${RUNBOX} --pidfile ${PID_SMSBOX} --min-delay ${DELAY} ${SMSBOX} ${CONF}" kannel
sleep 10

echo Starting wapbox
su -c "${RUNBOX} --pidfile ${PID_WAPBOX} --min-delay ${DELAY} ${WAPBOX} ${CONF}" kannel


If you start a Kannel box process with run_kannel_box, it will keep restarting the process if it fails.
Try it yourself: kill any of the box processes and you will see that they restart, pretty much like safe_mysqld in MySQL.
To stop the processes, I kill the process whose PID is in the PID file above, which is the corresponding run_kannel_box process.
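
A stop action for the init script could be sketched like this. It is an
assumption-laden illustration rather than part of the script above: stop_box
and its timeout argument are hypothetical, and it simply sends SIGTERM to the
PID stored in the PID file and waits for that process to disappear.

```shell
# stop_box PIDFILE [TIMEOUT_SECONDS]
# Sends SIGTERM to the process named in PIDFILE and waits up to
# TIMEOUT_SECONDS (default 10) for it to go away.
stop_box() {
    pidfile=$1
    timeout=${2:-10}
    pid=$(cat "$pidfile" 2>/dev/null) || return 1
    case $pid in
        ''|*[!0-9]*) return 1 ;;               # empty or non-numeric content
    esac
    kill "$pid" 2>/dev/null || return 1        # default signal is SIGTERM
    while [ "$timeout" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0 # process is gone
        sleep 1
        timeout=$((timeout - 1))
    done
    return 1                                   # still alive after the timeout
}

# Example, using the variables from the start script:
#   stop_box "$PID_WAPBOX"
#   stop_box "$PID_SMSBOX"
#   stop_box "$PID_BEARERBOX"
```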





Andreas Fink
Global Networks Switzerland AG

--
Tel: +41-61-333  Fax: +41-61-334   Mobile: +41-79-2457333
Global Networks, Inc. Clarastrasse 3, 4058 Basel, Switzerland
Web: http://www.global-networks.ch/  [EMAIL PROTECTED]
--



Kannel quits without reporting fatal error

2003-07-07 Thread Alan McNatty
Hello,

I have an instance of Kannel running (1.3.1 on Debian Linux) which
communicates with a single SMSC via SMPP. The SMSC connection in question
can go up and down like a yo-yo. After a recent bout, the Kannel smsbox
and bearerbox processes simply vanished without a trace (nothing fatal in
either log).

The logs show the morning SIGHUP when logrotate ran; Kannel continued
to log connection failures for the next 1.5 hours, then vanished without a
trace (no box processes were running when I was alerted). Below is a
snippet from the bearerbox logs (set to level 1)...

Can anyone think of a reason why Kannel might give up on a bad
connection after a time? Is there something in io_thread in the SMPP module
that might cause a shutdown without reporting? I can only see 'if fail,
continue' and so on; it all looks good.

Alternatively, can anyone suggest where to add debug printouts, or
some external monitoring of processes/threads that I can do, that might
help find the problem? The obvious thing to do is set up a test connection
and simply drop the interface to the SMSC and leave it running, trying to
connect - just wondering about supplementary logging that might help.
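
As one form of external monitoring, a small script run periodically (for
example from cron) could check that the process named in each PID file is
still alive. A minimal sketch follows; the function names are hypothetical,
and it assumes the boxes were started with PID files.

```shell
# box_is_running PIDFILE: succeeds if PIDFILE names a live process.
box_is_running() {
    pid=$(cat "$1" 2>/dev/null) || return 1
    case $pid in
        ''|*[!0-9]*) return 1 ;;     # empty or non-numeric content
    esac
    # Signal 0 delivers nothing; it only tests whether we could signal
    # the process (so it also fails if we lack permission).
    kill -0 "$pid" 2>/dev/null
}

# report_boxes PIDFILE...: print one status line per PID file and
# return non-zero if any process is missing.
report_boxes() {
    rc=0
    for pf in "$@"; do
        if box_is_running "$pf"; then
            echo "OK   $pf"
        else
            echo "DEAD $pf"
            rc=1
        fi
    done
    return $rc
}
```

The output could be mailed to an admin or passed to logger so that a vanished
box is noticed soon after it happens.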

All comments/suggestions appreciated. 
Cheers,
Alan


2003-07-07 07:27:13 [0] WARNING: SIGHUP received, catching and re-opening logs
2003-07-08 07:37:20 [0] WARNING: SIGHUP received, catching and re-opening logs
2003-07-08 08:15:55 [6] ERROR: SMPP[...]: I/O error or other error. Re-connecting.
2003-07-08 08:15:55 [5] ERROR: SMPP[...]: I/O error or other error. Re-connecting.
...
...
2003-07-08 09:44:27 [5] ERROR: error connecting to server `...' at port `...'
2003-07-08 09:44:27 [6] ERROR: SMPP[...]: Couldn't connect to server.
2003-07-08 09:44:27 [5] ERROR: SMPP[...]: Couldn't connect to server.
2003-07-08 09:44:27 [6] ERROR: SMPP[...]: Couldn't connect to SMS center (retrying in 10 seconds).
2003-07-08 09:44:27 [5] ERROR: SMPP[...]: Couldn't connect to SMS center (retrying in 10 seconds).
2003-07-08 09:46:56 [5] ERROR: SMPP[...]: I/O error or other error. Re-connecting.
2003-07-08 09:46:56 [6] ERROR: SMPP[...]: I/O error or other error. Re-connecting.
2003-07-08 09:46:56 [5] ERROR: connect failed
2003-07-08 09:46:56 [5] ERROR: System error 111: Connection refused
2003-07-08 09:46:56 [5] ERROR: error connecting to server `...' at port `...'
2003-07-08 09:46:56 [5] ERROR: SMPP[..]: Couldn't connect to server.
2003-07-08 09:46:56 [5] ERROR: SMPP[..]: Couldn't connect to SMS center (retrying in 10 seconds).
2003-07-08 09:46:56 [6] ERROR: connect failed
2003-07-08 09:46:56 [6] ERROR: System error 111: Connection refused
2003-07-08 09:46:56 [6] ERROR: error connecting to server `...' at port '...'
2003-07-08 09:46:56 [6] ERROR: SMPP[..]: Couldn't connect to server.
2003-07-08 09:46:56 [6] ERROR: SMPP[..]: Couldn't connect to SMS center (retrying in 10 seconds).
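
To pin down when a box went silent, it can help to pull the last timestamp
and a per-thread error count out of a log like the one above. The sketch
below assumes each entry starts with 'YYYY-MM-DD HH:MM:SS [thread] LEVEL:',
as in this snippet; the function names are hypothetical, and wrapped
continuation lines are simply ignored.

```shell
# last_log_time FILE: timestamp of the last line that starts with a date.
last_log_time() {
    awk '/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / { last = $1 " " $2 }
         END { if (last != "") print last }' "$1"
}

# errors_per_thread FILE: number of ERROR entries per thread id,
# printed as lines such as "[5] 12", sorted by thread id.
errors_per_thread() {
    awk '/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / && $4 == "ERROR:" { n[$3]++ }
         END { for (t in n) print t, n[t] }' "$1" | sort
}
```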



-- 
Alan McNatty [EMAIL PROTECTED]