inn-workers Digest, Vol 65, Issue 8

inn-workers-request Thu, 30 Oct 2014 05:00:43 -0700

Send inn-workers mailing list submissions to
        [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
        https://lists.isc.org/mailman/listinfo/inn-workers
or, via email, send a message with subject or body 'help' to
        [email protected]

You can reach the person managing the list at
        [email protected]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of inn-workers digest..."

Today's Topics:

   1. INN 2.5.4 strange crash (Petr Novopashenniy)
   2. Re: INN 2.5.4 strange crash (Julien ?LIE)
   3. Re: INN 2.5.4 strange crash (Petr Novopashenniy)
   4. Re: INN 2.5.4 strange crash (Russ Allbery)

----------------------------------------------------------------------

Message: 1
Date: Wed, 29 Oct 2014 16:12:27 +0300 (MSK)
From: Petr Novopashenniy <[email protected]>
To: [email protected]
Subject: INN 2.5.4 strange crash
Message-ID: <[email protected]>
Content-Type: TEXT/PLAIN; format=flowed; charset=KOI8-R

Good day, All!

I have INN 2.5.4 on old FreeBSD box.
After upgrade from 2.4.3, innd sometimes began to segfault.

This happens when news.daily is running.
This is gdb output from some crashes:

1)
Program terminated with signal 11, Segmentation fault.

(gdb) bt
#0  0x0 in ?? ()
#1  0x8057477 in CHANreadloop () at chan.c:1284
#2  0x8059100 in KEYgenerate (header=0x2, body=0xbfbfed98 "\220???????", 
bodylen=3217026468) at innd.c:750
#3  0x804d8de in _start ()

2)
(gdb) bt
#0  0x0 in ?? ()
#1  0x8057477 in CHANreadloop () at chan.c:1284
#2  0x8059100 in KEYgenerate (header=0x1, body=0xbfbfeda0 "\224???", 
bodylen=3217026472) at innd.c:750
#3  0x804d8de in _start ()

3)
(gdb) bt
#0  0x0 in ?? ()
#1  0x8057477 in CHANreadloop () at chan.c:1284
#2  0x8059100 in KEYgenerate (header=0x1, body=0xbfbfeda0 "\224???", 
bodylen=3217026472) at innd.c:750
#3  0x804d8de in _start ()

Nothing useful for me. ;(

In "Daily Usenet report" mail, I see that innstat from news.daily is 
working fine, but after that something goes wrong:

###
[dd]

news.etla.org                         1  (152)

TOTAL: 40                           121

ctlinnd: cannot send "logmode" command (dead server failure): No such 
process
ctlinnd: cannot send "go" command (dead server failure): No such process
Error log:
innd: innfeed-neva2!:25:proc:53280 blocked sleeping 5
innd: innfeed-neva1!:24:proc:53279 blocked sleeping 5
---------

[dd]
###

Last core creation time is Oct 29 03:20:19 2014, and looks like this is 
scanlogs problem. (I have no DOGROUPBASEEXPIRY=true, and I see sleep 30 in 
line 349 of news.daily, "news.daily noexpireover" start at 3:20 by cron)

Command "ctlinnd -s pause "Flushing log and syslog files" from line 114 
looks like work fine, and I see this message in news.notice. But I suspect 
that "ctlinnd flushlogs" from line 118 is crash innd.

Result of next commands "ctlinnd -s logmode" (line 141) and "ctlinnd -s 
go "Flushing log and syslog files" (line 177) we can see from the above 
example.

But I not see "Cannot flush logs" message.

Has anyone have such problem?

Petr Novopashenniy
RUSnet

------------------------------

Message: 2
Date: Wed, 29 Oct 2014 15:47:18 +0100
From: Julien ?LIE <[email protected]>
To: [email protected]
Subject: Re: INN 2.5.4 strange crash
Message-ID: <[email protected]>
Content-Type: text/plain; charset=UTF-8; format=flowed

Hi Petr,

> I have INN 2.5.4 on old FreeBSD box.
> After upgrade from 2.4.3, innd sometimes began to segfault.

How often does innd segfault?

> Last core creation time is Oct 29 03:20:19 2014, and looks like this
> is scanlogs problem.
> 
> Command "ctlinnd -s pause "Flushing log and syslog files" from line
> 114 looks like work fine, and I see this message in news.notice. But I
> suspect that "ctlinnd flushlogs" from line 118 is crash innd.
> 
> Has anyone have such problem?

I do not see that problem on my news server.
Maybe it comes from the following patch:
     https://inn.eyrie.org/trac/changeset/9463/trunk/innd/cc.c

   Rotate innfeed logs
     Exploder and process channels are now properly reopened when 
"ctlinnd
     flushlogs" is used, which is in particular the command invoked by
     scanlogs to rotate log files.

--- a/trunk/innd/cc.c
+++ b/trunk/innd/cc.c
@@ -630,9 +630,12 @@

  /*
-**  Flush the log files.
+**  Flush the log files as well as exploder and process channels.
  */
  static const char *
  CCflushlogs(char *unused[])
  {
+    SITE        *sp;
+    CHANNEL     *cp;
+    int         i;
      unused = unused;            /* ARGSUSED */

@@ -644,4 +647,13 @@
      ReopenLog(Log);
      ReopenLog(Errlog);
+    /* Flush exploder and process channels so that they take into 
account
+     * the new log files (for instance during log rotation). */
+    for (sp = Sites, i = nSites; --i >= 0; sp++) {
+        if (((cp = sp->Channel) != NULL)
+             && ((cp->Type == CTexploder) || (cp->Type == CTprocess))) 
{
+            SITEflush(sp, true);
+            syslog(L_NOTICE, "%s flush", sp->Name);
+        }
+    }
      return NULL;
  }

Would it happen that we're flushing a channel we should not at that very 
time?
A race-condition somewhere?

Or that the syslog() line should be put before SITEflush() because 
sp->Name
is no longer valid?  The gdb trace should have shown it, had it been the 
case.
I see "0x0 in ?? ()" in your gdb trace so it seems there is somewhere a 
NULL pointer.
Is your INN built with optimizations?  It sometimes prevents gdb from 
giving
useful information; in that case, you should try to rebuild INN with for 
instance
"-g -O0 -fno-inline".

In your message, the backtrace also hints at line 1284 of chan.c:
     (*cp->Waker)(cp);
but I do not see why an error would happen here.
*cp->Waker is either SITEspoolwake or CHANwakeup but I do not see
what could be NULL there.

Note that there have been tons of changes since INN 2.4.3 so the issue 
can be
totally elsewhere than the recent commit I had in mind.
A complete backtrace from gdb would be of help.

-- 
Julien ?LIE

------------------------------

Message: 3
Date: Wed, 29 Oct 2014 18:28:16 +0300 (MSK)
From: Petr Novopashenniy <[email protected]>
To: [email protected]
Subject: Re: INN 2.5.4 strange crash
Message-ID: <[email protected]>
Content-Type: TEXT/PLAIN; charset=US-ASCII

On Wed, 29 Oct 2014, Julien ?LIE wrote:

J?? Hi Petr,
J?? 
J?? > I have INN 2.5.4 on old FreeBSD box.
J?? > After upgrade from 2.4.3, innd sometimes began to segfault.
J?? 
J?? How often does innd segfault?

I upgrade INN at 9 september, and this is result:

-rw-------  1 news  news  187858944 Sep 14 03:20 innd.core1
-rw-------  1 news  news  193404928 Sep 16 03:20 innd.core2
-rw-------  1 news  news  189050880 Oct 27 02:20 innd.core3
-rw-------  1 news  news  200908800 Oct 28 02:20 innd.core4
-rw-------  1 news  news  196829184 Oct 29 03:20 innd.core5

J?? 
J?? 
J?? 
J?? > Last core creation time is Oct 29 03:20:19 2014, and looks like this
J?? > is scanlogs problem.
J?? > 
J?? > Command "ctlinnd -s pause "Flushing log and syslog files" from line
J?? > 114 looks like work fine, and I see this message in news.notice. But I
J?? > suspect that "ctlinnd flushlogs" from line 118 is crash innd.
J?? > 
J?? > Has anyone have such problem?
J?? 
J?? I do not see that problem on my news server.
J?? Maybe it comes from the following patch:
J??     https://inn.eyrie.org/trac/changeset/9463/trunk/innd/cc.c
J?? 
J??   Rotate innfeed logs
J??     Exploder and process channels are now properly reopened when "ctlinnd
J??     flushlogs" is used, which is in particular the command invoked by
J??     scanlogs to rotate log files.
J?? 
J?? --- a/trunk/innd/cc.c
J?? +++ b/trunk/innd/cc.c
J?? @@ -630,9 +630,12 @@
J?? 
J??  /*
J?? -**  Flush the log files.
J?? +**  Flush the log files as well as exploder and process channels.
J??  */
J??  static const char *
J??  CCflushlogs(char *unused[])
J??  {
J?? +    SITE        *sp;
J?? +    CHANNEL     *cp;
J?? +    int         i;
J??      unused = unused;            /* ARGSUSED */
J?? 
J?? @@ -644,4 +647,13 @@
J??      ReopenLog(Log);
J??      ReopenLog(Errlog);
J?? +    /* Flush exploder and process channels so that they take into account
J?? +     * the new log files (for instance during log rotation). */
J?? +    for (sp = Sites, i = nSites; --i >= 0; sp++) {
J?? +        if (((cp = sp->Channel) != NULL)
J?? +             && ((cp->Type == CTexploder) || (cp->Type == CTprocess))) {
J?? +            SITEflush(sp, true);
J?? +            syslog(L_NOTICE, "%s flush", sp->Name);
J?? +        }
J?? +    }
J??      return NULL;
J??  }
J?? 
J?? 
J?? 
J?? Would it happen that we're flushing a channel we should not at that very
J?? time?
J?? A race-condition somewhere?

Looks like it.
But my other servers with 2.5.4 (and other OS version) never had such 
an error.

J?? 
J?? Or that the syslog() line should be put before SITEflush() because sp->Name
J?? is no longer valid?  The gdb trace should have shown it, had it been the
J?? case.
J?? I see "0x0 in ?? ()" in your gdb trace so it seems there is somewhere a NULL
J?? pointer.
J?? Is your INN built with optimizations?  It sometimes prevents gdb from giving
J?? useful information; in that case, you should try to rebuild INN with for
J?? instance
J?? "-g -O0 -fno-inline".
J?? 
J?? 
J?? In your message, the backtrace also hints at line 1284 of chan.c:
J??     (*cp->Waker)(cp);
J?? but I do not see why an error would happen here.
J?? *cp->Waker is either SITEspoolwake or CHANwakeup but I do not see
J?? what could be NULL there.
J?? 
J?? 
J?? Note that there have been tons of changes since INN 2.4.3 so the issue can
J?? be
J?? totally elsewhere than the recent commit I had in mind.
J?? A complete backtrace from gdb would be of help.

I rebuild INN with "-g -O0 -fno-inline", and waiting new segfaults.

Thanks, Julien!

--pety

------------------------------

Message: 4
Date: Wed, 29 Oct 2014 09:10:44 -0700
From: Russ Allbery <[email protected]>
To: [email protected]
Subject: Re: INN 2.5.4 strange crash
Message-ID: <[email protected]>
Content-Type: text/plain; charset=utf-8

Petr Novopashenniy <[email protected]> writes:

> 1)
> Program terminated with signal 11, Segmentation fault.

> (gdb) bt
> #0  0x0 in ?? ()
> #1  0x8057477 in CHANreadloop () at chan.c:1284
> #2  0x8059100 in KEYgenerate (header=0x2, body=0xbfbfed98 "\220???????",
> bodylen=3217026468) at innd.c:750
> #3  0x804d8de in _start ()

Julien already basically said this, but to note explicitly, this is a
nonsensical backtrace.  KEYgenerate does not call CHANreadloop, and that
line number is actually in the middle of main().  That makes me think
something may be hammering the stack, although it could just be compiler
optimizations.

Unfortunately, valgrind is the best tool for things like this, but it will
make your server incredibly slow, and given how unusual the crashes are,
may not be viable to run for long enough to see the problem.

A crash without optimizations turned on should be helpful, though.

-- 
Russ Allbery ([email protected])              <http://www.eyrie.org/~eagle/>

    Please send questions to the list rather than mailing me directly.
     <http://www.eyrie.org/~eagle/faqs/questions.html> explains why.

------------------------------

_______________________________________________
inn-workers mailing list
[email protected]
https://lists.isc.org/mailman/listinfo/inn-workers

End of inn-workers Digest, Vol 65, Issue 8
******************************************

inn-workers Digest, Vol 65, Issue 8

Reply via email to