Send inn-workers mailing list submissions to
        [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
        https://lists.isc.org/mailman/listinfo/inn-workers
or, via email, send a message with subject or body 'help' to
        [email protected]

You can reach the person managing the list at
        [email protected]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of inn-workers digest..."


Today's Topics:

   1. Re: INN 2.5.4 strange crash (Petr Novopashenniy)
   2. Re: INN 2.5.4 strange crash (Julien ?LIE)


----------------------------------------------------------------------

Message: 1
Date: Wed, 4 Feb 2015 17:52:26 +0300 (MSK)
From: Petr Novopashenniy <[email protected]>
To: Julien ?LIE <[email protected]>
Cc: [email protected]
Subject: Re: INN 2.5.4 strange crash
Message-ID: <[email protected]>
Content-Type: TEXT/PLAIN; charset=US-ASCII



On Tue, 3 Feb 2015, Julien ?LIE wrote:

[dd]

J?? > I saw "blocked sleeping" before (for my two channels, innfeed-neva1 and
J?? > innfeed-neva2 at varios times), but not in the time of crash.
J?? 
J?? They do not seem related then.
J?? 

May be..

[dd]

J?? When you have such logs, please paste them so as to check the closing is
J?? fine.
J?? 

I looked more closely my logs, and this is partial information from them:

Feb  4 08:34:33 ns innd: innfeed-neva1!:26:proc:15113 cant write: Resource 
temporarily unavailable
Feb  4 08:34:33 ns innd: innfeed-neva1!:26:proc:15113 blocked sleeping 5
[dd]
Feb  4 08:34:34 ns innd: innfeed-neva1!:26:proc:15113 cant write: Resource 
temporarily unavailable
Feb  4 08:34:34 ns innd: innfeed-neva1!:26:proc:15113 blocked sleeping 5
[dd]
Feb  4 08:34:39 ns innd: innfeed-neva1!:26:proc:15113 wakeup
Feb  4 08:34:39 ns last message repeated 33 times
[dd]
Feb  4 08:34:39 ns innd: innfeed-neva1!:26:proc:15113 wakeup
Feb  4 08:34:39 ns last message repeated 85 times

$ cat news.notice |grep " wakeup"|wc -l
   11103
$

And EAGAIN only twice.

No "sleeping without Waker" messages.

I not see messages with "wakeup" and EAGAIN (temporarily unavailable) in 
errlog and news.err, it surprises me.

Continued logs:

We receive control message:

Feb  4 10:32:46 ns innd: innfeed-neva1!:26:proc:15113 wakeup
Feb  4 10:32:46 ns last message repeated 55 times
Feb  4 10:32:46 ns controlchan[15112]: control_newgroup, 
alt.binaries.history [email protected] [email protected] 
@03006E657773310000000003051400065DD3
@, , , UTF-8, neva-out.feeder.erje.net, doit, 1 
Feb  4 10:32:46 ns innd: ctlinnd command 
k:alt.binaries.history:y:[email protected]
Feb  4 10:32:47 ns innd: controlchan! closed
Feb  4 10:32:47 ns innd: innfeed-neva1! closed
Feb  4 10:32:47 ns innfeed[15113]: ME source lost . Exiting

[dd]

Feb  4 10:33:08 ns innd: controlchan! exit 0 elapsed 14017 pid 15112
Feb  4 10:33:08 ns innd: controlchan!:26:proc:26571 26 sleeping without 
Waker
Feb  4 10:33:08 ns innd: controlchan!:26:proc:26571 closed
Feb  4 10:33:08 ns innd: controlchan! exit 0 elapsed 0 pid 26571
Feb  4 10:33:08 ns innd: controlchan!:99:file closed
Feb  4 10:33:09 ns innd: controlchan! spawned controlchan!:99:proc:26578
Feb  4 10:33:09 ns innd: controlchan! restarted

[dd]

Feb  4 10:33:09 ns innd: goblin1.stu.neva.ru:26 26 sleeping without Waker
Feb  4 10:33:09 ns innd: goblin1.stu.neva.ru:26 checkpoint seconds 0 
accepted 0 refused 0 rejected 0 duplicate 0 accepted size 0 duplicate size 
0 rejected size 0
Feb  4 10:33:09 ns innd: goblin1.stu.neva.ru:26 closed seconds 0 accepted 
0 refused 0 rejected 0 duplicate 0 accepted size 0 duplicate size 0 
rejected size 0
Feb  4 10:33:09 ns innd: goblin3.stu.neva.ru:26 26 sleeping without Waker
Feb  4 10:33:09 ns innd: goblin3.stu.neva.ru:26 checkpoint seconds 0 
accepted 0 refused 0 rejected 0 duplicate 0 accepted size 0 duplicate size 
0 rejected size 0
Feb  4 10:33:09 ns innd: goblin3.stu.neva.ru:26 closed seconds 0 accepted 
0 refused 0 rejected 0 duplicate 0 accepted size 0 duplicate size 0 
rejected size 0

[and same for several peers]

Feb  4 10:33:10 ns innd: free:-1 26 free but was in SMASK

And no errors after that.
Looks like problem with my system, I see same fd 26.

In my old logs, I see different fd numbers, but problem is same.

--pety



------------------------------

Message: 2
Date: Wed, 04 Feb 2015 21:07:06 +0100
From: Julien ?LIE <[email protected]>
To: [email protected]
Subject: Re: INN 2.5.4 strange crash
Message-ID: <[email protected]>
Content-Type: text/plain; charset=windows-1252; format=flowed

Hi Petr,

> Feb  4 08:34:33 ns innd: innfeed-neva1!:26:proc:15113 cant write: Resource
> temporarily unavailable
> Feb  4 08:34:33 ns innd: innfeed-neva1!:26:proc:15113 blocked sleeping 5
> [dd]
> Feb  4 08:34:34 ns innd: innfeed-neva1!:26:proc:15113 cant write: Resource
> temporarily unavailable
> Feb  4 08:34:34 ns innd: innfeed-neva1!:26:proc:15113 blocked sleeping 5

Strange that the log is written twice (the second one should have been 
written at least 5 seconds later).


> Feb  4 08:34:39 ns innd: innfeed-neva1!:26:proc:15113 wakeup
> Feb  4 08:34:39 ns last message repeated 33 times
> [dd]
> Feb  4 08:34:39 ns innd: innfeed-neva1!:26:proc:15113 wakeup
> Feb  4 08:34:39 ns last message repeated 85 times

A lot of wake up at the same second...


> Feb  4 10:33:08 ns innd: controlchan!:26:proc:26571 26 sleeping without
> Waker
> Feb  4 10:33:08 ns innd: controlchan!:26:proc:26571 closed
> Feb  4 10:33:08 ns innd: controlchan! exit 0 elapsed 0 pid 26571
> Feb  4 10:33:08 ns innd: controlchan!:99:file closed
> Feb  4 10:33:09 ns innd: controlchan! spawned controlchan!:99:proc:26578
> Feb  4 10:33:09 ns innd: controlchan! restarted

That's the behaviour of the patch:  it closes the channel if found in an 
unexpected state.
Seems fine here.


> Feb  4 10:33:09 ns innd: goblin1.stu.neva.ru:26 26 sleeping without Waker
> Feb  4 10:33:09 ns innd: goblin1.stu.neva.ru:26 checkpoint seconds 0
> accepted 0 refused 0 rejected 0 duplicate 0 accepted size 0 duplicate size
> 0 rejected size 0
> Feb  4 10:33:09 ns innd: goblin1.stu.neva.ru:26 closed seconds 0 accepted
> 0 refused 0 rejected 0 duplicate 0 accepted size 0 duplicate size 0
> rejected size 0
> Feb  4 10:33:09 ns innd: goblin3.stu.neva.ru:26 26 sleeping without Waker
> Feb  4 10:33:09 ns innd: goblin3.stu.neva.ru:26 checkpoint seconds 0
> accepted 0 refused 0 rejected 0 duplicate 0 accepted size 0 duplicate size
> 0 rejected size 0
> Feb  4 10:33:09 ns innd: goblin3.stu.neva.ru:26 closed seconds 0 accepted
> 0 refused 0 rejected 0 duplicate 0 accepted size 0 duplicate size 0
> rejected size 0
>
> [and same for several peers]
>
> Feb  4 10:33:10 ns innd: free:-1 26 free but was in SMASK
>
> And no errors after that.
> Looks like problem with my system, I see same fd 26.

It should normally be fine as long as goblin1.stu.neva.ru:26 is closed 
before goblin3.stu.neva.ru:26, and so on.

What I do not explain is why the sleeping state of these channels is not 
removed by the call of SCHANremove(cp); it sounds as though 
CHANsleeping(cp) were false when entering into SCHANremove.

Maybe you could change

void
SCHANremove(CHANNEL *cp)
{
     if (!CHANsleeping(cp))
         return;

to

void
SCHANremove(CHANNEL *cp)
{
     if (!CHANsleeping(cp)) {
         notice("%s was not sleeping", CHANname(cp));
         return;
     }

so as to verify that SCHANremove does the right thing when called after 
a "sleeping without Waker" warning?

At least fd 26 is freed at 10:33:10, which solves the issue you are 
facing.  But I also do not explain why it had not been freed sooner...

-- 
Julien ?LIE

? Ce qui m'int?resse chez une femme, c'est ce que je n'ose pas lui
   demander. ? (Wolinski)


------------------------------

_______________________________________________
inn-workers mailing list
[email protected]
https://lists.isc.org/mailman/listinfo/inn-workers

End of inn-workers Digest, Vol 69, Issue 3
******************************************

Reply via email to