Re: Any acceptance of TCP connections suddenly BREAKS without provocation globally: gives 'connection reset by peer', 'connection refused' or blocks. Only solved by reboot. Details&analysis provided. Request for pointers as to nail down the error source.

Kenneth R Westerback Sun, 07 Jul 2013 14:18:08 -0700

On Sun, Jul 07, 2013 at 10:28:50PM +0300, Mikael wrote:
> Hi Kenneth!
> 
> Thank you for your response.
> 
> The machine was not under attack, but there has been some fishing going on,
> which OBSD is designed to work very well with. Maybe just some software out
> there statically configured to call on the IP my machine happens to have.
> 
> I do not really see why a previously crashed non-root egdb session, or a
> non-root user application hitting its resident memory size quota so that
> malloc() in it returns NULL, would cause this error, but I mentioned them
> for completeness.
> 
> On the second occurrence, I had a defunct process that had a TCP port bound
> and kill -9 would not take it down. Not too many connect attempts should
> have been made to this port after the process went defunct. Though again, I
> don't see that this would be causal to a system-wide problem like this
> error. Also I don't recall if there was a defunct process around when the
> error happened 2013-05-03.
> 
> 
> The PNG file is uploaded here:
> http://s000.tinyupload.com/download.php?file_id=28266514139859296465&t=2826651413985929646525648
> ,
> please have a look!


Please transcribe the text and include it in the email. Following links to
download files and then somehow display them just to read some text is a
waste of time. Or capture the output in a useful format, i.e. text.

> 
> 
> Thank you for the reference to http://openbsd.org/report.html - I read
> through it once more just now.

And yet you don't actually provide any of the information requested on
the page.

> 
> What I have encountered is measurably a bug since what's happening is not
> OBSD's intended behavior, so addressing it here makes sense.

Not knowing what causes the behaviour I would not be so quick to decide
what OBSD's intended behaviour in that circumstance is. I can imagine
circumstances where your described behaviour would be a good thing.

> 
> It's not nailed down yet, but it is a real bug. One way to resolve it
> would be to know exactly how to reproduce it, which is the ordinary and
> practical way. Another would be to know the code paths exactly and reason
> out where the bug must be. A third would be to dig out more debug info to
> get data that nails it down, or nearly does.
> 
> Right now more data would make the most sense I guess.

See point above about report.html.

> 
> Perhaps the data already provided in the PNG file (above) is enough; you
> probably know this better than I do.

Until I see the text that would be a heroic assumption of precognitive
abilities on my part!

> 
> 
> Suggestions on how to get these tools would be much appreciated.
> 
> Request:
> >
> > To get network stack/OS kernel introspection tools (new custom ones or
> > just shell commands) for nailing down the error source on the next
> > occurrence!
> >
> > 1) For instance, an mbuf prettyprint-dumper and an mbuf resetter could be
> > of value.
> >
> > 2) Also a general TCP network stack state prettyprint-dumper and resetter
> > would be of value - the dumping as to track the source of the problem, and
> > the resetter as to fix the problem without need for reboot.
> >
> >
> 
> Any further pointers on how to track it further the next time (at the
> current rate, around the 5th of September) would be much appreciated.
> 
> This is a frustrating bug indeed and I hope we can clarify and zap it as
> soon as possible, I don't know the OS specifics all too well but I'm happy
> to dig out the cause using any introspection tools available.

No such tools as you describe exist as far as I know. Nor, unless
you write them, are they likely to exist.

Reproducing the problem, preferably with an absolute minimum of
irrelevant activity, would be best. Then providing the captured information
requested in report.html, plus any captured text output you think
relevant could start a useful diagnostic process.

.... Ken

> 
> Thanks,
> Mikael
> 
> 2013/7/7 Kenneth R Westerback <[email protected]>
> 
> > On Sun, Jul 07, 2013 at 06:18:14PM +0300, Mikael wrote:
> > > Hi,
> > >
> > [[ snip ]]
> > >
> > > Reproducible: By me, beyond these two occurrences no.
> > >
> > > Machine: Dual-xeon with a BGE NIC
> > >
> > > Environment details:
> > >
> > >    - On the second occurrence there was one defunct process that had a
> > >    bound TCP port.
> > >    - Permanently ~50 incoming TCP connections per second as some kind of
> > >    undirected spamming/flooding/micro-semi-DDoS, no clue from whom or
> > >    whether with any particular flags.
> > >    - On both occurrences, previously an EGDB session run as user had
> > >    crashed so that kill -9 was needed.
> > >    - At some points, user processes had encountered malloc failure due to
> > >    insufficient RAM.
> > >    - Other than this absolutely nothing exotic.
> >
> > So you were under attack, something called EGDB had crashed but was
> > still running so you needed to kill -9 it, and user processes had run
> > out of memory. And you can't reproduce it.
> >
> > Obviously nothing exotic going on there!
> >
> > FrostyPants should have pointed you at
> >
> >         http://openbsd.org/report.html
> >
> > rather than "/me shoots himself".
> >
> > Attachments are not permitted on this list, and thus whatever was
> > in the png files did not make it through.
> >
> > .... Ken
