Welcome to MY WORLD

I've spent more time on this stupid thing than any other problem and every
time I touch it I just get MORE confused.

Let's see... saying that 51535 is fixed is about equivalent to saying
Windows XP fixes some security holes in Windows 98.  It does, but not in the
way a normal person means.

PR 51535 is fixed in SOME branches of the 5.x series, but not the common
ones and not 4.x...  (well, maybe it is now in SOME 4.xes, but which ones??)
I've actually sent a msg to the GNATS system asking this precise question.
(oh and the bug DOES exist in other xBSD derived OSes and ones which have
patches pulled from xBSD - grep in configureextra for HAVE_FILEDESCRIPTORBUG
for the ones we KNOW about).

Anyway, ugly as it is, that patch forces our select() onto a clean fd, so
we're not affected anywhere across the sweep, regardless of whether 51535
is 'fixed' in the specific version.

You are right that it's not strictly hung.  If you're using a browser that
doesn't give up and time out first, eventually you do get that first page
back.  And if you're lucky, everything works thereafter.

The 'bpf' state isn't documented anywhere I can find (and I can't find the
top source that actually matches what's on my FreeBSD machine; the ones I
can find don't have it).

Of course, the three (four? six?) different threading libraries across the
FreeBSD spectrum (don't forget linuxthreads and some experimental libraries)
complicate things - I think Stanley and I are lucky in that we're both just
using the default.  That default may have changed in the 5.x series, which
may have been one of the contributing problems (IIRC, the change in the
default wasn't in 5.0, it was in 5.1???).  Of course, along the way, Stanley
has upgraded from 4.6.? to 4.9, and that too could alter things.

Note that we had a LOT of problems with the LinuxThreads -> NPTL change,
which first appeared in RH9 and is now in Fedora and the 2.6 kernels.  It
could well be that our cleanup for that change exacerbated the problems
with FreeBSD's c_r.

It's really interesting that KSE works - does that mean it's a better, more
POSIX-like threads library (closer to NPTL), or just that it's broken in the
same ways as LinuxThreads, whose behaviors ntop implicitly seems to 'like'????


If you want to pursue this, I would suggest making a cut-down 'ntop' and
seeing if it still fails.  I think it's key that you have the core threads
(I listed a couple of times what was running), so that means a minimal pcap
and packet queue - in dequeuePacket, cut out all the processing and just
sleep a random # of ns.  The web server thread can just return a single
static page.  But I'm not sanguine this will fail - it may well be the
interactions of all of the mutexes that's causing the mutex handler to
'sleep'...  But if we get lucky and have the sleeping baby problem in a few
hundred lines of code, somebody MIGHT be willing to look.
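
Something like the following is roughly the shape I have in mind for the
skeleton.  It is only a sketch under assumptions: every name in it
(capture_thread, dequeue_thread, run_skeleton, QUEUE_LEN) is made up for
illustration and not taken from the ntop source, the capture side enqueues
fake packets instead of calling pcap, and the web server thread is left out
entirely.

```c
/* Hypothetical cut-down skeleton; names are illustrative, not ntop's. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

#define QUEUE_LEN 16

static int queue[QUEUE_LEN];
static int q_head, q_tail, q_count, producer_done;
static pthread_mutex_t q_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_not_empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t q_not_full = PTHREAD_COND_INITIALIZER;

/* Mark the producer as finished and wake any waiting consumer. */
static void finish_producer(void) {
    pthread_mutex_lock(&q_mutex);
    producer_done = 1;
    pthread_cond_broadcast(&q_not_empty);
    pthread_mutex_unlock(&q_mutex);
}

/* Stand-in for the pcap capture thread: enqueue n fake packets. */
static void *capture_thread(void *arg) {
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        pthread_mutex_lock(&q_mutex);
        while (q_count == QUEUE_LEN)
            pthread_cond_wait(&q_not_full, &q_mutex);
        queue[q_tail] = i;
        q_tail = (q_tail + 1) % QUEUE_LEN;
        q_count++;
        pthread_cond_signal(&q_not_empty);
        pthread_mutex_unlock(&q_mutex);
    }
    finish_producer();
    return NULL;
}

/* Stand-in for dequeuePacket: no processing, just a short random sleep. */
static void *dequeue_thread(void *arg) {
    int *handled = arg;
    unsigned int seed = 12345;
    for (;;) {
        pthread_mutex_lock(&q_mutex);
        while (q_count == 0 && !producer_done)
            pthread_cond_wait(&q_not_empty, &q_mutex);
        if (q_count == 0) {            /* queue drained and producer done */
            pthread_mutex_unlock(&q_mutex);
            break;
        }
        assert(q_count > 0);
        int pkt = queue[q_head];
        (void)pkt;                     /* the packet itself is ignored */
        q_head = (q_head + 1) % QUEUE_LEN;
        q_count--;
        pthread_cond_signal(&q_not_full);
        pthread_mutex_unlock(&q_mutex);

        struct timespec ts = { 0, (long)(rand_r(&seed) % 100000) };
        nanosleep(&ts, NULL);          /* sleep 0..99999 ns */
        (*handled)++;
    }
    return NULL;
}

/* Run both threads to completion; returns packets handled, or -1. */
int run_skeleton(int npackets) {
    pthread_t cap, deq;
    int handled = 0;
    q_head = q_tail = q_count = producer_done = 0;
    if (pthread_create(&deq, NULL, dequeue_thread, &handled) != 0)
        return -1;
    if (pthread_create(&cap, NULL, capture_thread, &npackets) != 0) {
        finish_producer();             /* let the consumer exit cleanly */
        pthread_join(deq, NULL);
        return -1;
    }
    pthread_join(cap, NULL);
    pthread_join(deq, NULL);
    return handled;
}
```

Call run_skeleton() from a small main() and link it once with -lc_r and
once with -lkse.  If the c_r build stalls where the KSE build doesn't, that's
the few-hundred-line test case; if both run fine, the next step would be
adding the web server thread and more mutexes until it breaks.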


-----Burton




> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> Behalf Of Meloun Michal
> Sent: Wednesday, February 04, 2004 8:58 AM
> To: [EMAIL PROTECTED]
> Subject: [Ntop-dev] RE: [PATCH] FreeBSD hangs in daemon mode
>
>
> Situation is more and more complicated now.
> First of all, I agree, the patch is too much "aggressive" for upcoming
> 3.0 release. Forget it for now, please.
>
> Today, I make some other test - here is small recapitulation:
>
> FreeBSD 5.2 has 2 different threading libraries now
>  - KSE - kernel mode threads, its used when program is linked with -lkse
>  - and old user mode threads, used when program is linked with -lc_r
>
> The ntop works without any problem in daemon mode, when it's linked with KSE
> library, but hangs when it's linked with c_r library. (Note, the hangs is not
> a right word - it's only major, near infinite, slowdown in web interface).
> The ps show ntop in bpf state - I mean, its not problem, because ps on
> FreeBSD shows state of first thread.
>
> So, my original idea, broken fork(), is totally false.
> Simply, im out of ideas now?
>
> But, I still want to find real cause of this problem, and report it
>  (to FreeBSD, to tcpdump workes, or here :)
>
>
> Btw, the PR 51535 is repaired and closed
>
> Michal Meloun
>
>
> In article <[EMAIL PROTECTED]>,
> [EMAIL PROTECTED] says...
> > I'm going to strongly suggest we reject this change at the current stage.
> > We have a lot of testing under our belts on the current arrangement and it
> > works fine in all environments EXCEPT FreeBSD, which is known to be flakey
> > re threads anyway.
> >
> > If you will maintain this as a FreeBSD - only patch, we can revisit it after
> > 3.0...    Frankly I'd be more comfortable moving daemonize earlier, before
> > the pcap_open_live - one of the post-3.0 things I'm looking at is a threads
> > watchdog, which would be cleaner that way...
> >
> > Meanwhile, you should probably append this info the existing bug report on
> > FreeBSD (http://www.freebsd.org/cgi/query-pr.cgi?pr=56339).  And also to the
> > tcpdump workers list.  However, I'll bet you ultimately get the 'fork() and
> > threads don't co-exist' answer.  Also, look at PR 51535 which is the other
> > open problem in this area.
> >
> > Thanks!
> >
> > And cool findings.
> >
> > -----Burton
> >
> >
> >
> >
>
> _______________________________________________
> Ntop-dev mailing list
> [EMAIL PROTECTED]
> http://listgateway.unipi.it/mailman/listinfo/ntop-dev
>
