Burton,
I have finally been able to test ntop on 4.8-p3: it worked beautifully. All I can tell you is that I am against this patch. If this is a platform-specific bug and there's already a fix, it must remain as such. In case of failure we should print a message saying "update your kernel". Adding platform-specific code to work around a bug that has already been fixed is not what I have in mind.


Cheers, Luca

Burton M. Strauss III wrote:

It's a bug with stale file descriptors (reuse). Oh ghu....

From the ktrace output, it looked like the culprit is the constant opening
and closing of the log, which means that each time ntop opens the log file
it's reusing the file descriptor.

That shouldn't be a problem, but guess what, there's a bug in FreeBSD... See
http://www.freebsd.org/cgi/query-pr.cgi?pr=51535:

"Description
In programs linked against libc_r:
- dup2'ing another file to one of the standard file descriptors
- doing his job with it and then closing it
- opening another file ( which will re-use the same fd )

will cause the latter to "inherit" the closed file's fcntl flags."

...

"There are rumors that this bug is present also in FreeBSD 4.8, NetBSD 1.6
and recent OpenBSD, but I have no possibility to verify it."


As long as it's always the same (log) file, this doesn't hurt. But when we open a socket, it grabs the next unused file descriptor, which is the dirty one previously used by the log. And it doesn't like it because sockets have different options than files.
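
For anyone who wants to check a given box, here is a minimal sketch of the open/close/reuse sequence described above. It is not the test case from the PR (it skips the dup2 step onto a standard descriptor) and it is not taken from ntop; the function name socket_has_stale_flags() and the use of /dev/null and O_APPEND as the marker flag are my own assumptions for illustration. On the affected systems it would presumably also need to be linked against libc_r to show anything.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of ntop or of the PR).
 * Returns 1 if a freshly created socket reports a file-only flag
 * (O_APPEND) left over from a descriptor that was closed just before,
 * 0 if the socket looks clean, and -1 if a system call failed.
 */
int socket_has_stale_flags(void)
{
    int fd, s, flags;

    /* Step 1: open a file with a flag that is easy to recognize later. */
    fd = open("/dev/null", O_WRONLY | O_APPEND);
    if (fd < 0)
        return -1;

    /* Step 2: close it, so its descriptor number becomes free again. */
    if (close(fd) < 0)
        return -1;

    /* Step 3: create a socket; it normally gets the lowest free
     * descriptor, which is the one that was just released. */
    s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0)
        return -1;

    /* Step 4: a new socket should never report O_APPEND. */
    flags = fcntl(s, F_GETFL);
    close(s);
    if (flags < 0)
        return -1;

    return (flags & O_APPEND) ? 1 : 0;
}
```

A caller could run this once at startup and, if it returns 1, log something like "update your kernel" rather than trying to work around the problem.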


According to the PR 51535 log, the fix was committed to 4.8 on Tue Jun 3 07:09:39 PDT 2003, which is after the release date of the 4.8-RELEASE-p1 that Stanley is using. 4.8-RELEASE-p3 was released this month (about a week after -p2 apparently), and it should have the fix.


The patch is ugly in concept and execution. Once I had it working for the http:// server, and was looking at cloning all that code for the https:// stuff, I ended up doing some refactoring of initWeb() - moving crud out that wasn't web related, creating an initSocket() that does either normal or ssl, hiding all the complexity and doing it once, reworking stuff with some extra error checking, etc. That version is the one that's attached.


Since it won't hurt anything, I just control it through an #ifdef FREEBSD. That may not be enough, it may need to be #ifdef xxxBSD or even just be used on all systems - it can't hurt.

The alternative - detecting exactly which versions do and don't have the
problem - is even more ugly and much more complex.  We would have to figure
out the internal code numbers for all of the broken releases and enable the
code only for them, which becomes an ntop maintenance issue as new FreeBSD
versions are released.  Bad mojo...
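
To make the maintenance problem concrete, here is a rough sketch of what version gating could look like. It is only an illustration, not the attached patch. __FreeBSD_version is the real macro from <sys/param.h>, but NTOP_FIXED_FREEBSD_VERSION and needs_fd_workaround() are hypothetical names, and the threshold is deliberately left as a placeholder because I have not looked up the real numbers. I also have not checked whether patch-level releases such as -p1 and -p3 can be told apart this way at all.

```c
#if defined(__FreeBSD__)
#include <sys/param.h>   /* defines __FreeBSD_version on FreeBSD */
#endif

/*
 * HYPOTHETICAL placeholder: the first __FreeBSD_version value that
 * contains the fix.  0 means "unknown", which disables the workaround.
 * Every affected branch would need its own entry, and the list would
 * have to be revisited for each new release.
 */
#ifndef NTOP_FIXED_FREEBSD_VERSION
#define NTOP_FIXED_FREEBSD_VERSION 0
#endif

/* Returns 1 if the build host looks older than the fixed version. */
int needs_fd_workaround(void)
{
#if defined(__FreeBSD__) && defined(__FreeBSD_version)
    return (__FreeBSD_version < NTOP_FIXED_FREEBSD_VERSION) ? 1 : 0;
#else
    return 0;   /* not FreeBSD: nothing to decide at compile time */
#endif
}
```

Note that this decides at build time, so a binary built on a fixed system and run on a broken one (or the reverse) would still get it wrong.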


Issues -


1. It's ugly.

   Get over it, Burton, it's an ugly bug.  And this version of the patch
isn't THAT bad.  The first version was a two bagger.

2. Darwin (the BSD legacy?) seems to need the same patch.

3. Once I got past this, in FreeBSD, I hit thread problems in both 4.8/5.1 -
which lock up hard.

   I'm apparently not the only one with FreeBSD threading problems - see
http://jeremy.zawodny.com/blog/archives/000203.html.



Luca:  Please look over the patch, and test it on Darwin.  Courtesy of a
nice user, I have (non-root) access to a Darwin 6.6  box, so I know it
compiles.  Based on the error messages I saw, I had to enable the patch via
adding #define FREEBSD in util.c.  But since this wasn't running as root
that may not apply to other systems...

Stanley:  Please test the patch - I have more faith in your FreeBSD systems
than I do in mine.  But - warning - it may lock up.   (The thing to do is to
connect gdb to the running ntop and do an info threads.  If they're all in
thread_kern_sched, and the web server doesn't respond, it's the problem I'm
seeing).

Andy: Please let me know if this fixes your issue.  It may well be that PR
R66TXWB and your problem, "sntop record loading problems (FreeBSD)" are the
same thing.  At least your 'absurd' solution was one of the hints I used.

Anyone else want to give it a try? Feedback welcomed!

I'm out of town for the next 3 days, so I'll look for answers on Sunday.  I
hope to get this wrapped up so we can release 2.3 early next week.  No, I'm
not bringing a laptop - my wife would kill me as it's our 20th anniversary.



-----Burton

US-based commercial support for ntop:
    http://www.ntopsupport.com
    mailto:[EMAIL PROTECTED]

Search the ntop mailing lists at gmane:
    http://search.gmane.org

HowTo Ask for Help at
http://snapshot.ntop.org/faq.php#83




--
Luca Deri <[EMAIL PROTECTED]>     http://luca.ntop.org/
Hacker: someone who loves to program and enjoys being
clever about it - Richard Stallman


_______________________________________________ Ntop-dev mailing list [EMAIL PROTECTED] http://listgateway.unipi.it/mailman/listinfo/ntop-dev
