I have finally been able to test ntop on 4.8-p3: it worked beautifully. All I can tell you is that I am against this patch. If this is a platform-specific bug and there's already a fix, it must remain as such. In case of failure we should print a message saying "update your kernel". Adding platform-specific code to work around a bug that has already been fixed is not what I have in mind.
Cheers, Luca
Burton M. Strauss III wrote:
It's a bug with stale file descriptors (reuse). Oh ghu....
From the ktrace output, it looked like the culprit is the constant opening and closing of the log, which means that each time ntop opens the log file it's reusing the file descriptor.
That shouldn't be a problem, but guess what, there's a bug in FreeBSD... See this http://www.freebsd.org/cgi/query-pr.cgi?pr=51535:
"Description
In programs linked against libc_r:
- dup2'ing another file to one of the standard file descriptors
- doing his job with it and then closing it
- opening another file ( which will re-use the same fd )
will cause the latter to "inherit" the closed file's fcntl flags."
...
"There are rumors that this bug is present also in FreeBSD 4.8, NetBSD 1.6 and recent OpenBSD, but I have no possibility to verify it."
As long as it's always the same (log) file, this doesn't hurt. But when we open a socket, it grabs the next unused file descriptor, which is the dirty one previously used by the log. And it doesn't like it because sockets have different options than files.
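A minimal sketch of the sequence described above, for anyone who wants to check their own box. It is a simplification (no dup2, and O_NONBLOCK is just a convenient flag to look for), and the function name is made up for illustration. On an affected system it would presumably need to be linked against libc_r to show anything; on a fixed system it should report a clean descriptor.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Open a file, set a flag on it, close it, then create a socket that
 * should land on the same descriptor number.
 * Returns 1 if the socket inherited O_NONBLOCK from the closed file,
 *         0 if the socket's flags are clean,
 *        -1 on error or if the descriptor number was not reused.
 */
int socket_inherits_file_flags(const char *log_path)
{
    int log_fd = open(log_path, O_RDONLY);
    if (log_fd < 0)
        return -1;

    /* Mark the "log" descriptor so a leak is easy to spot later. */
    int flags = fcntl(log_fd, F_GETFL);
    if (flags < 0 || fcntl(log_fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(log_fd);
        return -1;
    }
    close(log_fd);

    /* The lowest free descriptor is handed out, so this reuses log_fd. */
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0)
        return -1;
    if (sock != log_fd) {
        close(sock);
        return -1;
    }

    flags = fcntl(sock, F_GETFL);
    close(sock);
    if (flags < 0)
        return -1;
    return (flags & O_NONBLOCK) ? 1 : 0;
}
```

Calling socket_inherits_file_flags("/dev/null") and getting 1 back would match the behaviour in the PR; 0 means the socket came up clean.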
According to the PR 51535 log, the fix was committed to 4.8 on Tue Jun 3 07:09:39 PDT 2003, which is after the release date of the 4.8-RELEASE-p1 that Stanley is using. 4.8-RELEASE-p3 was released this month (about a week after -p2 apparently), and it should have the fix.
The patch is ugly in concept and execution. Once I had it working for the http:// server and was looking at cloning all that code for the https:// stuff, I ended up doing some refactoring of initWeb() - moving crud out that wasn't web related, creating an initSocket() that does either normal or ssl, hiding all the complexity and doing it once, reworking stuff with some extra error checking, etc. That version is the one that's attached.
Since it won't hurt anything, I just control it through an #ifdef FREEBSD. That may not be enough; it may need to be #ifdef xxxBSD, or we could even just use it on all systems - it can't hurt.
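To illustrate the shape of an #ifdef FREEBSD guard, here is a rough sketch. It is NOT the code from the attached patch: the helper name open_clean_socket() is hypothetical, and resetting the status flags with fcntl(F_SETFL) is only an assumption about what a workaround could look like. The point is that the guarded block compiles away to nothing on other platforms.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the attached patch): create a TCP
 * socket and, on FreeBSD builds only, reset its status flags in case
 * the descriptor number carries leftovers from a closed file.
 * Returns the socket descriptor, or -1 on failure.
 */
int open_clean_socket(void)
{
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0)
        return -1;

#ifdef FREEBSD
    /* Assumption: clearing the flags is enough to undo the inheritance. */
    if (fcntl(sock, F_SETFL, 0) < 0) {
        close(sock);
        return -1;
    }
#endif

    return sock;
}
```

Building with -DFREEBSD turns the extra fcntl() call on; without it the function is just a plain socket() call.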
The alternative - detecting exactly which versions do and don't have the problem - is even more ugly and much more complex. We would have to figure out the internal code numbers for all of the broken releases and enable the code only for them, which becomes an ntop maintenance issue as new FreeBSD versions are released. Bad mojo...
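For comparison, a sketch of what the version-detection route might look like, assuming the "internal code numbers" are the __FreeBSD_version values from <sys/param.h>. The two bounds below are placeholders, not real release numbers; somebody would have to look up and maintain the actual values, which is exactly the burden described above. Note too that a check like this only sees the headers at build time, not the libc the binary ends up running against.

```c
#include <sys/param.h>   /* defines __FreeBSD_version on FreeBSD */

/*
 * Placeholders only. The real values for the broken releases would
 * have to be looked up and kept current as new versions ship.
 */
#define HYPOTHETICAL_FIRST_BROKEN 0
#define HYPOTHETICAL_LAST_BROKEN  0

/*
 * Returns 1 if the build headers fall inside the (placeholder) range
 * of broken releases, 0 otherwise, including on non-FreeBSD systems.
 */
int fd_workaround_needed(void)
{
#if defined(__FreeBSD_version)
    return (__FreeBSD_version >= HYPOTHETICAL_FIRST_BROKEN &&
            __FreeBSD_version <= HYPOTHETICAL_LAST_BROKEN) ? 1 : 0;
#else
    return 0;
#endif
}
```

With the placeholder bounds the range is empty, so the function always answers 0 until real numbers are filled in.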
Issues -
1. It's ugly.
Get over it, Burton, it's an ugly bug. And this version of the patch isn't THAT bad. The first version was a two bagger.
2. Darwin (the BSD legacy?) seems to need the same patch.
3. Once I got past this, in FreeBSD, I hit thread problems in both 4.8/5.1 - which lock up hard.
I'm apparently not the only one with FreeBSD threading problems - see http://jeremy.zawodny.com/blog/archives/000203.html.
Luca: Please look over the patch, and test it on Darwin. Courtesy of a nice user, I have (non-root) access to a Darwin 6.6 box, so I know it compiles. Based on the error messages I saw, I had to enable the patch by adding #define FREEBSD in util.c. But since this wasn't running as root, that may not apply to other systems...
Stanley: Please test the patch - I have more faith in your FreeBSD systems than I do in mine. But - warning - it may lock up. (The thing to do is to connect gdb to the running ntop and do an info threads. If they're all in thread_kern_sched, and the web server doesn't respond, it's the problem I'm seeing).
Andy: Please let me know if this fixes your issue. It may well be that PR R66TXWB and your problem, "sntop record loading problems (FreeBSD)" are the same thing. At least your 'absurd' solution was one of the hints I used.
Anyone else want to give it a try? Feedback welcomed!
I'm out of town for the next 3 days, so I'll look for answers on Sunday. I hope to get this wrapped up so we can release 2.3 early next week. No, I'm not bringing a laptop - my wife would kill me as it's our 20th anniversary.
-----Burton
-- Luca Deri <[EMAIL PROTECTED]> http://luca.ntop.org/ Hacker: someone who loves to program and enjoys being clever about it - Richard Stallman
