>
> Get some real hardware you whining puppy, not some $40 discard at 1st
> Saturday :-)
>
> You should also be able to find this yourself in the code. grep for
> pthread_mutex_lock() and you'll see a #define to something called
> _accessMutex() - grep for that and the code in all its glory is in
> util.c (he he he)
>
Oh I have:) I think we're discussing different thread systems. See
below.
>
> Now that we're past that...
>
>
> ntop is multi-threaded. The fork() is the least of it - used only for
> 'read-only' web pages. For normal running, there are six or seven threads
> at any time that are running - web server, one per interface, one decoder,
> one or more address resolution, main thread, idle purge, usually some sort
> of thread manager, etc. ...
No, I understand this, but in Solaris fork() is the only thing that
will change your pid. The 17-18 threads ntop runs all have the same pid,
which is why I was at a loss as to why we check it on every packet:) As I
said above, I suspect some of our OSen assign pids to threads, and that
is why y'all are using it.
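To make the point concrete, here is a minimal sketch (not ntop code) that
starts a few threads and compares the pid each one sees with the pid of the
creating thread. The names threads_share_pid, record_pid and NUM_THREADS are
made up for this example. On a thread library that follows POSIX here the
function should return 1; on one that gives each thread its own pid, as the
old LinuxThreads implementation did, it would return 0.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

#define NUM_THREADS 4   /* arbitrary count, for illustration only */

/* Worker: store the pid this thread observes into the caller's slot. */
static void *record_pid(void *arg) {
    pid_t *slot = arg;
    assert(slot != NULL);
    *slot = getpid();
    return NULL;
}

/* Returns 1 if every worker saw the caller's pid, 0 if any differed,
 * and -1 if a thread could not be created or joined. */
int threads_share_pid(void) {
    pthread_t tid[NUM_THREADS];
    pid_t seen[NUM_THREADS];
    pid_t mine = getpid();
    int started = 0, result = 1, i;

    while (started < NUM_THREADS) {
        if (pthread_create(&tid[started], NULL, record_pid,
                           &seen[started]) != 0) {
            result = -1;
            break;
        }
        started++;
    }
    /* Join only the threads that were actually started. */
    for (i = 0; i < started; i++) {
        if (pthread_join(tid[i], NULL) != 0)
            result = -1;
        else if (result == 1 && seen[i] != mine)
            result = 0;
    }
    return result;
}
```

If this returns 1 on a given platform, the pid alone cannot tell the
threads apart there; pthread_self() is the per-thread identity.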
>
> The biggest difference between 2.2 and 2.3 is that we've moved ntop from a
> multi-threaded, unprotected application to a multi-threaded protected one.
> So the mutexes are in place to protect data across the threads. It's
> amazing how SMP hardware changes your view of the world - it really is
> possible for the purge to be running and purging hosts at the same time the
> decoder is trying to record packet information about them!
>
Yes I can recreate this easily by dropping the mutex on my multiproc
sparc. Stuff quickly goes awry.
> Intellectually, you always knew it COULD happen, but in years and years of
> running ntop, I don't think Luca ever saw the problems.
>
> The issue is that thread programming is subtle and hard. We actually saw
> a case where code of the form:
>
> if (ptr == NULL) ptr=malloc();
> <...2 or 3 irrelevant lines, no function calls, just open code...>
> ptr->x++;
>
> was bombing...
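For comparison, a minimal sketch of that fragment with the check, the
allocation and the update all done under one lock, so that nothing (a purge,
say) can free or replace the pointer in between. This is not ntop's code:
struct counter, ptr_mutex, bump_counter, purge_counter and read_counter are
names invented for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct counter { long x; };

static struct counter *ptr = NULL;
static pthread_mutex_t ptr_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Take the lock; a default mutex that is used correctly should not fail. */
static void lock_ptr(void) {
    int rc = pthread_mutex_lock(&ptr_mutex);
    assert(rc == 0);
    (void)rc;
}

/* Check, allocate and increment while holding the lock.
 * Returns 1 on success, 0 if the allocation failed. */
int bump_counter(void) {
    int ok = 0;
    lock_ptr();
    if (ptr == NULL)
        ptr = calloc(1, sizeof *ptr);   /* zeroed, so x starts at 0 */
    if (ptr != NULL) {
        ptr->x++;
        ok = 1;
    }
    pthread_mutex_unlock(&ptr_mutex);
    return ok;
}

/* Stand-in for a purge: frees the object under the same lock. */
void purge_counter(void) {
    lock_ptr();
    free(ptr);
    ptr = NULL;
    pthread_mutex_unlock(&ptr_mutex);
}

/* Read the current count, or 0 if nothing is allocated. */
long read_counter(void) {
    long v;
    lock_ptr();
    v = (ptr != NULL) ? ptr->x : 0;
    pthread_mutex_unlock(&ptr_mutex);
    return v;
}
```

Without the lock, a purge running on another CPU can free ptr between the
NULL check and the ptr->x++, which would match the failure described.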
>
> And, it's almost impossible to diagnose after the fact (especially when the
> debugger, gdb, changes the thread model and so you don't see the problems).
>
> To combat all that, we added information to the POSIX pthread_mutex_t to
> track which thread did the lock/release and where in ntop's code this comes
> from.
>
> This is very useful data in diagnosing and fixing deadlocks - which we've
> introduced more than once during the 2.3 development cycle.
>
> I *think* things are stable now, but you never know - and so I've got zip
> interest in removing that data. For example, if you run with -K and the
> decoder locks up, the info.html page will show the mutex information
> (as will the PR form, IIRC) - that's crucial for diagnosis.
>
> If you are hardware challenged, and don't mind flying without the FAA
> mandated safety equipment, you could certainly improve performance a bit,
> if you disabled the pid tracking part of this add-on data. All you lose
> is the ability to figure out which interface handler locked up queueing
> the packet.
Each handler is a thread in Solaris so it's the same pid:)
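For anyone following along, the kind of tracking being described could look
roughly like the sketch below. This is an illustration under my own
assumptions, not the code in util.c: tracked_mutex, its fields and the
tracked_* functions are hypothetical names. It records both getpid() and
pthread_self(), which shows why the pid adds nothing on a system where all
threads share it.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical wrapper: a POSIX mutex plus diagnostic bookkeeping. */
typedef struct {
    pthread_mutex_t mutex;
    int             is_locked;
    pid_t           lock_pid;     /* same for all threads on many systems */
    pthread_t       lock_thread;  /* distinct per thread */
    time_t          lock_time;
    const char     *lock_file;
    int             lock_line;
    unsigned long   num_locks;
} tracked_mutex;

int tracked_init(tracked_mutex *m) {
    assert(m != NULL);
    m->is_locked = 0;
    m->lock_pid  = 0;
    m->lock_time = 0;
    m->lock_file = NULL;
    m->lock_line = 0;
    m->num_locks = 0;
    return pthread_mutex_init(&m->mutex, NULL);
}

/* Lock, then record who took the lock and from where. The bookkeeping
 * is written only while the lock is held. */
int tracked_lock_at(tracked_mutex *m, const char *file, int line) {
    int rc = pthread_mutex_lock(&m->mutex);
    if (rc != 0)
        return rc;
    m->is_locked   = 1;
    m->lock_pid    = getpid();      /* may be a system call per lock */
    m->lock_thread = pthread_self();
    m->lock_time   = time(NULL);    /* may be another call per lock */
    m->lock_file   = file;
    m->lock_line   = line;
    m->num_locks++;
    return 0;
}

int tracked_unlock(tracked_mutex *m) {
    m->is_locked = 0;               /* still holding the lock here */
    return pthread_mutex_unlock(&m->mutex);
}

/* Capture the call site automatically. */
#define tracked_lock(m) tracked_lock_at((m), __FILE__, __LINE__)
```

A status page can then print lock_file, lock_line and lock_time for a mutex
that never gets released. Dropping the getpid() and time() lines removes the
per-lock calls at the cost of those two fields.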
>
> Look in util.c for the _accessMutex(), _releaseMutex() and _tryLockMutex()
> functions.
>
> It would also be possible to put this into thread-specific data (to save
> the getpid() call). That change falls out as part of the ntop watchdog
> I've been working on for post 2.3.
Ok.
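As I understand the thread-specific data idea, it would be something like
the sketch below: each thread calls getpid() once and keeps the result in
its own slot. This is only my guess at the shape of it, not the watchdog
code; cached_pid, pid_key and make_pid_key are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

static pthread_key_t  pid_key;
static pthread_once_t pid_key_once = PTHREAD_ONCE_INIT;
static int            pid_key_ok = 0;

/* Runs once per process; free() releases each thread's slot at exit. */
static void make_pid_key(void) {
    pid_key_ok = (pthread_key_create(&pid_key, free) == 0);
}

/* Returns the calling thread's pid, asking the OS only on the first call
 * made by each thread. Falls back to getpid() if anything goes wrong.
 * Caveat: a child created by fork() inherits the cached value, so it
 * would be stale there. */
pid_t cached_pid(void) {
    pid_t *slot;
    int rc = pthread_once(&pid_key_once, make_pid_key);
    assert(rc == 0);
    (void)rc;

    if (!pid_key_ok)
        return getpid();

    slot = pthread_getspecific(pid_key);
    if (slot == NULL) {
        slot = malloc(sizeof *slot);
        if (slot == NULL)
            return getpid();
        *slot = getpid();
        if (pthread_setspecific(pid_key, slot) != 0) {
            pid_t p = *slot;
            free(slot);
            return p;
        }
    }
    return *slot;
}
```

The fork() caveat matters for the 'read-only' web page children mentioned
earlier, which would need to refresh the slot after the fork.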
>
> Off hand, I'm not sure where the time() call you're referring to is. ntop
> tries to use the pcap packet timestamp if possible. In the back of my mind,
> I think there's some code that checks if the timestamp is increasing and
> uses the time(NULL) call if it is not. That's especially important as
> Luca's tryLock change makes it much more likely for ntop to handle packets
> out of order.
It is here:
    mutexId->lockTime = time(NULL);
in _accessMutex() and in two others. It looks like mutexId->lockTime is
just a stat being tracked. By changing the:
    myPid=getpid();
in _accessMutex() to:
    myPid=myGlobals.basentoppid;
and making the mutexId->lockTime stuff constant, I went from 250000
syscalls in a vmstat 5 to 50000. The app still goes unresponsive during
purge, I think, but it is now usable. I don't suggest this to anyone else;
I'm just wondering how hard a hammer I am using on this watch:)
-Chris
--
[EMAIL PROTECTED] Chris Turbeville NTT/VERIO
Send mail with subject "send PGP Key" for PGP 6.5.2 Public key
_______________________________________________
Ntop-dev mailing list
[EMAIL PROTECTED]
http://listgateway.unipi.it/mailman/listinfo/ntop-dev