Re: [Ntop-dev] Performance Question

Chris Turbeville Fri, 05 Sep 2003 08:38:30 -0700

> 
> Get some real hardware you whining puppy, not some $40 discard at 1st
> Saturday :-)
> 
> You should also be able to find this yourself in the code.  grep for
> pthread_mutex_lock() and you'll see a #define to something called
> _accessMutex() - grep for that and the code in all its glory is in
> util.c  (he he he)
> 
Oh, I have :)  I think we're discussing different thread systems.  See
below.
> 
> Now that we're past that...
> 
> 
> ntop is multi-threaded.  The fork() is the least of it - used only for
> 'read-only' web pages.  For normal running, there are six or seven threads
> at any time that are running - web server, one per interface, one decoder,
> one or more address resolution, main thread, idle purge, usually some sort
> of thread manager, etc. ...
No, I understand this, but in Solaris fork() is the only thing that
will change your pid.  The 17-18 threads ntop runs all have the same pid,
which is why I was at a loss as to why we check it on every packet :)  As I
said above, I suspect some of our OSes assign pids to threads, and that's
why y'all are using it.
> 
> The biggest difference between 2.2 and 2.3 is that we've moved ntop from a
> multi-threaded, unprotected application to a multi-threaded protected one.
> So the mutexes are in place to protect data across the threads.  It's
> amazing how SMP hardware changes your view of the world - it really is
> possible for the purge to be running and purging hosts at the same time the
> decoder is trying to record packet information about them!
> 
Yes, I can recreate this easily by dropping the mutex on my multiproc
sparc.  Stuff quickly goes awry.
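
As a rough illustration of the purge-versus-decoder collision described above, the sketch below runs a hypothetical "decoder" thread that allocates a host entry on demand and counts a packet against it, while a hypothetical "purger" thread keeps freeing that entry. None of these names come from ntop. Both sides take the same mutex, so the check, the allocation and the increment happen as one unit; without the lock, the purger could free the entry between the NULL check and the increment.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical host entry; ntop's real structure is far larger. */
struct host {
    unsigned long pkts;
};

static struct host *slot = NULL;              /* shared between both threads */
static pthread_mutex_t slot_mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned long total_counted = 0;       /* only touched under slot_mutex */

/* Decoder: make sure the entry exists, then count one packet against it. */
static void *decoder(void *arg)
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        pthread_mutex_lock(&slot_mutex);
        if (slot == NULL)
            slot = calloc(1, sizeof *slot);
        if (slot != NULL) {
            slot->pkts++;
            total_counted++;
        }
        pthread_mutex_unlock(&slot_mutex);
    }
    return NULL;
}

/* Purger: repeatedly throw the entry away, as an idle purge would. */
static void *purger(void *arg)
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        pthread_mutex_lock(&slot_mutex);
        free(slot);
        slot = NULL;
        pthread_mutex_unlock(&slot_mutex);
    }
    return NULL;
}

/* Runs both threads to completion and returns how many packets were counted. */
unsigned long run_decoder_and_purge(int iterations)
{
    pthread_t d, p;

    total_counted = 0;
    if (pthread_create(&d, NULL, decoder, &iterations) != 0)
        return 0;
    if (pthread_create(&p, NULL, purger, &iterations) != 0) {
        pthread_join(d, NULL);
        return total_counted;
    }
    pthread_join(d, NULL);
    pthread_join(p, NULL);

    free(slot);     /* both threads are done, no lock needed */
    slot = NULL;
    return total_counted;
}
```

With the lock in place, every decoder iteration counts exactly one packet no matter how the two threads interleave. Removing the lock/unlock pairs turns this into a use-after-free waiting to happen on SMP hardware.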
> Intellectually, you always knew it COULD happen, but in years and years of
> running ntop, I don't think Luca ever saw the problems.
> 
> The issue is that thread programming is subtle and hard.  We actually saw
> a case where code of the form:
> 
>  if (ptr == NULL) ptr=malloc();
>   <...2 or 3 irrelevant lines, no function calls, just open code...>
>  ptr->x++;
> 
> was bombing...
> 
> And, it's almost impossible to diagnose after the fact (especially when the
> debugger, gdb, changes the thread model and so you don't see the problems).
> 
> To combat all that, we added information to the POSIX pthread_mutex_t to
> track which thread did the lock/release and where in ntop's code this comes
> from.  
> 
> This is very useful data in diagnosing and fixing deadlocks - which we've
> introduced more than once during the 2.3 development cycle.
> 
> I *think* things are stable now, but you never know - and so I've got zip
> interest in removing that data.  For example, if you run with -K and the
> decoder locks up, the info.html page will show the mutex information
> (as will the PR form, IIRC) - that's crucial for diagnosis.
> 
> If you are hardware challenged, and don't mind flying without the FAA
> mandated safety equipment, you could certainly improve performance a bit,
> if you disabled the pid tracking part of this add-on data.  All you lose
> is the ability to figure out which interface handler locked up queueing
> the packet.
Each handler is a thread in Solaris, so it's the same pid :)
> 
> Look in util.c for the _accessMutex(), _releaseMutex() and _tryLockMutex()
> functions.
> 
> It would also be possible to put this into thread-specific data (to save
> the getpid() call).  That change falls out as part of the ntop watchdog
> I've been working on for post 2.3.
Ok.
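
For what the thread-specific-data idea might look like, here is a sketch under stated assumptions. The struct, its field names and the TRACKED_LOCK macro are all hypothetical stand-ins, not what util.c actually contains. The wrapper records who took the lock and from where, and it reads the pid from a per-thread cache built on pthread_key_create() so that getpid() is called at most once per thread instead of once per lock.

```c
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical mutex-plus-bookkeeping record; field names are illustrative. */
struct tracked_mutex {
    pthread_mutex_t mutex;
    pid_t lock_pid;           /* pid recorded by the last successful lock */
    const char *lock_file;    /* source file of the last successful lock */
    int lock_line;            /* source line of the last successful lock */
    unsigned long lock_count; /* number of successful locks so far */
};

static pthread_key_t pid_key;
static pthread_once_t pid_key_once = PTHREAD_ONCE_INIT;
static int pid_key_ok = 0;

static void make_pid_key(void)
{
    /* free() is the destructor, so each thread's cached value is released at exit. */
    pid_key_ok = (pthread_key_create(&pid_key, free) == 0);
}

/* Returns this thread's cached pid, calling getpid() only on first use. */
pid_t cached_pid(void)
{
    pid_t *p;

    pthread_once(&pid_key_once, make_pid_key);
    if (!pid_key_ok)
        return getpid();              /* no key available: fall back */

    p = pthread_getspecific(pid_key);
    if (p == NULL) {
        p = malloc(sizeof *p);
        if (p == NULL)
            return getpid();
        *p = getpid();
        if (pthread_setspecific(pid_key, p) != 0) {
            pid_t v = *p;
            free(p);                  /* could not cache it; do not leak */
            return v;
        }
    }
    return *p;
}

int tracked_init(struct tracked_mutex *m)
{
    m->lock_pid = 0;
    m->lock_file = NULL;
    m->lock_line = 0;
    m->lock_count = 0;
    return pthread_mutex_init(&m->mutex, NULL);
}

/* Lock, then record who holds it and where the call came from. */
int tracked_lock(struct tracked_mutex *m, const char *file, int line)
{
    int rc = pthread_mutex_lock(&m->mutex);
    if (rc == 0) {
        m->lock_pid = cached_pid();
        m->lock_file = file;
        m->lock_line = line;
        m->lock_count++;
    }
    return rc;
}

int tracked_unlock(struct tracked_mutex *m)
{
    return pthread_mutex_unlock(&m->mutex);
}

/* Convenience macro that captures the call site automatically. */
#define TRACKED_LOCK(m) tracked_lock((m), __FILE__, __LINE__)
```

One caveat with any cached pid: the value goes stale in a child created by fork(), so a process that forks (as ntop does for the 'read-only' pages) would need to refresh the cache in the child.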
> 
> Off hand, I'm not sure where the time() call you're referring to is. ntop
> tries to use the pcap packet timestamp if possible.  In the back of my mind,
> I think there's some code that checks if the timestamp is increasing and
> uses the time(NULL) call if it is not.  That's especially important as
> Luca's tryLock change makes it much more likely for ntop to handle packets
> out of order.
It is here:
   mutexId->lockTime = time(NULL);
in _accessMutex and in two others.  It looks like mutexId->lockTime is
just a stat being tracked.  By changing the:
  myPid=getpid();
in _accessMutex to:
  myPid=myGlobals.basentoppid;
And making the mutexId->lockTime stuff constant, I went from 250000
syscalls in a vmstat 5 to 50000.  The app still goes unresponsive during
purge, I think, but it is now usable.  I don't suggest this to anyone else;
I'm just wondering how hard a hammer I am using on this watch :)
-Chris


-- 
[EMAIL PROTECTED]           Chris Turbeville                       NTT/VERIO
       Send mail with subject "send PGP Key" for PGP 6.5.2 Public key
_______________________________________________
Ntop-dev mailing list
[EMAIL PROTECTED]
http://listgateway.unipi.it/mailman/listinfo/ntop-dev
