RE: [Ntop-dev] Performance Question

Burton M. Strauss III Fri, 05 Sep 2003 10:36:36 -0700

Well, that's bloody useless, Ollie, isn't it?  If you can't tell the threads
apart...


Most OSes gave up on lightweight threads and implement them as lighter
processes.  I guess for systems that really have threads, you could use
pthread_self(), but that's quite a nuisance to translate back to a process
number on 'fake' threads systems like Linux.  Those are the confusing
numbers in these log messages:

Sep  5 10:37:01 tigger ntop[9014]:   THREADMGMT: web connections thread (9014) started... [MSGID0548089]
                                                                         ^^^^

Sep  5 10:37:01 tigger ntop[9007]:   THREADMGMT: Started thread (65541) for web server [MSGID8791429]
                                                                 ^^^^^

The first message shows that the thread logs its own PID (9014), but the
second shows that the parent logs the child's pthread_self() number (65541).
Somewhere in the 2.4 kernel series this was changed so the number is random
(unpredictable) because of a security breach.  That makes it ugly to track a
thread back to its messages.
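A small sketch of what the two log lines show: the `pthread_t` the parent gets back from `pthread_create()` names the same thread as the child's own `pthread_self()`, while only `getpid()` gives a process number. Whether the child's PID differs from the parent's depends on the thread library (under LinuxThreads each thread had its own PID, as the 9014/9007 pair above shows; Solaris threads share one). The struct and function names are hypothetical. The mutex makes the child wait until the parent has stored the handle, so both IDs are compared while the thread is still alive.

```c
#include <pthread.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical hand-off record shared by the parent and one child thread. */
struct handoff {
    pthread_mutex_t lock;
    pthread_t       handle;     /* written by the parent, from pthread_create() */
    pid_t           childPid;   /* written by the child, from getpid() */
    int             sameThread; /* written by the child */
};

static void *childMain(void *arg)
{
    struct handoff *h = arg;

    /* Blocks until the parent has stored the handle it got back. */
    pthread_mutex_lock(&h->lock);
    h->sameThread = pthread_equal(h->handle, pthread_self()) != 0;
    h->childPid = getpid();
    pthread_mutex_unlock(&h->lock);
    return NULL;
}

/* Hypothetical helper: returns 1 if the parent's pthread_create() handle and
 * the child's pthread_self() name the same thread, 0 if not, -1 on error.
 * Also prints both PIDs so the thread model in use is visible. */
int childMatchesHandle(void)
{
    struct handoff h;
    pthread_t t;

    if (pthread_mutex_init(&h.lock, NULL) != 0)
        return -1;
    h.sameThread = 0;
    h.childPid = 0;

    pthread_mutex_lock(&h.lock);
    if (pthread_create(&t, NULL, childMain, &h) != 0) {
        pthread_mutex_unlock(&h.lock);
        pthread_mutex_destroy(&h.lock);
        return -1;
    }
    h.handle = t;
    pthread_mutex_unlock(&h.lock);

    pthread_join(t, NULL);   /* the child's writes are visible after this */
    pthread_mutex_destroy(&h.lock);

    printf("parent PID %ld, child PID %ld\n", (long)getpid(), (long)h.childPid);
    return h.sameThread;
}
```

Because `pthread_t` is opaque, the comparison goes through `pthread_equal()` rather than `==`.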


Your hits sound right, but it's still mildly expensive because of the state
change mutex stuff.  If you really wanted to improve performance, then you'd
drop everything related to the mutex.  That's the stateChangeMutex and this
structure:

typedef struct pthreadMutex {
  pthread_mutex_t mutex;
  char   isLocked, isInitialized;
  char   lockFile[64];
  int    lockLine;
  pid_t  lockPid;
  char   unlockFile[64];
  int    unlockLine;
  pid_t  unlockPid;
  u_int  numLocks, numReleases;

  time_t lockTime;
  char   maxLockedDurationUnlockFile[64];
  int    maxLockedDurationUnlockLine;
  int    maxLockedDuration;

  char   where[64];
  char   lockAttemptFile[64];
  int    lockAttemptLine;
  pid_t  lockAttemptPid;
} PthreadMutex;

except for the mutex itself.  You'll probably need to keep isLocked and
isInitialized, too, and live with the tiny race that's possible when setting
them without using the mutex.
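A minimal sketch of that stripped-down variant, assuming plain POSIX threads. `LeanMutex` and the `lean*()` functions are hypothetical names, not ntop's. Only the `pthread_mutex_t` and the two flags survive, so no file/line/PID/time bookkeeping (and no `getpid()` or `time()` call) happens per lock. Unlike the unprotected writes described above, this version only writes `isLocked` while the mutex is held, so the remaining race is in any code (a status page, say) that reads the flags without taking the lock.

```c
#include <pthread.h>

/* Hypothetical stripped-down wrapper: the mutex plus the two flags. */
typedef struct leanMutex {
    pthread_mutex_t mutex;
    char isLocked, isInitialized;
} LeanMutex;

int leanInit(LeanMutex *m)
{
    int rc = pthread_mutex_init(&m->mutex, NULL);
    m->isLocked = 0;
    m->isInitialized = (rc == 0);
    return rc;
}

int leanLock(LeanMutex *m)
{
    int rc;

    if (!m->isInitialized)
        return -1;              /* caller forgot leanInit() */
    rc = pthread_mutex_lock(&m->mutex);
    if (rc == 0)
        m->isLocked = 1;        /* written while holding the mutex */
    return rc;
}

int leanUnlock(LeanMutex *m)
{
    if (!m->isInitialized)
        return -1;
    m->isLocked = 0;            /* cleared before the mutex is released */
    return pthread_mutex_unlock(&m->mutex);
}

void leanDestroy(LeanMutex *m)
{
    if (m->isInitialized) {
        pthread_mutex_destroy(&m->mutex);
        m->isInitialized = 0;
    }
}
```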

But, since I want the ability to figure out deadlocks based on this data,
I'm not going to make any code changes.


-----Burton

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf
Of Chris Turbeville
Sent: Friday, September 05, 2003 10:37 AM
To: [EMAIL PROTECTED]
Subject: Re: [Ntop-dev] Performance Question


>
> Get some real hardware you whining puppy, not some $40 discard at 1st
> Saturday :-)
>
> You should also be able to find this yourself in the code.  grep for
> pthread_mutex_lock() and you'll see a #define to something called
> _accessMutex() - grep for that and the code in all its glory is in
> util.c  (he he he)
>
Oh, I have :)  I think we're discussing different thread systems.  See
below.
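The `#define` redirection mentioned in the quote is a common C trick: a macro that looks like an ordinary call expands to a wrapper that also receives `__FILE__` and `__LINE__`, so the wrapper can record where each lock was taken. The sketch below only illustrates the idea. `TrackedMutex`, `accessMutexAt()` and the `accessMutex()` macro are hypothetical names, not the actual contents of util.c.

```c
#include <pthread.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical mutex wrapper that remembers who locked it and from where. */
typedef struct trackedMutex {
    pthread_mutex_t mutex;
    char  lockFile[64];
    int   lockLine;
    pid_t lockPid;
    char  where[64];
} TrackedMutex;

int accessMutexAt(TrackedMutex *m, const char *where,
                  const char *file, int line)
{
    int rc = pthread_mutex_lock(&m->mutex);
    if (rc != 0)
        return rc;
    /* Record the call site only once the lock is held, so the fields
     * always describe the current owner. */
    snprintf(m->lockFile, sizeof m->lockFile, "%s", file);
    snprintf(m->where, sizeof m->where, "%s", where);
    m->lockLine = line;
    m->lockPid = getpid();   /* may be a real system call on every lock */
    return 0;
}

/* Callers write accessMutex(&m, "purge"); the preprocessor supplies the
 * file name and line number of that call site. */
#define accessMutex(m, where) accessMutexAt((m), (where), __FILE__, __LINE__)
```

The per-lock `getpid()` in this kind of wrapper is exactly the cost Chris is measuring.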
>
> Now that we're past that...
>
>
> ntop is multi-threaded.  The fork() is the least of it - used only for
> 'read-only' web pages.  For normal running, there are six or seven threads
> running at any time - web server, one per interface, one decoder,
> one or more address resolution, main thread, idle purge, usually some sort
> of thread manager, etc. ...
No, I understand this, but on Solaris fork() is the only thing that will
change your PID.  The 17-18 threads ntop runs all have the same PID; that's
why I was at a loss as to why we check it on every packet :)  As I said
above, I suspect some of our OSen are assigning PIDs to threads, so that's
why y'all are using it.
>
> The biggest difference between 2.2 and 2.3 is that we've moved ntop from a
> multi-threaded, unprotected application to a multi-threaded protected one.
> So the mutexes are in place to protect data across the threads.  It's
> amazing how SMP hardware changes your view of the world - it really is
> possible for the purge to be running and purging hosts at the same time the
> decoder is trying to record packet information about them!
>
Yes, I can recreate this easily by dropping the mutex on my multiprocessor
SPARC.  Stuff quickly goes awry.
> Intellectually, you always knew it COULD happen, but in years and years of
> running ntop, I don't think Luca ever saw the problems.
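The purge-versus-decoder collision can be shown with a much smaller stand-in: two threads doing read-modify-write updates on one shared record. The sketch assumes a hypothetical `HostStats` record with a single counter. With the lock every update lands; deleting the lock/unlock calls typically loses updates on multiprocessor hardware, though that outcome isn't guaranteed on any particular run.

```c
#include <pthread.h>

#define UPDATES_PER_THREAD 100000

/* Hypothetical shared record, standing in for a host entry that several
 * threads touch at once. */
typedef struct hostStats {
    pthread_mutex_t lock;
    unsigned long   pktsSeen;
} HostStats;

static void *decoderLike(void *arg)
{
    HostStats *h = arg;

    for (int i = 0; i < UPDATES_PER_THREAD; i++) {
        pthread_mutex_lock(&h->lock);
        h->pktsSeen++;            /* read-modify-write: must not interleave */
        pthread_mutex_unlock(&h->lock);
    }
    return NULL;
}

/* Runs two updater threads and returns the final count (0 on error). */
unsigned long runTwoUpdaters(void)
{
    HostStats h;
    pthread_t a, b;
    unsigned long total = 0;

    if (pthread_mutex_init(&h.lock, NULL) != 0)
        return 0;
    h.pktsSeen = 0;

    if (pthread_create(&a, NULL, decoderLike, &h) == 0) {
        if (pthread_create(&b, NULL, decoderLike, &h) == 0) {
            pthread_join(b, NULL);
            pthread_join(a, NULL);
            total = h.pktsSeen;
        } else {
            pthread_join(a, NULL);
        }
    }
    pthread_mutex_destroy(&h.lock);
    return total;
}
```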
>
> The issue is that thread programming is subtle and hard.  We actually saw
> a case where code of the form:
>
>  if (ptr == NULL) ptr=malloc();
>   <...2 or 3 irrelevant lines, no function calls, just open code...>
>  ptr->x++;
>
> was bombing...
>
> And it's almost impossible to diagnose after the fact (especially when the
> debugger, gdb, changes the thread model and so you don't see the problems).
>
> To combat all that, we added information to the POSIX pthread_mutex_t to
> track which thread did the lock/release and where in ntop's code the call
> comes from.
>
> This is very useful data in diagnosing and fixing deadlocks - which we've
> introduced more than once during the 2.3 development cycle.
>
> I *think* things are stable now, but you never know - and so I've got zip
> interest in removing that data.  For example, if you run with -K and the
> decoder locks up, the info.html page will show the mutex information
> (as will the PR form, IIRC) - that's crucial for diagnosis.
>
> If you are hardware-challenged, and don't mind flying without the
> FAA-mandated safety equipment, you could certainly improve performance a
> bit if you disabled the pid tracking part of this add-on data.  All you
> lose is the ability to figure out which interface handler locked up
> queueing the packet.
Each handler is a thread on Solaris, so it's the same PID :)
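The `if (ptr == NULL) ptr=malloc(); ... ptr->x++;` pattern quoted above is a check-then-act sequence: unless every thread that touches `ptr` holds the same lock, two threads can both see NULL and both allocate, or another thread (the purge, say) can free or reset the pointer between the test and the increment. A minimal sketch under that assumption, with hypothetical `sharedPtr`, `sharedLock`, `bumpShared()` and `purgeShared()`, keeps the whole sequence in one critical section and makes the purge side take the same lock.

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct counter {
    int x;
} Counter;

/* Hypothetical shared pointer and the lock that guards it. */
static Counter        *sharedPtr = NULL;
static pthread_mutex_t sharedLock = PTHREAD_MUTEX_INITIALIZER;

/* Allocate-if-missing and increment as one critical section.
 * Returns the new value of x, or -1 if allocation fails. */
int bumpShared(void)
{
    int value;

    pthread_mutex_lock(&sharedLock);
    if (sharedPtr == NULL)
        sharedPtr = calloc(1, sizeof *sharedPtr);   /* x starts at 0 */
    if (sharedPtr == NULL) {
        pthread_mutex_unlock(&sharedLock);
        return -1;
    }
    value = ++sharedPtr->x;
    pthread_mutex_unlock(&sharedLock);
    return value;
}

/* A purge-style routine must take the same lock before freeing. */
void purgeShared(void)
{
    pthread_mutex_lock(&sharedLock);
    free(sharedPtr);
    sharedPtr = NULL;
    pthread_mutex_unlock(&sharedLock);
}
```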
>
> Look in util.c for the _accessMutex(), _releaseMutex() and _tryLockMutex()
> functions.
>
> It would also be possible to put this into thread-specific data (to save
> the getpid() call).  That change falls out as part of the ntop watchdog
> I've been working on for post 2.3.
Ok.
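The thread-specific-data idea could look roughly like this: each thread calls `getpid()` once, stores the result under a `pthread_key_t`, and later lock calls read it back with `pthread_getspecific()`, which is typically a user-space lookup rather than a system call. `cachedPid()` and the key variables are hypothetical, not the watchdog code mentioned above.

```c
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

static pthread_key_t  pidKey;
static pthread_once_t pidKeyOnce = PTHREAD_ONCE_INIT;
static int            pidKeyOk;

static void makePidKey(void)
{
    /* free() runs automatically when a thread with a cached value exits. */
    pidKeyOk = (pthread_key_create(&pidKey, free) == 0);
}

/* Hypothetical helper: getpid() on the first call in each thread,
 * thread-specific data afterwards. */
pid_t cachedPid(void)
{
    pid_t *slot;

    pthread_once(&pidKeyOnce, makePidKey);
    if (!pidKeyOk)
        return getpid();                /* no key: no caching */

    slot = pthread_getspecific(pidKey);
    if (slot == NULL) {
        slot = malloc(sizeof *slot);
        if (slot == NULL)
            return getpid();
        *slot = getpid();
        if (pthread_setspecific(pidKey, slot) != 0) {
            pid_t pid = *slot;
            free(slot);
            return pid;
        }
    }
    return *slot;
}
```

One caveat given ntop's fork() for read-only web pages: a value cached before fork() is inherited by the child, so the forked child would report the parent's PID unless the cache is reset there (for example from a `pthread_atfork()` child handler).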
>
> Offhand, I'm not sure where the time() call you're referring to is.  ntop
> tries to use the pcap packet timestamp if possible.  In the back of my mind,
> I think there's some code that checks if the timestamp is increasing and
> uses the time(NULL) call if it is not.  That's especially important as
> Luca's tryLock change makes it much more likely for ntop to handle packets
> out of order.
It is here:
   mutexId->lockTime = time(NULL);
in _accessMutex() and in two others.  It looks like mutexId->lockTime is
just a stat being tracked.  By changing:
   myPid=getpid();
in _accessMutex() to:
   myPid=myGlobals.basentoppid;
and making the mutexId->lockTime stuff constant, I went from 250000
syscalls in vmstat 5 output to 50000.  I think the app still goes
unresponsive during purge, but it is now usable.  I don't suggest this to
anyone else; I'm just wondering how hard a hammer I am using on this watch :)
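Burton's recollection of the timestamp handling (use the capture timestamp, fall back to time(NULL) when it stops increasing) could look something like the sketch below. It takes a `struct timeval`, the type of the `ts` field in libpcap's `struct pcap_pkthdr`, so it compiles without libpcap. `packetTime()` and `lastPktTime` are hypothetical names, equal seconds are treated as "not going backwards", and the function is single-threaded as written; with several handler threads, `lastPktTime` would need its own protection.

```c
#include <sys/time.h>
#include <time.h>

/* Hypothetical clock state: the newest capture second seen so far. */
static time_t lastPktTime = 0;

/* Return the time to charge a packet to: the capture timestamp while it
 * keeps moving forward, otherwise the wall clock from time(NULL). */
time_t packetTime(const struct timeval *ts)
{
    if (ts != NULL && ts->tv_sec >= lastPktTime) {
        lastPktTime = ts->tv_sec;
        return ts->tv_sec;
    }
    return time(NULL);
}
```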
-Chris

--
[EMAIL PROTECTED]           Chris Turbeville
NTT/VERIO
       Send mail with subject "send PGP Key" for PGP 6.5.2 Public key
_______________________________________________
Ntop-dev mailing list
[EMAIL PROTECTED]
http://listgateway.unipi.it/mailman/listinfo/ntop-dev
