RE: [Ntop] NTOP fails because of max sessions reached

NChoate Thu, 30 Mar 2006 15:52:17 -0800

I watched the session count slowly grow and ramp up as time went on, as
if the system was not purging sessions.  I noticed in the config that
--disable-instantsessionpurge is set to YES by default.  From what I
read, isn't this supposed to be NO?


==============================
ntop Configuration
 

Basic Information 
ntop Version 3.2 SourceForge .tgz 
Configured on Mar 20 2006 17:23:17 
Built on Mar 20 2006 17:24:18 
OS i686-pc-linux-gnu 
libpcap version libpcap version 0.9.4 
Running from ntop 
Libraries in /usr/lib 
ntop Process Id 6068 
http Process Id 6068 
Run State Run 
Command line 
Started as.... ntop -c -d --w3c -w 0 -W 443 -X 65535  
Resolved to.... ntop -c -d --w3c -w 0 -W 443 -X 65535 
Preferences used 
NOTE: (effective) means that this is the value after ntop has processed
the parameter. (default) means this is the default value, usually (but
not always) set by a #define in globals-defines.h. 
 
-a | --access-log-file (default) (nil) 
-b | --disable-decoders (default) No 
-c | --sticky-hosts Yes 
-d | --daemon Yes 
-e | --max-table-rows (default) 128 
-f | --traffic-dump-file (default) (nil) 
-g | --track-local-hosts (default) Track all hosts 
-o | --no-mac (default) Trust MAC Addresses 
-i | --interface (effective) eth0 
-j | --create-other-packets (default) Disabled 
-l | --pcap-log (default) (nil) 
-m | --local-subnets (effective) (default) (nil) 
-n | --numeric-ip-addresses (default) No 
-p | --protocols (default) internal list 
-q | --create-suspicious-packets (default) Disabled 
-r | --refresh-time (default) 120 
-s | --no-promiscuous (default) No 
-t | --trace-level (default) 3 
-u | --user nobody (uid=65534, gid=65534) 
-w | --http-server Inactive 
-z | --disable-sessions (default) No 
-B | --filter-expression (default) none 
-D | --domain jwoperating.com 
-F | --flow-spec (default) none 
-K | --enable-debug (default) No 
-L | --use-syslog daemon 
-M | --no-interface-merge (effective) (default) (Merging Interfaces) Yes
-N | --wwn-map (default) (nil) 
-O | --pcap-file-path (default) /var/lib/ntop 
-P | --db-file-path (default) /var/lib/ntop 
-Q | --spool-file-path (default) /var/lib/ntop 
-U | --mapper (default) (nil) 
-W | --https-server Active, all interfaces, port 443 
-X 65535 
--disable-schedYield Yes 
--disable-instantsessionpurge Yes 
--disable-mutexextrainfo Yes 
--disable-stopcap Yes 
--fc-only (default) No 
--instance (default) (nil) 
--no-fc (default) No 
--no-invalid-lun (default) No 
--p3p-cp (default) none 
--p3p-uri (default) none 
--pcap-nonblocking (default) No 
--skip-version-check Yes 
--ssl-watchdog (default) No 
--w3c Yes

Nathan

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
Burton Strauss
Sent: Wednesday, March 29, 2006 10:56 AM
To: [email protected]
Subject: RE: [Ntop] NTOP fails because of max sessions reached

Yeah ... They eventually time out and get removed, but there's benefit to
the viewer from having them stick around for some period of time so that
they can be displayed...


There is a parameter:

       --disable-instantsessionpurge
        ntop sets completed sessions as 'timed out' and then purge them
        almost instantly, which is not the behavior you might expect from
        the discussions about purge timeouts.  This switch makes ntop
        respect the timeouts for completed sessions.  It is NOT the
        default because a busy web server may have 100s or 1000s of
        completed sessions and this would significantly increase the
        amount of memory ntop uses.

If you look in the code, there's even an ifdef for two purge behaviors -
but nothing ever sets that:

  /* Immediately free the session */
  if(theSession->sessionState == FLAG_STATE_TIMEOUT) {
    if(myGlobals.device[actualDeviceId].tcpSession[idx] == theSession) {
      myGlobals.device[actualDeviceId].tcpSession[idx] = theSession->next;
    } else
      prevSession->next = theSession->next;

#if DELAY_SESSION_PURGE
    /* Session freed by scanTimedoutTCPSessions */
    theSession->sessionState = FLAG_STATE_END;
#else
    freeSession(theSession, actualDeviceId, 1, 1 /* lock purgeMutex */);
#endif
    releaseMutex(&myGlobals.tcpSessionsMutex);
    return(NULL);
  }

So FLAG_STATE_TIMEOUT -> immediate purge.
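
To see the unlink step in isolation, here is a minimal, self-contained
sketch.  The Session struct, the STATE_* constants, pushSession() and
purgeTimedOut() are all hypothetical stand-ins, not ntop's own types or
functions, and there is no locking.  It also uses a pointer-to-pointer
walk rather than the head/prevSession branch quoted above; the effect on
the chain should be the same.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for ntop's session record and states */
enum { STATE_ACTIVE = 0, STATE_TIMEOUT = 1 };

struct Session {
  int state;
  struct Session *next;
};

/* Prepend a new session to a bucket chain and return the new head.
 * If allocation fails the chain is returned unchanged. */
struct Session *pushSession(struct Session *head, int state) {
  struct Session *s = malloc(sizeof(*s));
  if (s == NULL) return head;
  s->state = state;
  s->next = head;
  return s;
}

/* Unlink and free every timed-out session in one bucket chain.
 * Returns the number of sessions freed. */
int purgeTimedOut(struct Session **bucket) {
  int freed = 0;
  struct Session **link = bucket;  /* the pointer that references the current node */

  while (*link != NULL) {
    struct Session *cur = *link;
    if (cur->state == STATE_TIMEOUT) {
      *link = cur->next;           /* handles both the head and the mid-chain case */
      free(cur);
      freed++;
    } else {
      link = &cur->next;           /* keep this node, move on */
    }
  }
  return freed;
}
```

Called on a bucket holding timed-out, active and timed-out sessions, this
should leave just the active one in the chain.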

(You can add DELAY_SESSION_PURGE to globals-defines.h, but that would
just make things WORSE)

To understand session purge, you need to look at void
scanTimedoutTCPSessions(int actualDeviceId) {} - sessions.c around line
450.

The key value there is lastSeen+CONST_DOUBLE_TWO_MSL_TIMEOUT which is,
from globals-defines.h (around line 1400):

/*
 * This is the 2MSL timeout as defined in the TCP standard (RFC 761).
 * Used in sessions.c and pbuf.c
 */
#define CONST_TWO_MSL_TIMEOUT          120      /* 2 minutes */
#define CONST_DOUBLE_TWO_MSL_TIMEOUT   (2*CONST_TWO_MSL_TIMEOUT)


That 2m value, doubled to 4m, defines the length of time we PLAN to keep
a purged session around.

So the key isn't the # of sessions open at once, it's the total # of
sessions open + those closed within the last 4m.  And actually it's more
because scanTimedoutTCPSessions() is only run - as part of
purgeIdleHosts() from scanIdleLoop() - every 60s.  So it could be almost
5m of closed sessions.
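
As a rough worked example of that arithmetic, the sketch below adds the
4m timeout to the 60s scan interval and estimates how many entries a
steady workload would hold in the table.  The two helper functions and
SCAN_INTERVAL_SECS are hypothetical names made up for illustration, the
model ignores the per-pass purge limit, and the closing rate is an
invented figure, not a measurement.

```c
/* Values as quoted above from globals-defines.h */
#define CONST_TWO_MSL_TIMEOUT          120      /* 2 minutes */
#define CONST_DOUBLE_TWO_MSL_TIMEOUT   (2*CONST_TWO_MSL_TIMEOUT)

/* Assumption: the purge scan runs once every 60 seconds, as described above */
#define SCAN_INTERVAL_SECS             60

/* Hypothetical helper: worst-case seconds a closed session stays in the table */
int worstCaseRetentionSecs(void) {
  return CONST_DOUBLE_TWO_MSL_TIMEOUT + SCAN_INTERVAL_SECS;
}

/* Hypothetical helper: rough table occupancy for a steady workload,
 * i.e. sessions currently open plus those closed within the window */
long estimateTableEntries(long openSessions, long closedPerSec) {
  return openSessions + closedPerSec * worstCaseRetentionSecs();
}
```

With the 1785 concurrent sessions reported by the firewall earlier in the
thread and an illustrative 213 sessions closing per second,
estimateTableEntries(1785, 213) gives 65,685, which is already past a
65,535 session limit.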

Plus there are limits on the number of sessions we purge per pass to
prevent the scan from taking 'too long' (sessions.c around line 473):

  purgeLimit = myGlobals.device[actualDeviceId].numTcpSessions/2;

Lots of short duration sessions means few active at any time, so
purgeLimit is small, universe of sessions large.
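
A toy model may make that effect easier to see.  It is only a sketch, not
ntop's scan loop: purgePass() and backlogAfter() are hypothetical
functions, the limit is checked per session rather than per hash bucket,
and numTcpSessions is assumed to stay fixed at the number of concurrently
active sessions, which is how the paragraph above reads it.

```c
/* Hypothetical model of one purge pass.
 * expired:        sessions whose timeout has already passed
 * numTcpSessions: value the limit is derived from, as in the line above
 * Returns how many sessions this pass frees. */
long purgePass(long expired, int numTcpSessions) {
  int purgeLimit = numTcpSessions / 2;
  long freeSessionCount = 0;

  while (expired > 0) {
    if (freeSessionCount > purgeLimit) break;  /* same test as in the source */
    freeSessionCount++;
    expired--;
  }
  return freeSessionCount;
}

/* Backlog of expired-but-not-freed sessions after a number of passes,
 * when expiringPerPass more sessions time out between consecutive scans. */
long backlogAfter(int passes, int expiringPerPass, int numTcpSessions) {
  long backlog = 0;

  for (int i = 0; i < passes; i++) {
    backlog += expiringPerPass;
    backlog -= purgePass(backlog, numTcpSessions);
  }
  return backlog;
}
```

With numTcpSessions at 2000, each pass frees at most 1001 sessions in
this model.  If 3000 sessions expire between scans, the backlog grows by
1999 per pass, so backlogAfter(10, 3000, 2000) is 19,990.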

There are no traceEvent() calls there, although you could add a couple of
informational ones, e.g. change this:

    if(freeSessionCount > purgeLimit) break;

To

    if(freeSessionCount > purgeLimit) {
      traceEvent(CONST_TRACE_WARNING,
                 "Purge Limit (%d) reached, at session %d of %d - "
                 "additional sessions not scanned",
                 purgeLimit, _idx, MAX_TOT_NUM_SESSIONS);
      break;
    }


But put it all together and it's not surprising that you reach
MAX_TOT_NUM_SESSIONS or the -X value.

I suppose the right answer would be to make purgeLimit more reflective
of the actual data, but we've had issues with that in the past - ntop has
to run well on both large, fast processors and small, underpowered ones
(people love to redeploy the worst piece 'o junk in the dusty closet and
expect it to monitor their GigE network).


-----Burton

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
[EMAIL PROTECTED]
Sent: Wednesday, March 29, 2006 10:16 AM
To: [email protected]
Subject: RE: [Ntop] NTOP fails because of max sessions reached

I checked the stats, it appears that max sessions is being reached (just
like the error says), however, that leads me to more questions.  Stored
hosts is 1,880.

Perhaps this is more a function of how this is designed.  When ntop
tracks a session, isn't that eventually torn down or timed out, or does
that count reflect a log of all sessions seen?

We added a system recently that does a large amount of traffic, but it's
not a simultaneous sessions issue, more like many sessions opened and
closed on a regular basis.  If ntop never closes those sessions, I can
see where that will grow over time and eventually kill me, but I have had
the older version run for weeks without trouble.

Firewall shows current sessions of 1785; I doubt that the average is
generally in the ballpark.  Never will it show 65000 sessions.


STATS
Host/Session counts - Device 0 (eth0)
Hash Bucket Size 1.9 KB
Actual Host Hash Size 32768
Stored hosts 1880
Host Bucket List Length [min 1][max 12][avg 1.1]
Max host lookup 11
Session Bucket Size 264
Session Actual Hash Size 65535
Sessions 65,535
Max Num. Sessions 65,535
Session Bucket List Length [min 1][max 10][avg 1.9]

Nathan

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
Gary Gatten
Sent: Wednesday, March 29, 2006 9:35 AM
To: Nathan Choate; [email protected]
Subject: Re: [Ntop] NTOP fails because of max sessions reached

If you go to: About -> Show Config, it displays a bunch of info - config
and stats.  Maybe that will help you.

I doubled the -x and -X defaults when I had this problem and it worked
fine.  How many hosts are you trying to monitor?

Gary


>>> [EMAIL PROTECTED] 3/28/2006 6:42:52 PM >>>
Yes, I had read those already.  To add more, I had recently upgraded
from a previous version.  I thought it was a version earlier than 3.x,
but I find 3.0 and 3.1 ebuild files along with the 3.2 that is currently
being run.  Before running 3.2 I never had to mess with max sessions, and
I had sticky hosts running.  Ntop would run for weeks without failing on
the previous versions.  Now, it fails after a couple of days.
Same hardware, updated kernel 2.6.15, Gentoo fully updated.

Is there any way to see the current number of sessions ntop is using
before it stops?

 



_______________________________________________
Ntop mailing list
[email protected]
http://listgateway.unipi.it/mailman/listinfo/ntop

