See {BS3} inline and:
Nowhere in here do you tell us how many hosts you are trying to monitor.
It's not the 29 * 256 of your local nets, unless you've used -m and -g
(--track-local-hosts) to tell ntop that. Otherwise, you are trying to monitor
those ~7.5K hosts plus the 10, 12, 50 or 100 hosts that each of them is in
contact with. That can easily exhaust usable memory, even at the reduced
per-host memory usage for each HostTraffic entry in the current CVS version,
and regardless of how much memory you can throw at it.
It's not just 'raw' memory, it's really how much ntop can grab w/o swapping,
something that turns out to be incredibly difficult to determine. I've
found that - even w/ 852M on Tigger (and the only other thing running on
Tigger is my 'production' monitoring instance) - the real usable per-process
memory is around 140M. After that, swapping starts and as I've discussed
before, swapping kills you.
With your hardware/software setup, you need to figure out the effective
limit on the number of hosts ntop can track in your environment. One way to
do this is to script wget against
http://127.0.0.1:3000/textinfo.html, pulling the statistics every 15 or 30
seconds.
Look for
Host/Session counts - Device 0 (eth1)
Hash Bucket Size.....1.9 KB
Actual Host Hash Size.....32768
Stored hosts.....227
Host Bucket List Length.....[min 1][max 3][avg 1.0]
Max host lookup.....2
Host/Session counts - Device 1 (eth2)
Hash Bucket Size.....1.9 KB
Actual Host Hash Size.....32768
Stored hosts.....7
Host Bucket List Length.....[min 1][max 1][avg 1.0]
Max host lookup.....0
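A minimal polling sketch along those lines (the URL/port are ntop's defaults, the "Stored hosts" label matches the output quoted above, and the /tmp sample file is only there to demonstrate the extraction):

```shell
#!/bin/sh
# Rough sketch: pull textinfo.html on an interval and log the per-device
# "Stored hosts" counts, e.g. from cron or a while/sleep loop:
#   wget -q -O /tmp/textinfo.txt http://127.0.0.1:3000/textinfo.html
# (3000 is ntop's default web port; adjust for your -w setting.)

# Extract every "Stored hosts.....N" value from a saved copy of the page.
extract_stored_hosts() {
    grep 'Stored hosts' "$1" | sed 's/[^0-9]//g'
}

# Demo on a sample matching the output quoted above:
cat > /tmp/textinfo-sample.txt <<'EOF'
Host/Session counts - Device 0 (eth1)
Stored hosts.....227
Host/Session counts - Device 1 (eth2)
Stored hosts.....7
EOF

extract_stored_hosts /tmp/textinfo-sample.txt
# prints:
# 227
# 7
```

Log a timestamp and ntop's resident set size (ps -o rss) alongside these counts, and the knee where swapping begins should stand out.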
What you are trying to determine is the point at which ntop starts to swap.
Then you can use the crude -X and/or -x switches to limit the number of
HostTraffic entries.
BUT: The best long-term answer is to look at the other switches, such as
--track-local-hosts, and configure ntop properly.
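A hypothetical invocation along those lines (switch names as used in ntop 3.x; the subnets are placeholders for the real local /24s, and exact semantics should be verified against `ntop --help` on your build):

```shell
# -m : declare which networks count as "local" (list your provider's /24s)
# -g : --track-local-hosts; keep full HostTraffic entries for local hosts only
# -x / -X : crude upper bounds on host-hash / session entries
# -P : per-instance database directory (one ntop per NetFlow source)
ntop -i eth1 -w 3000 \
     -m "192.0.2.0/24,198.51.100.0/24" \
     -g \
     -x 65536 -X 32768 \
     -P /var/db/ntop-ds3
```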
-----Burton
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Neal
Rauhauser
Sent: Tuesday, June 28, 2005 10:15 PM
To: [email protected]
Subject: [Ntop] ntop - high volume environment
I've got FreeBSD 5.4 on a P4/2.8 with a gig of RAM, and SuSE 9.2 on a quad
Xeon 500, also with a gig of RAM. I'm running the ntop that comes with SuSE
9.2 (ntop-3.0.053-3), and I pulled the latest ports for FreeBSD and built
ntop 3.1.
{BS3} Try the CVS version for both. From experience, ports has issues -
read the back traffic for this list. 3.0.053 is a DEVELOPMENT version
before 3.1. That's unsupported.
I've got NetFlow version 5 exports coming from a pair of Cisco 3660s.
One has a DS3 to Sprint; the other has a 100 Mbit fiber connection to
another regional provider a few blocks from here. It is 0300 as I write this,
and I'm seeing about 4 Mbit on the DS3 and about 1.5 Mbit on the fiber
link. The provider has 29 /24s worth of space, and we rapidly exhaust the
static 4096-slot address resolution queue.
{BS3} You can't exhaust the queue - you just fill it up and it stops
accepting new entries. As those get resolved, entries are available for new
hosts. Assuming enough memory for the host traffic entries, and a static
population of hosts, eventually it will all get resolved. Now if the queue
hits 4K and NEVER resolves anything, that's a different problem. Either
way, however, it won't kill ntop - it will just leave hosts as numeric
addresses. FWIW, those stats too are in textinfo.html:
----- Address Resolution -----
...
Queued - dequeueAddress()
Total Queued.....150
Not queued (duplicate).....0
Maximum Queued.....5
Current Queue.....0
Resolved - resolveAddress():
Addresses to resolve.....150
....less 'Error: No cache database'.....0
....less 'Found in ntop cache'.....1
Gives: # gethost (DNS lookup) calls.....149
DNS Lookup Calls:
DNS resolution attempts.....149
....Success: Resolved.....93
....Failed.....56
........HOST_NOT_FOUND.....48
........NO_DATA.....0
........NO_RECOVERY.....0
........TRY_AGAIN (don't store).....8
........Other error (don't store).....0
DNS lookups stored in cache.....141
Host addresses kept numeric.....56
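A quick sketch of checking whether the resolver is actually draining the queue (field labels as quoted above; the /tmp sample file is only for demonstration, and labels may differ between ntop versions):

```shell
#!/bin/sh
# Pull the key address-resolution counters from a saved textinfo.html text.
# If attempts climb but nothing ever resolves, that's the "queue hits 4K and
# NEVER resolves anything" case described above.
check_resolver() {
    awk '
        function num(s) { gsub(/[^0-9]/, "", s); return s }
        /Current Queue/           { q = num($0) }
        /Maximum Queued/          { m = num($0) }
        /DNS resolution attempts/ { a = num($0) }
        /Success: Resolved/       { s = num($0) }
        END {
            printf "queue=%s max=%s attempts=%s resolved=%s\n", q, m, a, s
            if (a > 0 && s == 0) print "WARNING: nothing is resolving"
        }' "$1"
}

# Demo on a sample matching the stats quoted above:
cat > /tmp/ar-sample.txt <<'EOF'
Maximum Queued.....5
Current Queue.....0
DNS resolution attempts.....149
....Success: Resolved.....93
EOF

check_resolver /tmp/ar-sample.txt
# prints: queue=0 max=5 attempts=149 resolved=93
```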
The SuSE box lasts all of three minutes accepting flows before ntop
dies. The BSD box runs quite a bit longer, but it too dies, with an
unresponsive web server and a still-running ntop process. I've copied
/var/db/ntop into three separate directories under /var/db and created one
instance of ntop for the onboard NIC and one for each of the NetFlow exports.
Running with the split configuration yields stable monitoring of the local
NIC and the fiber link, but the DS3 monitor still dies - it just takes
much longer, perhaps thirty minutes. It should be noted that even with the
-P <other directory> option, all of the rrd files still go into
/var/db/ntop/rrd. I don't know enough about rrd to tell whether this is OK,
but it seems like a bug to me.
Am I chasing my tail by trying to get ntop to behave with this much traffic
volume, or are there some tuning things I'm missing here? I wouldn't want to
undertake debugging any C code, but I'd be happy to work with a developer if
they wanted access to the system to see these things as they occur.
_______________________________________________
Ntop mailing list
[email protected]
http://listgateway.unipi.it/mailman/listinfo/ntop