This is a summary of problems with Ntop 2.0 that I am aware of, and have
done at least some preliminary investigation of.

Disclaimer:  This is what I think I know and have seen from reading the code
and running Ntop for a short while.  If I'm wrong, let me know.  Please!


For many of them, I've posted suggestions and not heard back.  Where
something is still open, I've listed "ACTION ITEMS"...

As always, please post FULL information about your setup if you ask about
bugs.

(Excerpted from a 07Jan2001 post to ntop and ntop-dev)

We don't mind helping people, but you do have to give the rest of the people
monitoring this mailing list some basic information.

I've given notice that I will not reply to ANY "NTop has a bug" messages
unless you provide some reasonable information about your configuration!

(If you don't like it, ask somebody else to support you... I don't think
it's unreasonable to ask...  If you're uncomfortable giving specifics, such
as which mail server, then leave it generic.)

Specifically:

  Hardware
     Type & # of processors (given in your msg)
     Amount of memory
     # network interfaces and types (vendor, bus, etc.)

  Software
     NTop version, source and any applied patches
     OS vendor & version
     Any major upgrades (kernel, networking, etc.)
     gcc version (e.g. gcc --version)
     glibc version
     What else is running

  Network
     Roughly where are the interface(s) you're monitoring (Public Internet, Private LAN, what?)
     What's the bandwidth (e.g. 10 Mbps University internet, 1.5 Mbps T1, CableModem capped at 1.5Mbps, 56K dialup)
     How many machines (traffic sources/destinations) and users


Also, remember that there is no formal support for NTop - it's people just
like you working on a best-effort basis.

     (Posting a message to "support" every two days asking for the status of
your bug fix is just going to get you kill-filed.  Offering to help solve
things will get you lots of support.)

     (On a personal basis, if I ask you to try something, either try it and
tell me what happened or tell me you won't/can't - otherwise you get dumped
into a list like this one for "someday")

On to the bugs:


1.  Limited Memory
==================

Running NTop on machines with very limited memory can cause problems.  The
basic symptom is that it simply shuts down without notice.  If you are
running NTop in the foreground (from the command line), you will see a
segmentation fault.

Often - but not always - this happens after a hash extend.  Look in the log
for messages like this:

15/Jan/2002 13:37:51 Extending TCP hash [new size: 256]
15/Jan/2002 13:39:35 Extending hash: [old=32, new=48]

NOTE THAT HASH EXTENDS ARE COMPLETELY NORMAL.  They occur as NTop learns
about more hosts or sessions and needs storage to track them.  As NTop
learns about more hosts/sessions, memory usage grows.  The idle session and
host purges run every 30 minutes, and the host address purge runs every
2 hours.

THEREFORE, AFTER ~ 2-4 hours of "normal" usage, NTop should reach some form
of steady state regarding memory usage.  At that point, you should rarely
see the hash tables expand or shrink, unless activity on the network
changes.  I have found that ~30MB (ulimit -v 30000) is sufficient to run a
small home LAN - say 3-4 machines inside which reference a few dozen
outside.  Mine seems to stabilize at about 3.6MB of allocated memory, plus
the program itself etc. (call it 2% of 256MB).

However, if there is insufficient memory, the hash extend (along with other
memory-sensitive pieces of code) may fail, and those failures are not always
checked for.
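
For what it's worth, here's a minimal sketch of what checking for it could
look like in a hash extend.  HostHash and extendHash() are hypothetical names,
not ntop's code; the growth factor of 1.5 only mirrors the [old=32, new=48]
messages in the log below.  A real extend also has to rehash the existing
entries into the bigger table, which is left out here.  The point is that a
failed realloc() leaves the old table intact, so the caller can log it and
carry on instead of segfaulting.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical host hash: one pointer per bucket, NULL when empty. */
typedef struct {
  void   **slots;
  size_t   size;
} HostHash;

/* Grow by half again (32 -> 48 -> 72 -> 108, like the log messages).
   Returns 0 on success, -1 on failure.  On failure *h is unchanged,
   because realloc() does not free the old block when it returns NULL. */
int extendHash(HostHash *h) {
  size_t newSize = h->size + h->size / 2;

  if (newSize > SIZE_MAX / sizeof(void *)) {
    fprintf(stderr, "Hash too large to extend [old=%zu]\n", h->size);
    return -1;
  }

  void **p = realloc(h->slots, newSize * sizeof(void *));
  if (p == NULL) {
    fprintf(stderr, "Unable to extend hash [old=%zu, new=%zu]\n", h->size, newSize);
    return -1;                      /* h->slots is still valid */
  }

  /* realloc() leaves the new tail uninitialized: mark those buckets empty.
     (A real extend would now rehash the old entries; omitted here.) */
  memset(p + h->size, 0, (newSize - h->size) * sizeof(void *));
  h->slots = p;
  h->size  = newSize;
  return 0;
}
```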

Here are my notes about memory sizes and issues resulting from them:

Ntop - unhandled memory issues (based on ulimit -v #)

  5000      +  - libc.so loading error
? 6100- 6123   - FATAL ERROR: Unknown user ntop. (pw = getpwnam(optarg) fails)
  6124- 6415+  - segmentation fault after "Initializing SSL..."
? 6900- 7055   - segmentation fault after "Welcome to nfsWatchPlugin. (C) 1999 by Luca Deri."
  7056      +  - caught error on 1st thread create
?25000-     +  - caught error on 1st packet listener
?27500-28000+  - caught error on 2nd packet listener
?28250-28500+  - Runs w/o some graphics (e.g. trafficStats.html) and segfaults after icmpWatch
?29000         - Runs for a while then soft fails...
                     09/Jan/2002 11:34:46 Initializing plugins (if any)...
                     09/Jan/2002 11:34:46 Waiting for HTTP connections on 192.168.42.1 port 3000...
                     09/Jan/2002 11:34:46 Sniffying...
                     09/Jan/2002 11:34:46 Started thread (9226) for network packet sniffing on eth1.
                     09/Jan/2002 11:34:46 Started thread (10251) for network packet sniffing on eth2.
                     09/Jan/2002 11:35:57 Extending TCP hash [new size: 64]
                     09/Jan/2002 11:36:35 Extending hash: [old=32, new=48]
                     09/Jan/2002 11:37:11 Extending hash: [old=48, new=72]
                     09/Jan/2002 11:37:11 Extending TCP hash [new size: 128]
                     09/Jan/2002 11:37:30 Extending TCP hash [new size: 256]
                     09/Jan/2002 11:37:51 Extending hash: [old=72, new=108]
                     09/Jan/2002 11:38:14 *ERROR* Unable to allocate 4096 bytes (4 entries of 1024 bytes) in pbuf.c(256)
                     09/Jan/2002 11:38:14 Cleaning up (called from leaks.c(474)...
                     09/Jan/2002 11:38:14 Waiting until threads terminate...
                     09/Jan/2002 11:38:14 Address resultion terminated...
                     09/Jan/2002 11:38:17 Freeing hash host instances... (1 device(s) to save)
                     09/Jan/2002 11:38:17 69 instances freed
                                ...

The ? means it may also occur at somewhat smaller ulimit -v values; I just
don't know the exact number.  Similarly, the + means it may also occur at
slightly larger ulimit -v values.  Thus,

? 6900- 7055   - segmentation fault after "Welcome to nfsWatchPlugin. (C) 1999 by Luca Deri."

means that it fails this way at ulimit -v 7055, that something else happens
at ulimit -v 7056, and that I don't know the exact lower limit for this
issue, only that it's somewhere above 6415...
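
If you want to reproduce numbers like these without a shell, note that
bash's ulimit -v sets the RLIMIT_AS (address space) limit in kilobytes.  The
C sketch below does the same in a forked child and reports whether a single
malloc() fails there; allocFailsUnderLimit() is a name I made up for
illustration.  It also shows why these failures are survivable in principle:
under the limit, malloc() just returns NULL, and it's the unchecked NULL
that turns into a segfault.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork, cap the child's address space at limitKB
   kilobytes (what "ulimit -v limitKB" does), and try one malloc() of
   `bytes` there.  Returns 1 if malloc() returned NULL, 0 if it succeeded,
   -1 on error.  The caller's own limits are never changed. */
int allocFailsUnderLimit(rlim_t limitKB, size_t bytes) {
  pid_t pid = fork();
  if (pid < 0)
    return -1;

  if (pid == 0) {                          /* child */
    struct rlimit rl;
    rl.rlim_cur = rl.rlim_max = limitKB * 1024;
    if (setrlimit(RLIMIT_AS, &rl) != 0)
      _exit(2);
    /* volatile, so the compiler cannot optimize the allocation away */
    void *volatile p = malloc(bytes);
    _exit(p == NULL ? 1 : 0);              /* memory goes away with the child */
  }

  int status;                              /* parent */
  if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
    return -1;
  return WEXITSTATUS(status) == 2 ? -1 : WEXITSTATUS(status);
}
```

For example, allocFailsUnderLimit(30000, 512 MB) should report a failure,
since half a gigabyte can't fit in a ~29MB address space.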

The line with "*ERROR* Unable to allocate 4096 bytes (4 entries of 1024
bytes) in pbuf.c(256)" in it is a memory allocation patch I'm working on -
see below...

You may also see memory problems show up as missing graphics (Ntop calls
gdchart, which bombs and doesn't return the image file), so you get the
little red X graphic instead.

ACTION ITEM (me): General malloc patch - I am working on this, albeit
slowly...
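
Until that patch is done, here's roughly the shape I have in mind, as a
sketch only.  checkedCalloc() and the safeCalloc() macro are hypothetical
names, and the message format just copies the pbuf.c(256) line quoted above.
Passing __FILE__ and __LINE__ through a macro is what lets the log point at
the failing call site, and the overflow check stops a huge entry count from
wrapping around into a small allocation.  Callers still have to check for
NULL; the wrapper only makes the failure visible.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical checked calloc: allocate n zeroed entries of `size` bytes;
   on failure, log where it happened and return NULL. */
void *checkedCalloc(size_t n, size_t size, const char *file, int line) {
  if (size != 0 && n > SIZE_MAX / size) {       /* n * size would overflow */
    fprintf(stderr, "*ERROR* Request for %zu entries of %zu bytes overflows in %s(%d)\n",
            n, size, file, line);
    return NULL;
  }

  void *p = calloc(n, size);
  if (p == NULL && n != 0 && size != 0)
    fprintf(stderr, "*ERROR* Unable to allocate %zu bytes (%zu entries of %zu bytes) in %s(%d)\n",
            n * size, n, size, file, line);
  return p;
}

/* Record the call site automatically. */
#define safeCalloc(n, size) checkedCalloc((n), (size), __FILE__, __LINE__)
```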



2. "Null Mutex" problem  ([EMAIL PROTECTED], [EMAIL PROTECTED] and [EMAIL PROTECTED])
=======================

(Have not done much investigation on this - all I know is the symptoms):

Log messages like this:

> ERROR: accessMutex() call with a NULL mutex [address.c:358]
> ERROR: releaseMutex() call with a NULL mutex [address.c:370]

Yet the Mutex table in the configuration report looks OK.

ACTION ITEM: NEEDS TO BE INVESTIGATED...
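
For whoever picks this up: the messages presumably come from a NULL check in
the mutex wrappers, which report the caller's file and line.  A guard of
that kind looks roughly like this (pthreads; accessMutexAt() and
releaseMutexAt() are hypothetical names, not ntop's code).  Note what such a
check can see: only the pointer the caller handed in.  So a clean Mutex
table in the configuration report doesn't by itself rule out a call site in
address.c passing a pointer that was never set, or was already cleared.
That's a guess at where to look, not a diagnosis.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical guarded lock: refuse a NULL mutex and say who asked. */
int accessMutexAt(pthread_mutex_t *m, const char *file, int line) {
  if (m == NULL) {
    fprintf(stderr, "ERROR: accessMutex() call with a NULL mutex [%s:%d]\n", file, line);
    return -1;
  }
  return pthread_mutex_lock(m);
}

/* Hypothetical guarded unlock, same idea. */
int releaseMutexAt(pthread_mutex_t *m, const char *file, int line) {
  if (m == NULL) {
    fprintf(stderr, "ERROR: releaseMutex() call with a NULL mutex [%s:%d]\n", file, line);
    return -1;
  }
  return pthread_mutex_unlock(m);
}

/* The macros capture the call site, which is where [address.c:358] comes from. */
#define accessMutex(m)  accessMutexAt((m), __FILE__, __LINE__)
#define releaseMutex(m) releaseMutexAt((m), __FILE__, __LINE__)
```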


3. Traffic Mis-Classification ([EMAIL PROTECTED])
=============================

Symptom:

   >However, some of my DNS queries went out from ports in the above range
   >(5900-5960), obviously to port 53 (udp); the replies came from port 53 to
   >the high port between 5900 and 5960.

   >ntop reported that I have had VNC traffic, which is incorrect. I merely had
   >DNS traffic whose client port happened to be one of the VNC server ports.

Proposed change is to make UDP handle the classification like TCP does:

--- 2.0-released/pbuf.c Thu Dec 27 11:12:29 2001
+++ 2.0-released/pbuf.c.new     Wed Jan 16 17:27:45 2002
@@ -3260,6 +3260,16 @@
        }

-       if(handleIP(dport, srcHostIdx, dstHostIdx, length, 0) == -1)
-         handleIP(sport, srcHostIdx, dstHostIdx, length, 0);
+        /* Handle UDP traffic like TCP, above -
+             That is: if we know about the lower# port, even if it's the destination,
+                      classify the traffic that way.
+                (BMS 12-2001)
+         */
+       if(dport < sport) {
+                if(handleIP(dport, srcHostIdx, dstHostIdx, length, 0) == -1)
+                    handleIP(sport, srcHostIdx, dstHostIdx, length, 0);
+        } else {
+                if(handleIP(sport, srcHostIdx, dstHostIdx, length, 0) == -1)
+                    handleIP(dport, srcHostIdx, dstHostIdx, length, 0);
+        }

        handleUDPSession(h, (off & 0x3fff)

I don't know what else this would "break" and would appreciate feedback.  It
seems to work for me!
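
To see why it's the replies that get misfiled, here's a stand-alone model of
the before/after logic in C.  knownService() and its port list are
hypothetical stand-ins for ntop's port table (5900-5960 mirrors the range in
the report above); only the order of the two lookups comes from the patch.
With the old order, a DNS reply from port 53 to port 5901 is checked against
the destination port first and comes out as VNC; with the patch, the lower
port (53) wins.

```c
/* Hypothetical protocol ids; ntop's real port table is richer. */
enum Service { SVC_UNKNOWN = 0, SVC_DNS, SVC_VNC };

/* Stand-in for "do we know about this port?" (handleIP() != -1). */
static enum Service knownService(unsigned short port) {
  if (port == 53)
    return SVC_DNS;
  if (port >= 5900 && port <= 5960)      /* the VNC range from the report */
    return SVC_VNC;
  return SVC_UNKNOWN;
}

/* 2.0 order: destination port first, then source port. */
enum Service classifyUDPOld(unsigned short sport, unsigned short dport) {
  enum Service s = knownService(dport);
  return s != SVC_UNKNOWN ? s : knownService(sport);
}

/* Patched order: lower-numbered port first, like TCP. */
enum Service classifyUDPNew(unsigned short sport, unsigned short dport) {
  unsigned short lo = dport < sport ? dport : sport;
  unsigned short hi = dport < sport ? sport : dport;
  enum Service s = knownService(lo);
  return s != SVC_UNKNOWN ? s : knownService(hi);
}
```

The trade-off is the same one TCP already makes: when both ports are known,
the lower one decides.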

ACTION: Somebody to test and report back!



4. Known Memory Leaks - None Critical
=====================================

A clean shutdown of NTop leaves a small number of blocks of memory unfreed.
Of course, the OS cleans these up, and none of them seem to relate to memory
leaks over time, so this is just an FYI:

  5.        52   0x80558d0                   ntop.c(374)
      protoIPTrafficInfos = (char**)realloc(protoIPTrafficInfos, sizeof(char*)*(numIpProtosToMonitor+1));

  6.      1216   0x8055908                    ntop.c(398)
 17.      1376   0x8060920                    ntop.c(398)
      ipPortMapper = (PortMapper*)malloc(theSize);

 11.        36   0x805cf78                  plugin.c(253)
 13.        36   0x805d338                  plugin.c(253)
 15.        36   0x8061468                  plugin.c(253)
      newFlow = (FlowFilterList*)calloc(1, sizeof(FlowFilterList));

 12.        24   0x805cfa0                  plugin.c(259)
 14.        24   0x805d360                  plugin.c(259)
 16.        24   0x8061490                  plugin.c(259)
      newFlow->fcode = (struct bpf_program*)calloc(numDevices, sizeof(struct bpf_program));

 20.      1544   0x805bf40              initialize.c(302)
      broadcastEntry = (HostTraffic*)malloc(sizeof(HostTraffic));

188.       192   0x81637d0                    pbuf.c(969)
      theSession = (IPSession*)malloc(sizeof(IPSession));
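
For anyone wondering where a listing like that comes from: the idea is to
record every allocation with its size and call site, forget it on free, and
dump whatever is left at shutdown.  Here's a minimal C sketch of that idea,
not ntop's leaks.c; MALLOC(), FREE() and reportUnfreed() are hypothetical
names, and a fixed-size table keeps it short.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_TRACKED 1024    /* hypothetical cap; extra blocks go untracked */

typedef struct {
  void       *ptr;
  size_t      size;
  const char *file;
  int         line;
} TrackedBlock;

static TrackedBlock tracked[MAX_TRACKED];

void *trackedMalloc(size_t size, const char *file, int line) {
  void *p = malloc(size);
  if (p == NULL)
    return NULL;
  for (int i = 0; i < MAX_TRACKED; i++)
    if (tracked[i].ptr == NULL) {
      tracked[i] = (TrackedBlock){ p, size, file, line };
      break;
    }
  return p;
}

void trackedFree(void *p) {
  if (p == NULL)
    return;
  for (int i = 0; i < MAX_TRACKED; i++)
    if (tracked[i].ptr == p) {
      tracked[i].ptr = NULL;
      break;
    }
  free(p);
}

/* Print every block still allocated (index, size, address, file(line))
   and return how many there were. */
int reportUnfreed(FILE *out) {
  int n = 0;
  for (int i = 0; i < MAX_TRACKED; i++)
    if (tracked[i].ptr != NULL) {
      fprintf(out, "%4d. %8zu   %p   %s(%d)\n", i + 1, tracked[i].size,
              tracked[i].ptr, tracked[i].file, tracked[i].line);
      n++;
    }
  return n;
}

#define MALLOC(size) trackedMalloc((size), __FILE__, __LINE__)
#define FREE(p)      trackedFree(p)
```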


5.  -p list includes the default protocols ([EMAIL PROTECTED])
==========================================

(There is also a patch posted to the archives to allow the -p file to have
multiple lines - look for "[Ntop-dev] [2.0ENH] -p (protocols list) from
multiple line file")

When you use the -p switch, the default protocols list is still added in.
Proposed fix: move the code in main.c:

  if(protoSpecs != NULL) {
    if(protoSpecs[0] != '\0')
      handleProtocols(protoSpecs);
    free(protoSpecs);
  }

up before the call to postCommandLineArgumentsInitialization(&lastTime);

Or pull the line in postCommandLineArgumentsInitialization (see
initialize.c):

  if(numIpProtosToMonitor == 0)
    addDefaultProtocols();

into main.c after the 1st block, above...

I haven't tested either of 'em, but both choices SHOULD work.  No
preferences on my part.  The other thing
postCommandLineArgumentsInitialization does is to daemonize the program.
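
Here's the ordering both options aim for, modelled in C as a sketch only.
handleProtocols() and addDefaultProtocols() below are hypothetical stubs
(the stub just counts "label=" definitions, and the default set size of 3 is
made up), and the daemonizing is left out.  The user's -p list runs first,
and the defaults only kick in when it produced nothing.

```c
#include <stdlib.h>

#define NUM_DEFAULT_PROTOCOLS 3   /* hypothetical size of the default set */

static int numIpProtosToMonitor = 0;

/* Stub for handleProtocols(): counts "label=..." definitions in the -p
   string.  The real parser also records names and ports. */
static void handleProtocols(const char *specs) {
  for (const char *c = specs; *c != '\0'; c++)
    if (*c == '=')
      numIpProtosToMonitor++;
}

/* Stub for addDefaultProtocols(). */
static void addDefaultProtocols(void) {
  numIpProtosToMonitor += NUM_DEFAULT_PROTOCOLS;
}

/* Proposed order: the -p list first, the defaults only as a fallback.
   Takes ownership of protoSpecs, as the main.c code does. */
void setupProtocols(char *protoSpecs) {
  if (protoSpecs != NULL) {
    if (protoSpecs[0] != '\0')
      handleProtocols(protoSpecs);
    free(protoSpecs);
  }

  if (numIpProtosToMonitor == 0)
    addDefaultProtocols();
}
```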

ACTION ITEM: Somebody test the patch and report back!



6.  Makefile problems
=====================

If you already have the libraries (gdchart, etc.) installed, the makefiles
don't find them.  I posted a patch in December to adjust them to where I put
the files.  Look for it, or just follow the instructions and create a second
(static) copy of the few relevant libraries.

Also, many people get the message "libtool: link: CURRENT `-release' is not a
nonnegative integer".  It's due to some issue between automake, autoconf,
make, etc.  The build also whines about $(EXEEXT).  The fix is to manually
insert the release # in the appropriate lines of the Makefiles, etc.

I use this (but it reflects the locations where I put the libraries):

--- ntop/configure.in~  Thu Dec 27 11:40:55 2001
+++ ntop/configure.in   Fri Dec 28 07:53:54 2001
@@ -876,12 +876,12 @@
   if test ".${GDCHART_ROOT}" != .; then
     if test -d $GDCHART_ROOT &&
-       test -r $GDCHART_ROOT/libgdchart.a &&
-       test -r $GDCHART_ROOT/gdc.h &&
-       test -r $GDCHART_ROOT/gd-1.8.3/libgd.a &&
-       test -r $GDCHART_ROOT/gd-1.8.3/gd.h &&
-       test -r $GDCHART_ROOT/zlib-1.1.3/libz.a; then
+       test -r $GDCHART_ROOT/lib/libgdchart.a &&
+       test -r $GDCHART_ROOT/include/gdc.h &&
+       test -r $GDCHART_ROOT/lib/libgd.a &&
+       test -r $GDCHART_ROOT/include/gd.h &&
+       test -r $GDCHART_ROOT/lib/libz.so; then
          GDCHART_ROOT=`cd ${GDCHART_ROOT} && pwd`
-          MORELIBS="${MORELIBS} -L$GDCHART_ROOT -lgdchart -L$GDCHART_ROOT/gd-1.8.3 -lgd -L$GDCHART_ROOT/gd-1.8.3/libpng-1.0.8 -lpng -L$GDCHART_ROOT/zlib-1.1.3 -lz"
-         INCS="${INCS} -I$GDCHART_ROOT"
+          MORELIBS="${MORELIBS} -L$GDCHART_ROOT/lib -lgdchart -lgd -lpng -lz"
+         INCS="${INCS} -I$GDCHART_ROOT/include"
          AC_DEFINE(HAVE_GDCHART)
          AC_MSG_RESULT([found in $GDCHART_ROOT])
--- ntop/Makefile.am~   Thu Dec 27 10:00:25 2001
+++ ntop/Makefile.am    Fri Dec 28 07:50:36 2001
@@ -57,5 +57,6 @@
 SUBDIRS = . @PLUGINS@ @INTOP@

-DIST_COMMON = AUTHORS CONTENTS COPYING ChangeLog \
+DIST_COMMON =
+DIST_COMMON += AUTHORS CONTENTS COPYING ChangeLog \
               MANIFESTO NEWS PORTING  \
               SUPPORT_NTOP.txt THANKS \
@@ -131,5 +131,5 @@
 libntop_la_DEPENDENCIES = config.h
 libntop_la_LIBADD       = $(CORELIBS)
-libntop_la_LDFLAGS      = -version-info @NTOP_VERSION_INFO@ -release @NTOP_RELEASE@ -export-dynamic @DYN_FLAGS@
+libntop_la_LDFLAGS      = -version-info 0 -release 2 -export-dynamic @DYN_FLAGS@

 # Archive for http representation, or the 'viewer'
@@ -144,5 +145,5 @@
 libntopreport_la_DEPENDENCIES = config.h
 libntopreport_la_LIBADD       = $(MORELIBS)
-libntopreport_la_LDFLAGS      = -version-info @NTOP_VERSION_INFO@ -release @NTOP_RELEASE@ -export-dynamic @DYN_FLAGS@
+libntopreport_la_LDFLAGS      = -version-info 0 -release 2 -export-dynamic @DYN_FLAGS@

 man_MANS = ntop.8 intop/intop.1
@@ -150,5 +151,5 @@
 .PHONY: snapshot

-ntopd: ntop
+ntopd$(EXEEXT): ntop
        @ln -sf ntop ntopd
        @-(cd .libs && ln -sf ntop ntopd)
--- 2.0-released/intop/Makefile.am.orig Fri Dec 28 08:22:29 2001
+++ 2.0-released/intop/Makefile.am      Fri Dec 28 08:22:49 2001
@@ -25,5 +25,6 @@
 #

-DIST_COMMON = Makefile.am Makefile.in intop.1
+DIST_COMMON =
+DIST_COMMON += Makefile.am Makefile.in intop.1

 DISTCLEANFILES = logger.db ntop.db ntop_pw.db intops
--- 2.0-released/plugins/pep/Makefile.am.orig   Fri Dec 28 08:24:26 2001
+++ 2.0-released/plugins/pep/Makefile.am        Fri Dec 28 08:24:48 2001
@@ -24,5 +24,6 @@
 #

-DIST_COMMON = Makefile.am Makefile.in
+DIST_COMMON =
+DIST_COMMON += Makefile.am Makefile.in
 CLEANFILES  =
 EXTRA_DIST  = available.pl hosts.pl
@@ -54,6 +55,6 @@
 LIBS = # ${PEPLIBS}

-pep.so: pep.c
+pep.so$(EXEEXT): pep.c
 #      ${CC} -shared ${libpep_la_OBJECTS} ${PEPLIBS} -Wl,-soname -Wl,libpep.so.0 -o pep.so
        @${CC} -shared ${libpep_la_OBJECTS} ${PEPLIBS} -o pep.so
-       (cd .. && ln -fs pep/pep.so .)
\ No newline at end of file
+       (cd .. && ln -fs pep/pep.so .)
--- 2.0-released/plugins/Makefile.am.orig       Fri Dec 28 08:52:57 2001
+++ 2.0-released/plugins/Makefile.am    Fri Dec 28 08:55:27 2001
@@ -27,5 +27,6 @@
 SUBDIRS = . #pep

-DIST_COMMON = Makefile.am Makefile.in
+DIST_COMMON =
+DIST_COMMON += Makefile.am Makefile.in
 CLEANFILES  =
 EXTRA_DIST  =
@@ -52,11 +53,11 @@

 libicmpPlugin_la_SOURCES = icmpPlugin.c
-libicmpPlugin_la_LDFLAGS = -shared -version-info @NTOP_VERSION_INFO@ @DYN_FLAGS@
+libicmpPlugin_la_LDFLAGS = -shared -version-info 0 @DYN_FLAGS@

 liblastSeenPlugin_la_SOURCES = lastSeenPlugin.c
-liblastSeenPlugin_la_LDFLAGS = -shared -version-info @NTOP_VERSION_INFO@ @DYN_FLAGS@
+liblastSeenPlugin_la_LDFLAGS = -shared -version-info 0 @DYN_FLAGS@

 libnfsPlugin_la_SOURCES = nfsPlugin.c
-libnfsPlugin_la_LDFLAGS = -shared -version-info @NTOP_VERSION_INFO@ @DYN_FLAGS@
+libnfsPlugin_la_LDFLAGS = -shared -version-info 0 @DYN_FLAGS@

 #librmonPlugin_la_SOURCES = rmonPlugin.c rmon.h
@@ -76,5 +77,5 @@
        cc -bundle -flat_namespace -undefined suppress -o .libs/libicmpPlugin.so@SO_VERSION_PATCH@ icmpPlugin.o

-icmpPlugin.so: .libs/libicmpPlugin.so@SO_VERSION_PATCH@
+icmpPlugin.so$(EXEEXT): .libs/libicmpPlugin.so@SO_VERSION_PATCH@
        @ln -s .libs/libicmpPlugin.so icmpPlugin.so

@@ -82,5 +83,5 @@
        cc -bundle -flat_namespace -undefined suppress -o .libs/liblastSeenPlugin.so@SO_VERSION_PATCH@ lastSeenPlugin.o

-lastSeenPlugin.so: .libs/liblastSeenPlugin.so@SO_VERSION_PATCH@
+lastSeenPlugin.so$(EXEEXT): .libs/liblastSeenPlugin.so@SO_VERSION_PATCH@
        @ln -s .libs/liblastSeenPlugin.so lastSeenPlugin.so

@@ -88,5 +89,5 @@
        cc -bundle -flat_namespace -undefined suppress -o .libs/libnfsPlugin.so@SO_VERSION_PATCH@ nfsPlugin.o

-nfsPlugin.so: .libs/libnfsPlugin.so@SO_VERSION_PATCH@
+nfsPlugin.so$(EXEEXT): .libs/libnfsPlugin.so@SO_VERSION_PATCH@
        @ln -s .libs/libnfsPlugin.so nfsPlugin.so


ACTION ITEM:  Somebody who knows autoconf/automake, etc. really needs to fix
this correctly, once and for all...  Volunteers???
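
On the libtool message mentioned above, its wording suggests what's going
on.  -version-info expects its next word to be CURRENT[:REVISION[:AGE]]; if
@NTOP_VERSION_INFO@ gets substituted as an empty string, the next word on
the line is "-release", and libtool complains that it isn't a number.  That
would also explain why hard-coding "-version-info 0 -release 2", as in the
patch above, makes it go away.  libtool itself is a shell script, so the C
below only mimics that one check; versionInfoArg() and validVersionInfo()
are made-up names, and it skips other checks libtool does (e.g. AGE must
not exceed CURRENT).

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical: return the word that follows "-version-info", or NULL. */
const char *versionInfoArg(int argc, const char *argv[]) {
  for (int i = 0; i + 1 < argc; i++)
    if (strcmp(argv[i], "-version-info") == 0)
      return argv[i + 1];
  return NULL;
}

/* Hypothetical: accept CURRENT[:REVISION[:AGE]], each field a
   non-negative integer. */
int validVersionInfo(const char *v) {
  int fields = 0;

  if (v == NULL)
    return 0;
  for (;;) {
    if (!isdigit((unsigned char)*v))
      return 0;                  /* empty field, or a word like "-release" */
    while (isdigit((unsigned char)*v))
      v++;
    fields++;
    if (*v == '\0')
      return 1;                  /* at most 3 fields, enforced below */
    if (*v != ':' || fields == 3)
      return 0;
    v++;
  }
}
```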


7.   -S doesn't work   ([EMAIL PROTECTED])
=====================

>My question is with the -S  2 option will the data be stored and reloaded
>after ntop crashes? For some reason when I use this option and ntop stops
>when I reload ntop it does not display the previous data even though the
>hostsInfo.db file has a current time and data stamp. Am I doing something
>wrong? How often does the -S option save the data and can the time frame be
>set?

I *think* that your problem is because the reload is in getHostInfo(), where
it's checking the hash for the host.  If I'm reading it right, that would
only be run if ANOTHER packet to/from that host was seen.  So instead of
getting all the data reloaded, you only get back information for hosts you
continue to see.  Looking at where freeHostSessions is invoked, I *think*
that's semi-deliberate: the info is saved off when the information about
that host would otherwise be aged off.

*****What I don't see is a call to it during shutdown******

In initialize.c, it's done during resetStats:

void resetStats(void) {
...
  /* Do not reset the first entry (broadcastEntry) */
  for(i=0; i<interfacesToCreate; i++) {
    u_int j;
    for(j=1; j<device[i].actualHashSize; j++)
      if(device[i].hash_hostTraffic[j] != NULL) {
        freeHostInfo(i, j, 1);
        device[i].hash_hostTraffic[j] = NULL;
      }
...

pbuf.c invokes it when the hash table is full.


If I'm right, you won't see any saved data unless you overflow the hash
table and it can't grow further.

ACTION ITEM:  Somebody could test my theory by resetting statistics just
before shutting down and seeing if *that* preserves them across startups...

The fix?  Basically you would need to copy the code from resetStats in
initialize.c into cleanup in ntop.c (which starts at line 833):

/* Report statistics and write out the raw packet file */
RETSIGTYPE cleanup(int signo) {
...

  killThread(&handleWebConnectionsThreadId);

<<<INSERT LINES FROM initialize.c HERE, MAYBE>>>

#ifdef FULL_MEMORY_FREE
  cleanupAddressQueue();
...

ACTION ITEM:  Somebody to test the theory, apply the "fix" and report
back...
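
Here's the gist of that "fix" modelled in C, under the assumption (my
reading of the aging-off path) that freeing a host is what triggers the
save.  Host, saveHost(), freeHost() and flushHostsOnShutdown() are
hypothetical stand-ins, not ntop's code: walk every bucket except slot 0
(the broadcast entry, which the resetStats() loop also skips) and free each
remaining host, so it gets written out before exit.

```c
#include <stdlib.h>

/* Hypothetical host record. */
typedef struct {
  unsigned int  addr;
  unsigned long bytes;
} Host;

static int savedCount = 0;

/* Stand-in for writing a host's counters out (to hostsInfo.db, with -S). */
static void saveHost(const Host *h) {
  (void)h;
  savedCount++;
}

/* Stand-in for the existing "save, then free" path used on age-off. */
static void freeHost(Host **slot) {
  saveHost(*slot);
  free(*slot);
  *slot = NULL;
}

/* Proposed addition to cleanup(): flush every remaining host.
   Slot 0 is skipped, as resetStats() skips the broadcast entry. */
void flushHostsOnShutdown(Host **hash, unsigned int hashSize) {
  for (unsigned int j = 1; j < hashSize; j++)
    if (hash[j] != NULL)
      freeHost(&hash[j]);
}
```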


8. "Unbalanced" problem... Chris Picton et al...
==========================

Briefly, there are a couple of global variables relating to the device a
packet is seen from.  There are also multiple paths to some places like name
resolution.  It's possible to compute the hash index against the WRONG hash
table and then trip sanity checks, etc.

Symptom:  various.  It CAN include a lot of things, but the giveaway seems
to be these in the log:

15/Jan/2002 17:05:08 WARNING: Index 46 out of range [0..32] @ pbuf.c:550

Also, two different-sized hashes (one is 48, the other 32), perhaps because
there's more activity on one side of your network than the other... what I
call an "unbalanced" network.

The -M (merge) flag seems to make it worse.  Maybe...

Here is what I ****THINK**** is happening.  The session purge is invoked
every 60 seconds (interesting, given it's dying between 1 and 2 minutes into
the run...)  The Session purge loops through all the devices and follows a
tortured path through to getHostInfo, where it computes the index.  However,
the code in pbuf.c uses a global variable actualDeviceId to indicate which
device and which hash table to use.  The code in the purge loops using i and
does not set actualDeviceId.

Basically, this is a "wild" pointer and Ntop will (probably) crash at some
point, possibly much later...

This isn't new - there are a number of "FIXME" type comments about the
problem.  It's just HUGE.  Basically, the global variable needs to die and
be passed as a parameter around throughout dozens of routines.  Thus when it
gets to places like getHostInfo, it's using the RIGHT hash.  The -M flag
just makes it more complex.  The problem is that it's all through the code.
I've spent a day editing it, and just barely gotten a clean compile.

ACTION: I'm working on it.
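
A toy model of the problem, in C, for anyone who wants to see it without
wading through pbuf.c: two devices with host hashes of 48 and 32 buckets, as
in the log above.  Device, hashIdxViaGlobal(), hashIdx(), idxInRange() and
purgeCheck() are hypothetical, and taking the index as address modulo table
size is a simplification.  If the purge is working on device 1 while
actualDeviceId still says device 0, the index is computed against the wrong
table, which is exactly what the range check trips over.  Passing the device
explicitly removes the dependency on whatever the global happens to hold.

```c
#include <stdio.h>

/* Hypothetical per-device state: just the host hash size. */
typedef struct { unsigned int actualHashSize; } Device;

static Device device[2] = { { 48 }, { 32 } };
static int actualDeviceId = 0;    /* the global this section wants to retire */

/* Current shape: index computed against whatever device the global names. */
unsigned int hashIdxViaGlobal(unsigned int addr) {
  return addr % device[actualDeviceId].actualHashSize;
}

/* Proposed shape: the caller says which device (and hash) it means. */
unsigned int hashIdx(int deviceId, unsigned int addr) {
  return addr % device[deviceId].actualHashSize;
}

/* Sanity check in the style of the "Index 46 out of range [0..32]" warning. */
int idxInRange(int deviceId, unsigned int idx) {
  if (idx >= device[deviceId].actualHashSize) {
    printf("WARNING: Index %u out of range [0..%u]\n", idx, device[deviceId].actualHashSize);
    return 0;
  }
  return 1;
}

/* Purge-style loop with the device id passed down, not read from a global. */
int purgeCheck(unsigned int addr) {
  for (int i = 0; i < 2; i++)
    if (!idxInRange(i, hashIdx(i, addr)))
      return -1;
  return 0;
}
```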


==========================================================
==========================================================

-----Burton

_______________________________________________
Ntop-dev mailing list
[EMAIL PROTECTED]
http://listmanager.unipi.it/mailman/listinfo/ntop-dev
