Title: The "Bad magic number" message is back... (expected=1968/real=10248)
From everything, it sounds like it is probably a stray pointer (one that isn't pointing at a valid HostTraffic entry - what you see is whatever happens to be in those bytes).
 
At this point, I don't have anything nice to offer.  You probably need to run under Valgrind to detect the bad pointer reference before it kills ntop (after it's there it's too late to figure out why).  Now the normal script (packages/RedHat/ntop.init) has a valgrind option, but
 
(1) It will slow ntop down substantially
(2) It runs in the foreground
(3) You probably will need to build up a spec file to teach Valgrind to ignore the glibc et al issues (lots of the standard C library code has memory reference problems). Here is what I'm using:
 
## ntop specific suppress commands for valgrind
##
 
# Skip over standard library errors
 
##GDBM:
{
   write_nocancel_in_gdbm_open
   Memcheck:Param
   write(buf)
   fun:__write_nocancel
   fun:gdbm_open
}
{
   write_nocancel_in_inlined_gdbm_open
   Memcheck:Param
   write(buf)
   fun:__write_nocancel
   fun:initSingleGdbm
}
{
   libc_in_gdbm_open
   Memcheck:Param
   write(buf)
   obj:/lib/tls/libc-2.3.3.so
   fun:gdbm_open
}
{
   write_nocancel_in_gdbm_store
   Memcheck:Param
   write(buf)
   fun:__write_nocancel
   fun:_gdbm_write_bucket
   fun:_gdbm_end_update
   fun:gdbm_store
}
 
##libpng:
{
   png_write_deflate
   Memcheck:Cond
   obj:/usr/lib/libz.so.1.2.1.2
   obj:/usr/lib/libz.so.1.2.1.2
   fun:deflate
   fun:png_write_filtered_row
}
 
##libdl:
{
   dl_relocate_object_in_dl_open
   Memcheck:Cond
   fun:_dl_relocate_object
   fun:dl_open_worker
   fun:_dl_catch_error
   fun:__GI__dl_open
}
 
##libpcap:
{
   sendto_in_pcap_findalldevs
   Memcheck:Param
   socketcall.sendto(msg)
   fun:sendto
   fun:getifaddrs
   fun:pcap_findalldevs
}
 
##glibc:
{
   localtime_tz_getenv2
   Memcheck:Addr2
   fun:getenv
   fun:tzset_internal
   fun:__tz_convert
   fun:localtime
}
{
   localtime_tz_getenv4
   Memcheck:Addr4
   fun:getenv
   fun:tzset_internal
   fun:__tz_convert
   fun:localtime
}
{
   strftime_tz_getenv2
   Memcheck:Addr2
   fun:getenv
   fun:tzset_internal
   fun:tzset
   fun:strftime_l
   fun:strftime
}
{
   strftime_tz_getenv4
   Memcheck:Addr4
   fun:getenv
   fun:tzset_internal
   fun:tzset
   fun:strftime_l
   fun:strftime
}
{
   sem_wait_futex
   Memcheck:Param
   futex(uaddr2)
   fun:sem_wait@@GLIBC_2.1
}
 
##zlib:
{
   gzflush_deflate
   Memcheck:Cond
   obj:/usr/lib/libz.so.1.2.1.2
   obj:/usr/lib/libz.so.1.2.1.2
   fun:deflate
   obj:/usr/lib/libz.so.1.2.1.2
   fun:gzflush
}
 
# Cleanup crud...
 
(4) Use valgrind 2.4.x - I have not tried ntop under valgrind 3.0 (it was released this month).
 
When you start running under valgrind, you'll see this:
 
$ sudo /etc/init.d/ntopdevel valgrind
Stopping...
Stopping ntopdevel: Running under valgrind, parameters are
Starting ntopdevel:
/usr/bin/valgrind --verbose --suppressions=/devel/ntop/valgrind.supp --logfile=/tmp/valgrind.log --leak-check=yes --leak-resolution=high --track-fds=yes --trace-children=yes --gen-suppressions=yes --sanity-level=0 /devel/bin/ntop -i eth1,eth2 @/etc/ntopdevel.conf
   Processing file /etc/ntopdevel.conf for parameters...
Thu Aug 25 09:00:05 2005  NOTE: Interface merge enabled by default
Thu Aug 25 09:00:07 2005  Initializing gdbm databases
Thu Aug 25 09:00:08 2005  **WARNING** Unknown preference: ntop.filterExpressionInExtraFrame, value = 1
Thu Aug 25 09:00:08 2005  NOTE: Interface merge disabled from prefs file
Thu Aug 25 09:00:08 2005  ntop v.3.1.051 traceEventTemp
Thu Aug 25 09:00:08 2005  Configured on Aug 18 2005 17:26:42, built on Aug 18 2005 17:34:24.
Thu Aug 25 09:00:08 2005  Copyright 1998-2005 by Luca Deri <
[EMAIL PROTECTED]>
Thu Aug 25 09:00:08 2005  Get the freshest ntop from
http://www.ntop.org/
Thu Aug 25 09:00:08 2005  NOTE: ntop is running from '/devel/bin'
Thu Aug 25 09:00:08 2005  NOTE: (but see warning on man page for the --instance parameter)
Thu Aug 25 09:00:08 2005  NOTE: ntop libraries are in '/devel/lib'
Thu Aug 25 09:00:08 2005  Initializing ntop
Thu Aug 25 09:00:08 2005  Checking eth1 for additional devices
Thu Aug 25 09:00:08 2005  Resetting traffic statistics for device eth1
Thu Aug 25 09:00:08 2005  DLT: Device 0 [eth1] is 1, mtu 1514, header 14
...
 
You also need another host session to monitor the log file, which is in /tmp from my script:
 
$ ls -lt /tmp
...
-rw-------  1 root root      6775 Aug 25 09:00 valgrind.log.pid7937
...
 
I usually use watch:
 
# watch -n1 tail -n25 /tmp/valgrind.log.pid7937
 
If valgrind finds a problem, it will show you the lines causing the error and (with the parameters in the script) stop waiting for you to tell it to generate the suppression message:
 
==7937== ---- Print suppression ? --- [Return/N/n/Y/y/C/c] ----
 
But remember, you have to press the letter key back on the 1st session (the running ntop).   Since we're less interested in specific problems, C (continue w/o stopping again) is probably the best choice.  (Or you could edit the script and comment out the line which adds --gen-suppressions=yes).
 
This will be a huge file if ntop runs a long time (every thread destroyed will cause a search for memory leaks, etc.) (and every page you pull will spin threads).  But it's the only way I know to find the problem (of course, since Valgrind changes the malloc routines, it's also possible that the problem will NEVER surface under valgrind.  That's what makes these problems so much fun).
 
-----Burton

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Marc Mazuhelli
Sent: Thursday, August 25, 2005 7:37 AM
To: [email protected]
Subject: [Ntop-dev] The "Bad magic number" message is back... (expected=1968/real=10248)

Hi Burton,

On 08/19/05 10:03, you wrote :

Secondly, there is now an 'unmagic' number, 1290, which is set in HostTraffic during the start of the delete process.  Because of the timing of the messages (between selection and completion of the purge), I'm suspicious that we are reusing deleted entries and/or have a pointer to them during the delete process.  So if your messages become:

Aug 17 06:46:34 gbp ntop[23566]:  [MSGID8850278] **WARNING** Error: bad magic number [expected=1968/real=1290][deviceId=0] getFirstHost()[address.c/54]

That will tell us the problem is reusing deleted entries.  If it's still zero, then the problem is something else.

Guess what: after a few (probably unrelated) crashes, the “bad magic number” message came back:

Aug 24 22:53:07 gbp ntop[1808]:  [MSGID8931710] RRD: Cycle 153 ended, 59 RRDs updated
Aug 24 22:53:20 gbp ntop[1808]:  [MSGID0825709] IDLE_PURGE: Device 0 [eth1] FINISHED selection, 1747 [out of 10213] hosts selected
Aug 24 22:53:21 gbp ntop[1808]:  [MSGID8477291] IDLE_PURGE: Device 0 [eth1]: 1747/10212 hosts deleted, elapsed time is 0.051907 seconds (0.000030 per host)
Aug 24 22:53:28 gbp ntop[1808]:  [MSGID8757584] SFP: Ending fingerprint scan cycle 304 - checked 131, resolved 0
Aug 24 22:55:21 gbp ntop[1808]:  [MSGID0825709] IDLE_PURGE: Device 0 [eth1] FINISHED selection, 1409 [out of 9895] hosts selected
Aug 24 22:55:21 gbp ntop[1808]:  [MSGID8477291] IDLE_PURGE: Device 0 [eth1]: 1409/9894 hosts deleted, elapsed time is 0.051273 seconds (0.000036 per host)
Aug 24 22:55:59 gbp ntop[1808]:  [MSGID8757584] SFP: Ending fingerprint scan cycle 305 - checked 126, resolved 0
Aug 24 22:56:53 gbp ntop[1808]:  [MSGID8781395] **ERROR** Bad magic number (expected=1968/real=10248) [deviceId=0] lookupHost()[pbuf.c/1088]

I grep’ed for the unmagic number (1920) and found it in globals-defines.h but didn’t find any reference to “10248”. Where does that number come from?

This time all ntop processes have died so there is no process left to run gdb on.

Regards,

Marc.
--
Marc Mazuhelli
Computer security analyst
Service des technologies de l'information
Université de Sherbrooke

_______________________________________________
Ntop-dev mailing list
[email protected]
http://listgateway.unipi.it/mailman/listinfo/ntop-dev

Reply via email to