See my in-line notes to the trimmed msg...

But be aware that much of the first part of my response corrects your
misunderstanding of how ntop works.  You've linked together things that are
asynchronous and assumed it's cause and effect.  What you've forgotten is the
multi-threaded nature of ntop, and what you don't realize is the huge amount
of processing that has to occur to get ntop ready to process packets and be
a web server.

IMHO, the real issue is: what is this 'bpf' state, and why does trussing
clear it...

-----Burton

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf
> Of Stanley Hopcroft
> Sent: Friday, January 30, 2004 1:21 AM
<snip />

> In relation to Mr Strauss's debugging suggestions, in pre-MyDoom mail
> flow times :-

> This may be simply a CHKVER related temporary hang while ntop attempts
> to log that it is running to the ntop dev team.

CHKVER is asynchronous with the web server startup.  It's in initNtop() in
globals-core.c, and in a CFG_MULTITHREADED build it runs in its own thread:

#ifdef CFG_MULTITHREADED
  {
    pthread_t myThreadId;
    createThread(&myThreadId, checkVersion, NULL);
  }
#else
  checkVersion(NULL);
#endif
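To illustrate why a slow CHKVER can't hold up the rest of startup in a
threaded build, here's a minimal sketch using plain pthreads rather than
ntop's createThread() wrapper.  The names version_check() and
startup_demo(), and the condition variable standing in for a connect() that
takes a long time to time out, are hypothetical, for illustration only.

```c
/* Sketch only, not ntop code: a "version check" thread is stuck (modelling
 * a slow connect()) while the main thread carries on with startup. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int release_check = 0;  /* hypothetical: the network finally answers */
static int check_done    = 0;  /* set by the checker thread when it finishes */

/* Stand-in for checkVersion(): blocks until release_check is set. */
static void *version_check(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!release_check)          /* loop guards against spurious wakeups */
        pthread_cond_wait(&cond, &lock);
    check_done = 1;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Returns 1 if the "web server" started while the check was still pending,
 * -1 if the thread could not be created. */
int startup_demo(void) {
    pthread_t tid;
    int web_started_first;

    release_check = 0;
    check_done = 0;
    if (pthread_create(&tid, NULL, version_check, NULL) != 0)
        return -1;

    /* Main thread: continue startup without waiting for the checker. */
    pthread_mutex_lock(&lock);
    web_started_first = !check_done;   /* checker can't be done yet */
    pthread_mutex_unlock(&lock);
    printf("WEB: Starting web server\n");

    /* Later the check completes (or times out) on its own schedule. */
    pthread_mutex_lock(&lock);
    release_check = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);

    pthread_join(tid, NULL);
    return web_started_first;
}
```

The point is the ordering: nothing in the main thread's path waits on the
checker, so when the CHKVER error and the first successful web connection
show up close together in the log, that's timing, not causality.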

> Now, strange as it seems, ntop hangs in a state top reports as bpf and
> when the ntop process is trussed, data starts moving through the
> connections (or the kernel hands a connected socket to accept()).

This is probably the problem, but it's FreeBSD internals: what is the 'bpf'
state?  And how do you know that's what ntop is 'in'?  We'll pick this back
up later in this msg, after we dispose of the cause-and-effect issues.

> However, trying to disable version checking with --no-check-version
> leads to ... what may be a bug in the 25 Jan 2004 CVS.

The bug is in the usage() listing, not the code: the --no-check-version
shown by usage() should read --skip-version-check.  man ntop is right.

<snip />

> 1 the web server hang is repeatable on _some_ instances of ntop on same
> hw, os (25 Jan CVS, FreeBSD 4.9-RELEASE-p1, tiny p5 class hosts). There
> exist ntops that do not seem to do this on same hw and os.
>
> 2 web server hangs at start; ntop process stuck in bpf state
>
> 3 truss of the ntop process unwedges the web server ...
>
> Looks like it's time to start taking drugs again.

Yeah, really...

1) Are you saying that you can start ntop on host a and it always hangs,
while seemingly identical host b always works?  Or that sometimes host a
hangs and sometimes it works?

3) When it 'unwedges', has ntop been successfully recording packet data?
I.e., is this limited to the web server thread?  Or is the whole shooting
match hung, meaning FreeBSD is doing something to the ntop thread group???

> Your comments or hilarity are welcome.

<snip />

> It may be that I can only connect after the CHKVER error in the log
<snip />
> Yep

Nope ... I think you're confusing an accident of timing w/ causality.  On
your box the version check and the web server startup just happen to take
about the same time.

> tsade# telnet tsade 3000
> Trying 192.168.105.230...
> Connected to tsade.aipo.gov.au.
> Escape character is '^]'.
> GET /
>
>
> ..wait .. wait

The web server is started thus:

  traceEvent(CONST_TRACE_INFO, "WEB: Starting web server");
  createThread(&myGlobals.handleWebConnectionsThreadId, handleWebConnections, NULL);
  traceEvent(CONST_TRACE_INFO, "THREADMGMT: Started thread (%ld) for web server",
             myGlobals.handleWebConnectionsThreadId);

That's called at the end of initWeb(), which is called from main().  A lot
has happened before this, and the web server isn't what you'd think of as
'active' until it gets to handleWebConnections() in webInterface.c.

The last message before this is usually ...

            traceEvent(CONST_TRACE_INFO, "Note: SIGPIPE handler set (ignore)");

Now, technically, once listen() is called, incoming connections are accepted
by the kernel and queued.  That's this:

Jan 29 13:45:21 tigger ntop[7443]:   Initialized socket, port 3000, address (any) [MSGID0349927]

Once that's happened, you'll see your 'hang'...  Since you are getting
connected, the ntop host's TCP/IP stack has accepted the connection (meaning
something has bound to the port with bind() and is listening on it with
listen()), but the select() call that actually waits for the connection, the
accept() that picks it up, and the recv() call that actually takes in the data
have yet to happen.
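You can see this without ntop at all.  Here's a minimal sketch, assuming a
loopback address and a kernel-chosen port (both just for the demo; the
function name connect_without_accept() is hypothetical), that binds and
listens but never calls select(), accept() or recv(), and still lets a
client connect and send a request:

```c
/* Sketch only: the kernel completes the TCP handshake for a listening
 * socket and queues the connection before the application accepts it. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if a client connected and sent data although the server never
 * called accept(), 0 if not, -1 on setup error. */
int connect_without_accept(void) {
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    const char req[] = "GET /\r\n\r\n";
    int srv, cli, ok = 0;

    srv = socket(AF_INET, SOCK_STREAM, 0);
    if (srv < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                          /* let the kernel pick a port */
    if (bind(srv, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(srv, 5) < 0 ||
        getsockname(srv, (struct sockaddr *)&addr, &len) < 0) {
        close(srv);
        return -1;
    }

    cli = socket(AF_INET, SOCK_STREAM, 0);
    if (cli >= 0) {
        /* Like "telnet tsade 3000": connect() succeeds with no accept()... */
        if (connect(cli, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
            /* ...and the request just sits in the socket buffers. */
            send(cli, req, sizeof(req) - 1, 0) == (ssize_t)(sizeof(req) - 1))
            ok = 1;
        close(cli);
    }
    close(srv);   /* the queued connection is dropped, never accepted */
    return ok;
}
```

So "Connected to tsade..." followed by a wait only tells you listen() has
happened; it says nothing about whether the web thread has reached its
select()/accept() loop yet.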

With all that is happening as part of ntop's startup, there can be a lag,
especially as it reads the large oui, asn and p2c files.  I agree that adding
a message such as "ntop's web server is now active" would be a good idea.  I
can also change the log tags to INITWEB: until the web server is actually
up... but all these changes do is make it clearer in the log what's
happening; they don't change the behavior.
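Roughly what that logging change could look like, as a sketch under
assumptions: the names web_ready, web_log_tag() and mark_web_server_active()
are hypothetical (not in ntop), it uses C11 atomics for brevity, and
printf() stands in for traceEvent().  The web thread would call
mark_web_server_active() immediately before entering its select() loop.

```c
/* Sketch only: tag web log lines INITWEB until the web thread is really
 * serving, and announce the switch exactly once. */
#include <stdatomic.h>
#include <stdio.h>

static atomic_int web_ready = 0;   /* 0 until the accept loop is entered */

/* Tag to use for web-related log lines right now. */
const char *web_log_tag(void) {
    return atomic_load(&web_ready) ? "WEB" : "INITWEB";
}

/* Hypothetical hook, called by the web thread just before its select() loop. */
void mark_web_server_active(void) {
    /* atomic_exchange returns the old value, so only the first call logs */
    if (atomic_exchange(&web_ready, 1) == 0)
        printf("%s: ntop's web server is now active\n", web_log_tag());
}
```

With this, everything logged before the first call carries INITWEB:, and
the first call prints "WEB: ntop's web server is now active".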

> Jan 30 17:45:20 tsade ntop[2381]:   **ERROR** CHKVER: Unable to connect socket: Operation timed out(60)

Irrelevant: per my top comment, it's asynchronous.  UNLESS this is a
FreeBSD artifact that limits the number of calls or ports or some such... but
I find that hard to believe, as everything else that implements a web server
would have the same problem...

The real issue is what the 'bpf' state is and why things are hanging...

So, let's go to the web...

http://unix.derkeiler.com/Mailing-Lists/FreeBSD/hackers/2003-09/0120.html

Now that's interesting ... ntop doesn't use the 'bpf' device directly, but
on BSD systems libpcap, which ntop captures through, does...  Checking
man 4 bpf...

BIOCIMMEDIATE  (u_int) Enable or disable "immediate mode", based on the
truth value of the argument. When immediate mode is enabled, reads return
immediately upon packet reception. Otherwise, a read will block until either
the kernel buffer becomes full or a timeout occurs. This is useful for
programs like rarpd(8) which must respond to messages in real time. The
default for a new file is off.
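For reference, this is what flipping that switch looks like on an
already-open BPF descriptor.  It's a sketch, not a claim about what ntop or
its libpcap actually do: it assumes a BSD-style <net/bpf.h> that defines
BIOCIMMEDIATE, the function name bpf_set_immediate() is hypothetical, and
getting the descriptor in the first place is left out.

```c
/* Sketch only: enable or disable BPF "immediate mode" (bpf(4),
 * BIOCIMMEDIATE).  On systems without BPF it just fails with ENOTSUP. */
#include <errno.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/time.h>
#if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || defined(__APPLE__)
#include <net/bpf.h>
#endif

/* Returns 0 on success, -1 with errno set on failure. */
int bpf_set_immediate(int bpf_fd, int enable) {
#ifdef BIOCIMMEDIATE
    u_int on = enable ? 1 : 0;    /* bpf(4): argument is a (u_int) truth value */
    return ioctl(bpf_fd, BIOCIMMEDIATE, &on);
#else
    (void)bpf_fd;
    (void)enable;
    errno = ENOTSUP;              /* no BPF device interface on this system */
    return -1;
#endif
}
```

With immediate mode off, a read on the descriptor can sit in the kernel
until the buffer fills or the read timeout expires, which is the behavior
the man page describes.  Newer libpcap releases (1.5 and later, so not the
one in this thread) expose the same knob portably as
pcap_set_immediate_mode(), and pcap_fileno() returns the underlying
descriptor.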


So, we're on to the interesting questions: what is 'bpf', how do you know
ntop's hung in that state, why does truss free it ... and what's the
underlying ntop problem???

-----Burton

_______________________________________________
Ntop-dev mailing list
[EMAIL PROTECTED]
http://listgateway.unipi.it/mailman/listinfo/ntop-dev