Hi all,
First, please excuse my English; I know it is not very good.

I have probably found the real explanation for the FreeBSD locking problem.
The problem is common to all systems with a user-mode pthread library and,
unfortunately, it is not a bug: everything works as documented (more or
less). I must also say that my previous trace to pcap_open_live() is not
directly related to the root of this problem. Moreover, I can (probably)
also explain the high packet drops and high CPU usage that are seen on *BSD
platforms. Both issues are only manifestations of the same underlying
problem.

The main cause is the user-mode pthread library and its interaction with the
bpf interface.

Some theory first:
A user-mode pthread library implements threads in the user space of the
process, without any special support from the kernel. The kernel has no
knowledge of user threads. All thread operations (context switching, thread
scheduling etc.) are done in the user-mode library. This implementation has
some major issues, but only one matters for us.
If any user thread blocks in a syscall, then the whole application (all
threads) blocks too. This is a very important issue. To avoid it, the
pthread library contains wrappers around all blocking syscalls and
implements a mechanism that converts (potentially) blocking syscalls into
nonblocking variants. But this mechanism is only a workaround in most cases.
The only blocking syscalls that are "ready for use" are nanosleep() and
select(), nothing more. These two syscalls are implemented properly (they
don't block in the kernel and don't actively eat CPU). All other blocking
syscalls are either converted into nonblocking variants that wait in an
active loop for completion (so the thread consumes 100% of CPU time), or
passed directly to the kernel if no nonblocking variant exists (so the
syscall blocks all other threads until it returns from the kernel).

Allow me to explain it using standard file operations.

The code:

fd = open(..);
read(fd, ..);

executed in the context of the user-mode pthread library works a little
differently:

fd = open(..) is called with the O_NONBLOCK flag added, so all operations on
this fd are nonblocking (the flag is added in uthread_fd.c, using
__sys_fcntl(fd, F_SETFL, entry->flags | O_NONBLOCK);).

And the read(fd, ..) is converted into an active wait loop (the code is
stripped down):

while ((ret = __sys_read(fd, buf, nbytes)) < 0) {
        if (errno != EWOULDBLOCK)
                break;
}

So the reading thread actively waits for read completion.
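
To make the cost of that loop visible, here is a minimal sketch that runs
the same retry pattern against an ordinary pipe. It is not the pthread
library code: count_spins() and make_nonblocking_pipe() are hypothetical
helpers written only for this illustration, and the retry cap exists only
so the demonstration terminates.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: retry a nonblocking read() up to max_tries times,
 * in the style of the stripped wrapper loop, and return how many attempts
 * failed because no data was available yet. */
static long count_spins(int fd, long max_tries)
{
    char buf[16];
    long spins = 0;

    assert(fd >= 0);
    while (spins < max_tries) {
        ssize_t ret = read(fd, buf, sizeof buf);
        if (ret >= 0)
            break;              /* data (or EOF) arrived */
        if (errno != EWOULDBLOCK && errno != EAGAIN)
            break;              /* a real error, stop retrying */
        spins++;                /* nothing yet: CPU time spent for nothing */
    }
    return spins;
}

/* Hypothetical helper: create a pipe whose read end is nonblocking.
 * Returns 0 on success, -1 on error. */
static int make_nonblocking_pipe(int fds[2])
{
    int flags;

    if (pipe(fds) < 0)
        return -1;
    flags = fcntl(fds[0], F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fds[0], F_SETFL, flags | O_NONBLOCK);
}
```

On an empty pipe every attempt fails with EWOULDBLOCK, so the loop runs all
the way to the cap; once a byte has been written, the first read() succeeds
and no spinning happens.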

This matters mainly in the interaction with the /dev/bpfxxx device driver
(the pcap library uses this device driver). The behavior of this driver is
special (different from regular files, pipes or sockets) in many cases. One
important difference is that O_NONBLOCK in open() is ignored, and
nonblocking mode must be set using ioctl(). So a read() executed on a bpf
device simply blocks all threads within the process until the read syscall
returns to user space.
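
As a rough sketch of "set nonblocking mode using ioctl()": the generic
request for this is FIONBIO. That the bpf driver honors FIONBIO on a given
FreeBSD version is an assumption here, not something verified, and
set_nonblocking_ioctl() is a hypothetical helper name.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: switch fd to nonblocking mode with the generic
 * FIONBIO ioctl instead of the O_NONBLOCK open flag.
 * Returns 0 on success, -1 on error (errno is set by ioctl). */
static int set_nonblocking_ioctl(int fd)
{
    int on = 1;     /* non-zero enables nonblocking I/O */

    assert(fd >= 0);
    return ioctl(fd, FIONBIO, &on);
}
```

For ntop the descriptor would be the one behind the pcap handle; after a
successful call, a read() with no data available should return -1 with
errno set to EWOULDBLOCK instead of sleeping in the kernel.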

Unfortunately, ntop uses pcap_dispatch() in the main packet reading thread,
and pcap_dispatch() is implemented using this code (pcap-bpf.c):
        if (p->cc == 0) {
                cc = read(p->fd, (char *)p->buffer, p->bufsize);

so this call blocks all other ntop threads (including the web server in
select()) until the bpf device returns a filled buffer to ntop. On a lightly
loaded network this can take a very long time. Worse, the ntop code calls
sched_yield() in many places, so the main packet thread is scheduled
regularly and the block occurs again and again.

The hang is hard (read never returns) if you have no traffic on the network
(I'm not sure about signal delivery here). But because ntop uses
pcap_open_live() with a timeout, pcap_dispatch() returns at regular
intervals, allowing the other threads to make slow progress.

Unfortunately, this leads to one more issue (probably FreeBSD specific).
The pcap_open_live() function uses the BIOCSRTIMEOUT ioctl to pass the
timeout value down to the bpf driver.
But the man page for bpf has this passage:
   BIOCGRTIMEOUT  (struct timeval)
                    Set or get the read timeout parameter.
                    The argument specifies the length of time to wait before
                    timing out on a read request.  This parameter is initial-
                    ized to zero by open(2), indicating no timeout.


Note -> "This parameter is initialized to zero by open(2), indicating no timeout".

And because fork() uses dup2() for file descriptor cloning, and dup2() on
FreeBSD uses open(), fork() also clears the timeout value. This explains why
the order of fork() and pcap_open_live() is important: in one case
pcap_dispatch() blocks in the kernel, without a timeout, until it gets data;
in the other case pcap_dispatch() has a 100ms timeout, so other threads can
run.

But the bug is present in all cases: using a blocking read() with a
user-mode pthread library is simply prohibited.

Proposed solution:

All changes are in pcapDispatch()
 - Use ioctl(myGlobals.device[i].pcapPtr->fd, BIOCSRTIMEOUT, ...) to restore
   the timeout value.
 - Always set nonblocking mode for pcap
 - And (most importantly) use select() before pcap_dispatch()
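
The select() step could look roughly like the sketch below. It is only an
illustration under assumptions, not a patch for ntop: wait_readable() is a
hypothetical helper, and the 100 ms value mentioned afterwards is taken
from the timeout discussed earlier.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 * Under a user-mode pthread library, select() is one of the calls that
 * suspends only the calling thread, so other threads keep running. */
static int wait_readable(int fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int n;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    n = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return n > 0 ? 1 : 0;
}
```

In pcapDispatch() the idea would be to call something like
wait_readable(fd, 100) on the capture descriptor and to call
pcap_dispatch() only when it returns 1, so the read() inside pcap_dispatch()
never has to wait in the kernel.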

I'm ready to answer any additional questions, or to explain anything in more
detail.

Michal Meloun
