On 06/29/2011 10:53 PM, Michael Scheidell wrote:
> 
> 
> On 6/29/11 3:29 PM, Török Edwin wrote:
>> (gdb) backtrace full
> (gdb) backtrace full
> #0  0x00000008018baf4a in __error () from /lib/libthr.so.3
> No symbol table info available.
> #1  0x00000008018bac3b in __error () from /lib/libthr.so.3
> No symbol table info available.
> #2  0x00000008018b66c5 in pthread_mutex_getyieldloops_np () from 
> /lib/libthr.so.3
> No symbol table info available.
> #3  0x0000000800790892 in cli_vm_execute_jit () from 
> /usr/local/lib/libclamav.so.7
> No symbol table info available.

Thanks. Can you check whether this patch helps? It should be applied on top of
an unmodified 0.97.1:
http://git.clamav.net/gitweb?p=clamav-devel.git;a=commitdiff;h=bb5572cbe192471bfe7285f77661fff808c8a821

Your stacktraces showed several problems:
 - there are 14 threads, all in __error() (errno), called from
cli_vm_execute_jit; they are probably returning from pthread_cond_timedwait
 - clamd's threads are missing, which is why it no longer responded to
anything, not even PING
 - all those threads seem to be looping trying to reacquire a mutex, but the 
thread that owns the mutex probably died already

I couldn't get the bytecode_watchdog to crash on Linux/amd64, but valgrind 
showed me this warning:

==10908== Invalid read of size 8
==10908==    at 0x333740B58E: pthread_cond_timedwait@@GLIBC_2.3.2 
(pthread_cond_timedwait.S:147)
==10908==    by 0x4D55287: bytecode_watchdog(void*) (in 
/home/edwin/clam/git/builds/default/libclamav/.libs/libclamav.so.6.1.10)
==10908==    by 0x3337406B3F: start_thread (pthread_create.c:304)
==10908==    by 0x33368D52FC: clone (clone.S:112)
==10908==  Address 0xed00578 is not stack'd, malloc'd or (recently) free'd
==10908==
==10908== Syscall param futex(timeout) points to unaddressable byte(s)
==10908==    at 0x333740B63B: pthread_cond_timedwait@@GLIBC_2.3.2 
(pthread_cond_timedwait.S:216)
==10908==    by 0x4D55287: bytecode_watchdog(void*) (in 
/home/edwin/clam/git/builds/default/libclamav/.libs/libclamav.so.6.1.10)
==10908==    by 0x3337406B3F: start_thread (pthread_create.c:304)
==10908==    by 0x33368D52FC: clone (clone.S:112)
==10908==  Address 0xed00578 is not stack'd, malloc'd or (recently) free'd

So the bug is that pthread_cond_timedwait's timeout parameter became invalid.

The patch above does 2 things:
 - if pthread_cond_timedwait returns any error other than ETIMEDOUT, it logs
the error and breaks out of the loop
 - it makes sure pthread_cond_timedwait's timeout parameter stays valid until
pthread_cond_timedwait wakes up
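
To illustrate the two points above, here is a minimal standalone sketch of
such a watchdog loop. It is not the code from the patch: struct watchdog,
watchdog_thread and watchdog_run are hypothetical names made up for this
example. The deadline is stored in the same structure as the mutex and the
condition variable, so the pointer passed to pthread_cond_timedwait stays
valid for the whole wait, and any return value other than 0 or ETIMEDOUT is
logged and ends the loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical watchdog state; NOT the actual libclamav structure. */
struct watchdog {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    struct timespec deadline; /* lives as long as the struct itself */
    unsigned timeout_ms;
    int finished;             /* set by the worker when it is done */
    int timed_out;            /* set by the watchdog on ETIMEDOUT */
    int error;                /* unexpected pthread_cond_timedwait error */
};

static void *watchdog_thread(void *arg)
{
    struct watchdog *wd = arg;
    int ret;

    assert(wd != NULL);
    pthread_mutex_lock(&wd->mutex);

    /* Absolute deadline, kept in wd so it outlives every wait below. */
    clock_gettime(CLOCK_REALTIME, &wd->deadline);
    wd->deadline.tv_sec += wd->timeout_ms / 1000;
    wd->deadline.tv_nsec += (long)(wd->timeout_ms % 1000) * 1000000L;
    if (wd->deadline.tv_nsec >= 1000000000L) {
        wd->deadline.tv_sec++;
        wd->deadline.tv_nsec -= 1000000000L;
    }

    while (!wd->finished) {
        ret = pthread_cond_timedwait(&wd->cond, &wd->mutex, &wd->deadline);
        if (ret == ETIMEDOUT) {
            wd->timed_out = 1;
            break;
        }
        if (ret != 0) {
            /* Any other error: log it and stop instead of looping forever. */
            fprintf(stderr, "watchdog: pthread_cond_timedwait: %s\n",
                    strerror(ret));
            wd->error = ret;
            break;
        }
        /* ret == 0: signalled or spurious wakeup, re-check wd->finished. */
    }

    pthread_mutex_unlock(&wd->mutex);
    return NULL;
}

/* Runs one watchdog. If finish_early is nonzero, the caller signals
 * completion right away. Returns 1 on timeout, 0 if the work finished
 * in time, -1 on error. */
static int watchdog_run(unsigned timeout_ms, int finish_early)
{
    struct watchdog wd;
    pthread_t tid;
    int result;

    memset(&wd, 0, sizeof(wd));
    wd.timeout_ms = timeout_ms;
    pthread_mutex_init(&wd.mutex, NULL);
    pthread_cond_init(&wd.cond, NULL);

    if (pthread_create(&tid, NULL, watchdog_thread, &wd) != 0) {
        pthread_cond_destroy(&wd.cond);
        pthread_mutex_destroy(&wd.mutex);
        return -1;
    }

    if (finish_early) {
        pthread_mutex_lock(&wd.mutex);
        wd.finished = 1;
        pthread_cond_signal(&wd.cond);
        pthread_mutex_unlock(&wd.mutex);
    }

    /* wd must not go out of scope before the watchdog thread has exited. */
    pthread_join(tid, NULL);
    result = wd.error ? -1 : wd.timed_out;

    pthread_cond_destroy(&wd.cond);
    pthread_mutex_destroy(&wd.mutex);
    return result;
}
```

In this sketch watchdog_run(50, 0) lets the 50 ms deadline expire, while
watchdog_run(5000, 1) signals completion before the deadline. Build with
-pthread.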

Best regards,
--Edwin
_______________________________________________
Help us build a comprehensive ClamAV guide: visit http://wiki.clamav.net
http://www.clamav.net/support/ml
