Hi, httpd developers:
I am using a
tool(http://code.google.com/p/data-race-test/wiki/ThreadSanitizer) to
study the potential race condition bugs in httpd.
I use httpd-2.2.14, and configure with --mpm=worker. For my first
round of testing, it actually reports lots of warning. I am checking
one by one manually. I have analyzed one of the bugs carefully, and
do not know if we should classify it as a benign race or not.
waring description:
1. I enable the USE_ATOMICS_GENERIC where use the hash lock to protect
the apr_atomic_casptr
2. The bug report is:
==5876== WARNING: Possible data race during write of size 8 at 0x429C578: {{{
==5876== T2 (locks held: {L1}):
==5876== #0 apr_atomic_casptr
/home/tianwei/apache/httpd-2.2.14/srclib/apr/atomic/unix/mutex.c:184
==5876== #1 ap_queue_info_set_idle
/home/tianwei/apache/httpd-2.2.14/server/mpm/worker/fdqueue.c:104
==5876== #2 worker_thread
/home/tianwei/apache/httpd-2.2.14/server/mpm/worker/worker.c:844
==5876== #3 dummy_worker
/home/tianwei/apache/httpd-2.2.14/srclib/apr/threadproc/unix/thread.c:142
==5876== #4 ThreadSanitizerStartThread
/home/tianwei/valgrind/drt/tsan/ts_valgrind_intercepts.c:525
==5876== Concurrent read(s) happened at (OR AFTER) these points:
==5876== T22 (locks held: {}):
==5876== #0 ap_queue_info_wait_for_idler
/home/tianwei/apache/httpd-2.2.14/server/mpm/worker/fdqueue.c:209
==5876== #1 listener_thread
/home/tianwei/apache/httpd-2.2.14/server/mpm/worker/worker.c:642
==5876== #2 dummy_worker
/home/tianwei/apache/httpd-2.2.14/srclib/apr/threadproc/unix/thread.c:142
==5876== #3 ThreadSanitizerStartThread
/home/tianwei/valgrind/drt/tsan/ts_valgrind_intercepts.c:525
3. warning analysis.
listener thread:
apr_status_t ap_queue_info_wait_for_idler(fd_queue_info_t
*queue_info,
apr_pool_t **recycled_pool)
{
/* Block if the count of idle workers is zero */
if (queue_info->idlers == 0) {
rv = apr_thread_mutex_lock(queue_info->idlers_mutex);
while (queue_info->idlers == 0) {
rv = apr_thread_cond_wait(queue_info->wait_for_idler,
queue_info->idlers_mutex);
}
rv = apr_thread_mutex_unlock(queue_info->idlers_mutex);
}
apr_atomic_dec32(&(queue_info->idlers));
for (;;) {
struct recycled_pool *first_pool = queue_info->recycled_pools;
....
}
}
worker thread:
apr_status_t ap_queue_info_set_idle(fd_queue_info_t *queue_info,
apr_pool_t *pool_to_recycle)
{
for (;;) {
//new_recycle->next = queue_info->recycled_pools;
struct recycled_pool *next = queue_info->recycled_pools;
new_recycle->next = next;
if (apr_atomic_casptr((volatile
void**)&(queue_info->recycled_pools),
new_recycle, next) ==
next) {
break;
}
}
/* If this thread just made the idle worker count nonzero,
* wake up the listener. */
if (prev_idlers == 0) {
rv = apr_thread_mutex_lock(queue_info->idlers_mutex);
if (rv != APR_SUCCESS) {
return rv;
}
rv = apr_thread_cond_signal(queue_info->wait_for_idler);
if (rv != APR_SUCCESS) {
apr_thread_mutex_unlock(queue_info->idlers_mutex);
return rv;
}
rv = apr_thread_mutex_unlock(queue_info->idlers_mutex);
if (rv != APR_SUCCESS) {
return rv;
}
}
Problem:
if in listener thread, the queue_info->idlers is "4", where it
will skip the "queue_info->idlers == 0" condition in
ap_queue_info_wait_for_idler, and decrease the idlers atomically, then
it will read "queue_info->recycled_pools". Then in worker thread, it
will write the "queue_info->recycled_pools" in apr_atomic_casptr with
a lock.
We can see that since the listener thread no longer execute the "lock,
cond_wait, unlock", even the worker thread will have a pthread_signal,
there is
no happen-before relation between the read of
"queue_info->recycled_pools" and the write of
"queue_info->recycled_pools". According to the definition, there is a
race condition here.
Question:
According to analysis above, it seems that there is a race,
however, I am guessing the developer do not treat it as a problem, or
there is other unknown mechanism for me to protect this code? Can you
give me some suggestions?
Thanks.
Tianwei
--
Sheng, Tianwei
Inst. of High Performance Computing
Dept. of Computer Sci. & Tech.
Tsinghua Univ.