Hi, httpd developers: I am using a tool(http://code.google.com/p/data-race-test/wiki/ThreadSanitizer) to study the potential race condition bugs in httpd. I use httpd-2.2.14, and configure with --mpm=worker. For my first round of testing, it actually reports lots of warning. I am checking one by one manually. I have analyzed one of the bugs carefully, and do not know if we should classify it as a benign race or not. waring description: 1. I enable the USE_ATOMICS_GENERIC where use the hash lock to protect the apr_atomic_casptr 2. The bug report is: ==5876== WARNING: Possible data race during write of size 8 at 0x429C578: {{{ ==5876== T2 (locks held: {L1}): ==5876== #0 apr_atomic_casptr /home/tianwei/apache/httpd-2.2.14/srclib/apr/atomic/unix/mutex.c:184 ==5876== #1 ap_queue_info_set_idle /home/tianwei/apache/httpd-2.2.14/server/mpm/worker/fdqueue.c:104 ==5876== #2 worker_thread /home/tianwei/apache/httpd-2.2.14/server/mpm/worker/worker.c:844 ==5876== #3 dummy_worker /home/tianwei/apache/httpd-2.2.14/srclib/apr/threadproc/unix/thread.c:142 ==5876== #4 ThreadSanitizerStartThread /home/tianwei/valgrind/drt/tsan/ts_valgrind_intercepts.c:525 ==5876== Concurrent read(s) happened at (OR AFTER) these points: ==5876== T22 (locks held: {}): ==5876== #0 ap_queue_info_wait_for_idler /home/tianwei/apache/httpd-2.2.14/server/mpm/worker/fdqueue.c:209 ==5876== #1 listener_thread /home/tianwei/apache/httpd-2.2.14/server/mpm/worker/worker.c:642 ==5876== #2 dummy_worker /home/tianwei/apache/httpd-2.2.14/srclib/apr/threadproc/unix/thread.c:142 ==5876== #3 ThreadSanitizerStartThread /home/tianwei/valgrind/drt/tsan/ts_valgrind_intercepts.c:525
3. warning analysis. listener thread: apr_status_t ap_queue_info_wait_for_idler(fd_queue_info_t *queue_info, apr_pool_t **recycled_pool) { /* Block if the count of idle workers is zero */ if (queue_info->idlers == 0) { rv = apr_thread_mutex_lock(queue_info->idlers_mutex); while (queue_info->idlers == 0) { rv = apr_thread_cond_wait(queue_info->wait_for_idler, queue_info->idlers_mutex); } rv = apr_thread_mutex_unlock(queue_info->idlers_mutex); } apr_atomic_dec32(&(queue_info->idlers)); for (;;) { struct recycled_pool *first_pool = queue_info->recycled_pools; .... } } worker thread: apr_status_t ap_queue_info_set_idle(fd_queue_info_t *queue_info, apr_pool_t *pool_to_recycle) { for (;;) { //new_recycle->next = queue_info->recycled_pools; struct recycled_pool *next = queue_info->recycled_pools; new_recycle->next = next; if (apr_atomic_casptr((volatile void**)&(queue_info->recycled_pools), new_recycle, next) == next) { break; } } /* If this thread just made the idle worker count nonzero, * wake up the listener. */ if (prev_idlers == 0) { rv = apr_thread_mutex_lock(queue_info->idlers_mutex); if (rv != APR_SUCCESS) { return rv; } rv = apr_thread_cond_signal(queue_info->wait_for_idler); if (rv != APR_SUCCESS) { apr_thread_mutex_unlock(queue_info->idlers_mutex); return rv; } rv = apr_thread_mutex_unlock(queue_info->idlers_mutex); if (rv != APR_SUCCESS) { return rv; } } Problem: if in listener thread, the queue_info->idlers is "4", where it will skip the "queue_info->idlers == 0" condition in ap_queue_info_wait_for_idler, and decrease the idlers atomically, then it will read "queue_info->recycled_pools". Then in worker thread, it will write the "queue_info->recycled_pools" in apr_atomic_casptr with a lock. We can see that since the listener thread no longer execute the "lock, cond_wait, unlock", even the worker thread will have a pthread_signal, there is no happen-before relation between the read of "queue_info->recycled_pools" and the write of "queue_info->recycled_pools". According to the definition, there is a race condition here. Question: According to analysis above, it seems that there is a race, however, I am guessing the developer do not treat it as a problem, or there is other unknown mechanism for me to protect this code? Can you give me some suggestions? Thanks. Tianwei -- Sheng, Tianwei Inst. of High Performance Computing Dept. of Computer Sci. & Tech. Tsinghua Univ.