besides, the following warning indicated that developer already have
been aware of the problem and tolerate it:
1. comments in worker.c:
a:
896 requests_this_child--; /* FIXME: should be synchronized - aaron */
b:
632 /* TODO: requests_this_child should be synchronized - aaron */
633 if (requests_this_child <= 0) {
634 check_infinite_requests();
635 }
2. reported by tsan:
==5876== WARNING: Possible data race during read of size 4 at 0x6FB768: {{{
==5876== T22 (locks held: {}):
==5876== #0 listener_thread
/home/tianwei/apache/httpd-2.2.14/server/mpm/worker/worker.c:633
==5876== #1 dummy_worker
/home/tianwei/apache/httpd-2.2.14/srclib/apr/threadproc/unix/thread.c:142
==5876== #2 ThreadSanitizerStartThread
/home/tianwei/valgrind/drt/tsan/ts_valgrind_intercepts.c:525
==5876== Concurrent write(s) happened at (OR AFTER) these points:
==5876== T15 (locks held: {}):
==5876== #0 worker_thread
/home/tianwei/apache/httpd-2.2.14/server/mpm/worker/worker.c:895
==5876== #1 dummy_worker
/home/tianwei/apache/httpd-2.2.14/srclib/apr/threadproc/unix/thread.c:142
==5876== #2 ThreadSanitizerStartThread
/home/tianwei/valgrind/drt/tsan/ts_valgrind_intercepts.c:525
==5876== Address 0x6FB768 is 0 bytes inside data symbol "requests_this_child"
==5876== }}}
Tianwei
On Thu, Feb 4, 2010 at 3:05 PM, Kostya Serebryany <[email protected]> wrote:
> Looking at the races reported by ThreadSanitizer I can see comments in the
> httpd code which look like this:
> /* There is an intentional race condition in this design:
> * in a multithreaded app, one thread might be reading
> * from this cache_element to resolve a timestamp from
> * TIME_CACHE_SIZE seconds ago at the same time that
> * another thread is copying the exploded form of the
> * current time into the same cache_element. ...
> So, I afraid there are quite a few intentional races in this code.
> It'll be hard to find harmful races among those, unless someone takes time
> to annotate the benign races.
> --kcc
>
> On Thu, Feb 4, 2010 at 9:42 AM, Tianwei <[email protected]> wrote:
>>
>> Hi, httpd developers:
>> I am using a
>> tool(http://code.google.com/p/data-race-test/wiki/ThreadSanitizer) to
>> study the potential race condition bugs in httpd.
>> I use httpd-2.2.14, and configure with --mpm=worker. For my first
>> round of testing, it actually reports lots of warning. I am checking
>> one by one manually. I have analyzed one of the bugs carefully, and
>> do not know if we should classify it as a benign race or not.
>> waring description:
>> 1. I enable the USE_ATOMICS_GENERIC where use the hash lock to protect
>> the apr_atomic_casptr
>> 2. The bug report is:
>> ==5876== WARNING: Possible data race during write of size 8 at 0x429C578:
>> {{{
>> ==5876== T2 (locks held: {L1}):
>> ==5876== #0 apr_atomic_casptr
>> /home/tianwei/apache/httpd-2.2.14/srclib/apr/atomic/unix/mutex.c:184
>> ==5876== #1 ap_queue_info_set_idle
>> /home/tianwei/apache/httpd-2.2.14/server/mpm/worker/fdqueue.c:104
>> ==5876== #2 worker_thread
>> /home/tianwei/apache/httpd-2.2.14/server/mpm/worker/worker.c:844
>> ==5876== #3 dummy_worker
>> /home/tianwei/apache/httpd-2.2.14/srclib/apr/threadproc/unix/thread.c:142
>> ==5876== #4 ThreadSanitizerStartThread
>> /home/tianwei/valgrind/drt/tsan/ts_valgrind_intercepts.c:525
>> ==5876== Concurrent read(s) happened at (OR AFTER) these points:
>> ==5876== T22 (locks held: {}):
>> ==5876== #0 ap_queue_info_wait_for_idler
>> /home/tianwei/apache/httpd-2.2.14/server/mpm/worker/fdqueue.c:209
>> ==5876== #1 listener_thread
>> /home/tianwei/apache/httpd-2.2.14/server/mpm/worker/worker.c:642
>> ==5876== #2 dummy_worker
>> /home/tianwei/apache/httpd-2.2.14/srclib/apr/threadproc/unix/thread.c:142
>> ==5876== #3 ThreadSanitizerStartThread
>> /home/tianwei/valgrind/drt/tsan/ts_valgrind_intercepts.c:525
>>
>> 3. warning analysis.
>> listener thread:
>> apr_status_t ap_queue_info_wait_for_idler(fd_queue_info_t
>> *queue_info,
>> apr_pool_t **recycled_pool)
>> {
>>
>> /* Block if the count of idle workers is zero */
>> if (queue_info->idlers == 0) {
>> rv = apr_thread_mutex_lock(queue_info->idlers_mutex);
>> while (queue_info->idlers == 0) {
>> rv = apr_thread_cond_wait(queue_info->wait_for_idler,
>> queue_info->idlers_mutex);
>> }
>> rv = apr_thread_mutex_unlock(queue_info->idlers_mutex);
>> }
>>
>> apr_atomic_dec32(&(queue_info->idlers));
>>
>> for (;;) {
>> struct recycled_pool *first_pool = queue_info->recycled_pools;
>> ....
>> }
>> }
>>
>> worker thread:
>> apr_status_t ap_queue_info_set_idle(fd_queue_info_t *queue_info,
>> apr_pool_t *pool_to_recycle)
>> {
>> for (;;) {
>> //new_recycle->next = queue_info->recycled_pools;
>> struct recycled_pool *next = queue_info->recycled_pools;
>> new_recycle->next = next;
>> if (apr_atomic_casptr((volatile
>> void**)&(queue_info->recycled_pools),
>> new_recycle, next) ==
>> next) {
>> break;
>> }
>> }
>> /* If this thread just made the idle worker count nonzero,
>> * wake up the listener. */
>> if (prev_idlers == 0) {
>> rv = apr_thread_mutex_lock(queue_info->idlers_mutex);
>> if (rv != APR_SUCCESS) {
>> return rv;
>> }
>> rv = apr_thread_cond_signal(queue_info->wait_for_idler);
>> if (rv != APR_SUCCESS) {
>> apr_thread_mutex_unlock(queue_info->idlers_mutex);
>> return rv;
>> }
>> rv = apr_thread_mutex_unlock(queue_info->idlers_mutex);
>> if (rv != APR_SUCCESS) {
>> return rv;
>> }
>> }
>>
>>
>> Problem:
>> if in listener thread, the queue_info->idlers is "4", where it
>> will skip the "queue_info->idlers == 0" condition in
>> ap_queue_info_wait_for_idler, and decrease the idlers atomically, then
>> it will read "queue_info->recycled_pools". Then in worker thread, it
>> will write the "queue_info->recycled_pools" in apr_atomic_casptr with
>> a lock.
>> We can see that since the listener thread no longer execute the "lock,
>> cond_wait, unlock", even the worker thread will have a pthread_signal,
>> there is
>> no happen-before relation between the read of
>> "queue_info->recycled_pools" and the write of
>> "queue_info->recycled_pools". According to the definition, there is a
>> race condition here.
>>
>> Question:
>> According to analysis above, it seems that there is a race,
>> however, I am guessing the developer do not treat it as a problem, or
>> there is other unknown mechanism for me to protect this code? Can you
>> give me some suggestions?
>>
>> Thanks.
>>
>> Tianwei
>> --
>> Sheng, Tianwei
>> Inst. of High Performance Computing
>> Dept. of Computer Sci. & Tech.
>> Tsinghua Univ.
>
>
--
Sheng, Tianwei
Inst. of High Performance Computing
Dept. of Computer Sci. & Tech.
Tsinghua Univ.