Hi Anton, Nice to see your work on race/UB removal.
I have been working on removing races from libavcodec/pthread_frame.c as part of a wider effort to remove TSan-detected races from Firefox, which uses ffmpeg-3.0.2. This is tracked starting at https://bugzilla.mozilla.org/show_bug.cgi?id=1274256#c34. Your fixes mostly look good and necessary to me. However, for patches 10 and 11 of your series, I had a different approach, and I am wondering if you would consider using it instead. The proposed patch is at the end of this message. [PATCH 10/13] pthread_frame: use atomics for signalling threads to die I believe that changing fctx->die to an atomic_int makes TSan quiet but does not really remove the race, which still exists at a higher level of granularity. The store to fctx->die should be atomic, yes, but I rewrote the reading side of it like this: pthread_mutex_lock(&p->mutex); while (1) { - while (p->state == STATE_INPUT_READY && !fctx->die) - pthread_cond_wait(&p->input_cond, &p->mutex); + while (p->state == STATE_INPUT_READY) { + pthread_cond_wait(&p->input_cond, &p->mutex); + if (fctx->die) break; + } if (fctx->die) break; This is for ffmpeg 3.0.2, slightly different from trunk, but the idea is the same: move the read of fctx->die so that it is guarded by p->mutex. [PATCH 11/13] pthread_frame: use atomics for PerThreadContext.state There are two instances of double-checked locking on PerThreadContext::state, one in submit_packet, one in ff_thread_decode_frame. Per analysis at https://bugzilla.mozilla.org/show_bug.cgi?id=1274256#c40 I believe the double-checked locking saves 3 global memory operations per decoded frame and so is pointless as an optimisation. I don't think it is desirable to make ::state atomic, for the same reason as mentioned above for fctx->die. Instead I prefer to fix races using the pthread_* functions wherever possible. To assess the effects of my proposed fixes to [10] and [11], I tested ffmpeg decoding 3000 frames of a vp9 file at frame size 240x100. This is chosen so as to make the frames relatively small and hence to maximise the relative overhead of the per-frame synchronisation. I used 100 threads -- again, so as to simulate a worst case, and ran tests 100 times, on a 2.7 GHz Intel Haswell machine, which was quiet (99.3% idle CPU cycles). This is with ffmpeg-3.0.2 compiled with gcc-5.3.1 at -O2. The command was: ./ffmpeg-3.0.2/ffmpeg -threads 100 -i testfile.webm -f null - Results, averaged over the 100 runs, are: Baseline 3.0.2: 0.95587 user seconds 3.0.2 with proposed [10] and [11] fixes: 0.95384 user seconds So the changes appear not to hurt performance. [PATCH 12/13] pthread_frame: use atomics for frame progress I agree with this; I also arrived at the same solution myself (load-acquire/store-release on progress[field]). The actual patch follows. Clearly a bit of cleanup is needed, but you get the general idea. J ----------------------------------------------------------------- diff -U 8 ./ffmpeg-3.0.2-BASE/libavcodec/pthread_frame.c ffmpeg-3.0.2-NoDCL/libavcodec/pthread_frame.c --- ./ffmpeg-3.0.2-BASE/libavcodec/pthread_frame.c 2016-03-29 04:25:21.000000000 +0200 +++ ffmpeg-3.0.2-NoDCL/libavcodec/pthread_frame.c 2016-07-29 12:52:59.706805890 +0200 @@ -129,18 +129,20 @@ { PerThreadContext *p = arg; FrameThreadContext *fctx = p->parent; AVCodecContext *avctx = p->avctx; const AVCodec *codec = avctx->codec; pthread_mutex_lock(&p->mutex); while (1) { - while (p->state == STATE_INPUT_READY && !fctx->die) - pthread_cond_wait(&p->input_cond, &p->mutex); + while (p->state == STATE_INPUT_READY) { + pthread_cond_wait(&p->input_cond, &p->mutex); + if (fctx->die) break; + } if (fctx->die) break; if (!codec->update_thread_context && THREAD_SAFE_CALLBACKS(avctx)) ff_thread_finish_setup(avctx); av_frame_unref(p->frame); p->got_frame = 0; @@ -318,22 +320,22 @@ return 0; pthread_mutex_lock(&p->mutex); release_delayed_buffers(p); if (prev_thread) { int err; - if (prev_thread->state == STATE_SETTING_UP) { + //if (prev_thread->state == STATE_SETTING_UP) { pthread_mutex_lock(&prev_thread->progress_mutex); while (prev_thread->state == STATE_SETTING_UP) pthread_cond_wait(&prev_thread->progress_cond, &prev_thread->progress_mutex); pthread_mutex_unlock(&prev_thread->progress_mutex); - } + //} err = update_context_from_thread(p->avctx, prev_thread->avctx, 0); if (err) { pthread_mutex_unlock(&p->mutex); return err; } } @@ -421,22 +423,22 @@ * If we're at the end of the stream, then we have to skip threads that * didn't output a frame, because we don't want to accidentally signal * EOF (avpkt->size == 0 && *got_picture_ptr == 0). */ do { p = &fctx->threads[finished++]; - if (p->state != STATE_INPUT_READY) { + //if (p->state != STATE_INPUT_READY) { pthread_mutex_lock(&p->progress_mutex); while (p->state != STATE_INPUT_READY) pthread_cond_wait(&p->output_cond, &p->progress_mutex); pthread_mutex_unlock(&p->progress_mutex); - } + //} av_frame_move_ref(picture, p->frame); *got_picture_ptr = p->got_frame; picture->pkt_dts = p->avpkt.dts; if (p->result < 0) err = p->result; _______________________________________________ libav-devel mailing list [email protected] https://lists.libav.org/mailman/listinfo/libav-devel
