Hi Anton,

Nice to see your work on race/UB removal.

I have been working on removing races from libavcodec/pthread_frame.c
as part of a wider effort to remove TSan-detected races from Firefox,
which uses ffmpeg-3.0.2.  This is tracked starting at
https://bugzilla.mozilla.org/show_bug.cgi?id=1274256#c34.

Your fixes mostly look good and necessary to me.  However, for patches
10 and 11 of your series, I had a different approach, and I am
wondering if you would consider using it instead.  The proposed patch
is at the end of this message.


[PATCH 10/13] pthread_frame: use atomics for signalling threads to die

I believe that changing fctx->die to an atomic_int makes TSan quiet but
does not really remove the race, which still exists at a higher level
of granularity.  The store to fctx->die should be atomic, yes, but I
rewrote the reading side of it like this:

     pthread_mutex_lock(&p->mutex);
     while (1) {
-            while (p->state == STATE_INPUT_READY && !fctx->die)
-                pthread_cond_wait(&p->input_cond, &p->mutex);
+        while (p->state == STATE_INPUT_READY) {
+            pthread_cond_wait(&p->input_cond, &p->mutex);
+            if (fctx->die) break;
+        }

         if (fctx->die) break;

This is for ffmpeg 3.0.2, slightly different from trunk, but the idea
is the same: move the read of fctx->die so that it is guarded by p->mutex.


[PATCH 11/13] pthread_frame: use atomics for PerThreadContext.state

There are two instances of double-checked locking on
PerThreadContext::state, one in submit_packet, one in
ff_thread_decode_frame.  Per analysis at
https://bugzilla.mozilla.org/show_bug.cgi?id=1274256#c40 I believe the
double-checked locking saves 3 global memory operations per decoded
frame and so is pointless as an optimisation.

I don't think it is desirable to make ::state atomic, for the same
reason as mentioned above for fctx->die.  Instead I prefer to fix
races using the pthread_* functions wherever possible.


To assess the effects of my proposed fixes to [10] and [11], I tested
ffmpeg decoding 3000 frames of a vp9 file at frame size 240x100.  This
is chosen so as to make the frames relatively small and hence to
maximise the relative overhead of the per-frame synchronisation.
I used 100 threads -- again, so as to simulate a worst case, and ran
tests 100 times, on a 2.7 GHz Intel Haswell machine, which was quiet
(99.3% idle CPU cycles).  This is with ffmpeg-3.0.2 compiled with
gcc-5.3.1 at -O2.  The command was:

./ffmpeg-3.0.2/ffmpeg -threads 100 -i testfile.webm -f null -

Results, averaged over the 100 runs, are:

Baseline 3.0.2:                           0.95587 user seconds
3.0.2 with proposed [10] and [11] fixes:  0.95384 user seconds

So the changes appear not to hurt performance.


[PATCH 12/13] pthread_frame: use atomics for frame progress

I agree with this; I also arrived at the same solution myself
(load-acquire/store-release on progress[field]).


The actual patch follows.  Clearly a bit of cleanup is needed, but
you get the general idea.

J

-----------------------------------------------------------------


diff -U 8 ./ffmpeg-3.0.2-BASE/libavcodec/pthread_frame.c
ffmpeg-3.0.2-NoDCL/libavcodec/pthread_frame.c
--- ./ffmpeg-3.0.2-BASE/libavcodec/pthread_frame.c      2016-03-29
04:25:21.000000000 +0200
+++ ffmpeg-3.0.2-NoDCL/libavcodec/pthread_frame.c       2016-07-29
12:52:59.706805890 +0200
@@ -129,18 +129,20 @@
 {
     PerThreadContext *p = arg;
     FrameThreadContext *fctx = p->parent;
     AVCodecContext *avctx = p->avctx;
     const AVCodec *codec = avctx->codec;

     pthread_mutex_lock(&p->mutex);
     while (1) {
-            while (p->state == STATE_INPUT_READY && !fctx->die)
-                pthread_cond_wait(&p->input_cond, &p->mutex);
+        while (p->state == STATE_INPUT_READY) {
+            pthread_cond_wait(&p->input_cond, &p->mutex);
+            if (fctx->die) break;
+        }

         if (fctx->die) break;

         if (!codec->update_thread_context && THREAD_SAFE_CALLBACKS(avctx))
             ff_thread_finish_setup(avctx);

         av_frame_unref(p->frame);
         p->got_frame = 0;
@@ -318,22 +320,22 @@
         return 0;

     pthread_mutex_lock(&p->mutex);

     release_delayed_buffers(p);

     if (prev_thread) {
         int err;
-        if (prev_thread->state == STATE_SETTING_UP) {
+        //if (prev_thread->state == STATE_SETTING_UP) {
             pthread_mutex_lock(&prev_thread->progress_mutex);
             while (prev_thread->state == STATE_SETTING_UP)
                 pthread_cond_wait(&prev_thread->progress_cond,
&prev_thread->progress_mutex);
             pthread_mutex_unlock(&prev_thread->progress_mutex);
-        }
+        //}

         err = update_context_from_thread(p->avctx, prev_thread->avctx, 0);
         if (err) {
             pthread_mutex_unlock(&p->mutex);
             return err;
         }
     }

@@ -421,22 +423,22 @@
      * If we're at the end of the stream, then we have to skip threads that
      * didn't output a frame, because we don't want to accidentally signal
      * EOF (avpkt->size == 0 && *got_picture_ptr == 0).
      */

     do {
         p = &fctx->threads[finished++];

-        if (p->state != STATE_INPUT_READY) {
+        //if (p->state != STATE_INPUT_READY) {
             pthread_mutex_lock(&p->progress_mutex);
             while (p->state != STATE_INPUT_READY)
                 pthread_cond_wait(&p->output_cond, &p->progress_mutex);
             pthread_mutex_unlock(&p->progress_mutex);
-        }
+        //}

         av_frame_move_ref(picture, p->frame);
         *got_picture_ptr = p->got_frame;
         picture->pkt_dts = p->avpkt.dts;

         if (p->result < 0)
             err = p->result;
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel

Reply via email to