Hi Aaron,

On Thu, Aug 25, 2011 at 5:01 PM, Aaron Colwell <[email protected]> wrote:
> Some recent changes to Chromium unit tests have caused FFmpeg decoding to
> get regularly scrutinized by our ThreadSanitizer tool. This has led to the
> detection of a variety of race conditions in libavcodec/pthread.c .

Before we start: are these _actual_ race conditions, i.e. stuff that
we can reproduce in terms of "this file generates invalid output when
played back with this many threads on this platform" (ideally
reproducibly), or just theoretical findings by a tool?

(Note: I want to again emphasize how much I'd love to get my hands on
a tool like chess (by MS for VC++) on a relevant developer platform.)

> Quick
> inspection of the code reveals that there is significant use of the
> "double-checked locking" anti-pattern

Yeah, I've seen that during review. In my opinion it's fine, it leads
to not having to grab locks (sometimes) when not necessary. It does
look a little weird... I have no strong opinion on it.

> and inconsistent use of
> PerThreadContext::mutex and PerThreadContext::progress_mutex which are
> likely causing of most of the race conditions.

The mutexes don't protect variables, they protect particular state
transitions. Unfortunately, race-finder tools aren't very good at
these kind of concepts. Vitor played a little with them also, but
didn't have much luck in terms of finding actual bugs (Vitor, is this
accurate?).

> I wanted to ask a few
> questions before jumping in and trying to fix this.
> 1. Are you already aware of these issues?

I'm not sure if they're issues. An issue is that tandberg3_b h264
reference sample fails to decoder correctly on x86-32 (but not x86-64)
with 2 threads (but not 1 or >2 threads). If a tool claims that a
particular variable is not properly protected but it turns out that
the situation in which that would occur can never occur because it's
in a state transition that does not exist, or only when two particular
threads interoperate that never interoperate, then I don't think
that's a bug.

> 2. Is someone already working on fixing these?

No.

> 3. Who is the expert on libavcodec/pthread.c so I can direct questions to
> them?

Alexander Strange wrote it, he's on this list and knows the code well.
I reviewed it and fixed some bugs, I consider myself somewhat familiar
with the code, but not as much as Alexander. Michael may also be
familiar with parts of it, IIRC he wrote earlier versions of that
concept/patch (?).

> 4. Why is "double-checked locking" being used? Will there be significant
> protest if I remove it?

Leaving this for Alexander. I noticed it but didn't mind either way, so left it.

> 5. What is the relationship between PerThreadContext::mutex
> & PerThreadContext::progress_mutex and what member variables are they
> supposed to protect? I've seen cases where only mutex is held, mutex &
> progress_mutex are held, and only progress_mutex is held. At times these 3
> cases appear to modify the same state variables. ThreadSanitizer is getting
> stuck on some test runs now which leads me to believe that there is a
> locking pattern that results in deadlock.

As said above, specific state transitions are protected by separate
mutexes. They don't protect variables per se, but rather variables
related to transferring data (e.g. buffers to be decoded, or decoded
picture data) between threads.

> 6. Is someone else interested in tackling this?

If there's a bug (like e.g. tandberg3_b fails to decode with 2 threads
on x86-32), I'm very much interested in tackling this.

Ronald
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel

Reply via email to