Hi Aaron, On Thu, Aug 25, 2011 at 5:01 PM, Aaron Colwell <[email protected]> wrote: > Some recent changes to Chromium unit tests have caused FFmpeg decoding to > get regularly scrutinized by our ThreadSanitizer tool. This has led to the > detection of a variety of race conditions in libavcodec/pthread.c .
Before we start: are these _actual_ race conditions, i.e. stuff that we can reproduce in terms of "this file generates invalid output when played back with this many threads on this platform" (ideally reproducibly), or just theoretical findings by a tool? (Note: I want to again emphasize how much I'd love to get my hands on a tool like chess (by MS for VC++) on a relevant developer platform.) > Quick > inspection of the code reveals that there is significant use of the > "double-checked locking" anti-pattern Yeah, I've seen that during review. In my opinion it's fine, it leads to not having to grab locks (sometimes) when not necessary. It does look a little weird... I have no strong opinion on it. > and inconsistent use of > PerThreadContext::mutex and PerThreadContext::progress_mutex which are > likely causing of most of the race conditions. The mutexes don't protect variables, they protect particular state transitions. Unfortunately, race-finder tools aren't very good at these kind of concepts. Vitor played a little with them also, but didn't have much luck in terms of finding actual bugs (Vitor, is this accurate?). > I wanted to ask a few > questions before jumping in and trying to fix this. > 1. Are you already aware of these issues? I'm not sure if they're issues. An issue is that tandberg3_b h264 reference sample fails to decoder correctly on x86-32 (but not x86-64) with 2 threads (but not 1 or >2 threads). If a tool claims that a particular variable is not properly protected but it turns out that the situation in which that would occur can never occur because it's in a state transition that does not exist, or only when two particular threads interoperate that never interoperate, then I don't think that's a bug. > 2. Is someone already working on fixing these? No. > 3. Who is the expert on libavcodec/pthread.c so I can direct questions to > them? Alexander Strange wrote it, he's on this list and knows the code well. I reviewed it and fixed some bugs, I consider myself somewhat familiar with the code, but not as much as Alexander. Michael may also be familiar with parts of it, IIRC he wrote earlier versions of that concept/patch (?). > 4. Why is "double-checked locking" being used? Will there be significant > protest if I remove it? Leaving this for Alexander. I noticed it but didn't mind either way, so left it. > 5. What is the relationship between PerThreadContext::mutex > & PerThreadContext::progress_mutex and what member variables are they > supposed to protect? I've seen cases where only mutex is held, mutex & > progress_mutex are held, and only progress_mutex is held. At times these 3 > cases appear to modify the same state variables. ThreadSanitizer is getting > stuck on some test runs now which leads me to believe that there is a > locking pattern that results in deadlock. As said above, specific state transitions are protected by separate mutexes. They don't protect variables per se, but rather variables related to transferring data (e.g. buffers to be decoded, or decoded picture data) between threads. > 6. Is someone else interested in tackling this? If there's a bug (like e.g. tandberg3_b fails to decode with 2 threads on x86-32), I'm very much interested in tackling this. Ronald _______________________________________________ libav-devel mailing list [email protected] https://lists.libav.org/mailman/listinfo/libav-devel
