Jack O'Quin wrote: > On Sun, Oct 19, 2008 at 2:55 AM, Paul Davis <[EMAIL PROTECTED]> wrote: > >> i don't know whether to shoot myself or eat another couple of the >> oft-promised hats. > > Don't beat yourself up too badly. Multiple threads accessing shared memory > is *very* tricky. Even smart people (like you) get it wrong sometimes.
I agree with Jack. I remember a friend of mine once qualifying multi-threading of "kernel quantization". This is complex but also challenging :) Although your understanding of these matters is far beyond mine, I think that I have a good professional experience of coding methodology. And to me the problem here is indeed methodological: there is a lack of unit tests in JACK. >> so the next question is how best to prevent it. as far as i can see we >> have a couple of proposals: >> >> 1) fons' design, which never actually wraps readptr or writeptr, but >> masks the address used to access the data buffer >> >> 2) removing the intermediate state's visibility >> >> i admit to preferring (2) even though i know that with a 64 bit index, >> not wrapping the ptrs is not really a problem. > > I also prefer (2). > >> however, it is not totally clear to me how to prevent an optimizing >> compiler from doing the wrong thing here. unlike the claims made by >> someone involved with portaudio, i believe that it is correct to declare >> the read_ptr and write_ptr volatile, so that the compiler knows that it >> cannot try to be clever about it accesses of the "other" ptr (i.e. >> read_ptr for the writing thread, and vice versa). maybe the comment on >> the use of volatile was based on some idea that i thought it made the >> variables thread safe, which is and never was the case. >> >> i suspect that the safest code looks like: >> >> size_t tmp = (ptr + incr) & mask; >> barrier(); >> ptr = tmp; >> >> but i am not sure whether barrier needs to be read, write or both. >> >> i think that the simpler code: >> >> ptr = (ptr + incr) & mask; >> >> is subject to potential compiler and/or processor "optimization" that >> might reduce it back to the problem case of two ops without an >> intermediate load/store location. the volatile declaration ought to >> prevent the compiler from doing this, and i don't see why a processor >> would do this, ever, but clearly i've already been deeply wrong about >> this. does anybody know for certain? > > It is best to avoid assumptions about what some future compiler may > consider an "optimization". If the register pressure is high at some > point in the program, it may decide to store some value just to free up > register space for other variables. > > The "volatile" declaration should remove any need for the compiler > barrier() statement, AFAICT. Note that barrier() is a compiler > directive, and has no effect on the CPU's ability to reorder cache > operations in an SMP memory hierarchy. > > Here is a fairly clear and complete description of the memory barrier > issues: http://lxr.linux.no/linux/Documentation/memory-barriers.txt > > As best I can tell, both ring buffer threads require "general" memory > barriers, because they both read and write (different) shared data. In > Linux kernel terms, they'd need to use smp_mb(), since the problems > they address are multiprocessor-only. But, since JACK is not compiled > differently for UP and SMP, the full mb() seems more appropriate. > > That stuff makes my head hurt. Maybe that is because your approach is too theoretical. I've read many theories about whether memory barriers are needed or not in lock-free ring buffers, and I think it might lead nowhere without experimenting, that is: write test cases. Until now, I just couldn't write a test that fail because of the lack of memory barrier on x86. However, I think I found another bug in Jack ringbuffer, by writing another test. It's a bit of a weird test, I call it "bit circle": I start 16 "node" threads. Each node : - communicate with its adjacent node through 2 ring buffers - is responsible for shifting an integer by one bit - send the shifted result to the next node through a ring buffer - checks that the value read from previous node is correct. On my Debian Quad Core and Mac OS X Core Duo boxes, this test fails with Jack's ringbuffer even with my patch applied. But it succeeds with Portaudio (both with and without memory barriers), and also with Fons' implementation. I wish someone could eventually run my test suite on a PowerPC, especially SMP. The usage is unchanged: svn co http://svn.samalyse.com/misc/rbtest cd rbtest make test The run time is now shorter, about 2 minutes. Below is the output. You can see that jack (test-bit-circle-jack) fails the bit circle test, even when patched (test-bit-circle-jack-fix1). "|" is printed when a node thread starts, and "-" when a node has read 1000 values, to ensure data is really flowing. The test-int-array-* tests are the same as my original test. All *-lfq tests use Fons's Lfq ringbuffer (modified to use char instead of uint32_t) as backend. ./run-tests.sh bit-circle 512 jack jack-fix1 portaudio portaudio-nobarrier lfq test-bit-circle-jack - starting (5s max) - buffer size: 512 || FAILURE: 2 != 514 test-bit-circle-jack-fix1 - starting (5s max) - buffer size: 512 || FAILURE: 2 != 514 test-bit-circle-portaudio - starting (5s max) - buffer size: 512 ||-|-|-||-|-|-|-|-||--|||--|---- SUCCESS test-bit-circle-portaudio-nobarrier - starting (5s max) - buffer size: 512 |||--|-|||---||-||-|-|--||--|--- SUCCESS test-bit-circle-lfq - starting (5s max) - buffer size: 512 |||--|||-|||-|||--|||-------|--- SUCCESS ./run-tests.sh int-array 16 jack jack-fix1 portaudio portaudio-nobarrier lfq test-int-array-jack - starting (20s max) - array/buffer size: 8/16 [reader started] [writer started] [flowing] 52572 != 52568 at offset 0 FAILURE in chunk 91822 test-int-array-jack-fix1 - starting (20s max) - array/buffer size: 8/16 [reader started] [writer started] [flowing] SUCCESS test-int-array-portaudio - starting (20s max) - array/buffer size: 8/16 [reader started] [writer started] [flowing] SUCCESS test-int-array-portaudio-nobarrier - starting (20s max) - array/buffer size: 8/16 [reader started] [writer started] [flowing] SUCCESS test-int-array-lfq - starting (20s max) - array/buffer size: 8/16 [writer started] [reader started] [flowing] SUCCESS -- Olivier Guilyardi / Samalyse _______________________________________________ Linux-audio-dev mailing list [email protected] http://lists.linuxaudio.org/mailman/listinfo/linux-audio-dev
