Re: [LAD] [LAU] Simple, easy multithreaded circular buffer library for Linux?

Olivier Guilyardi Mon, 20 Oct 2008 06:13:46 -0700

Jack O'Quin wrote:
> On Sun, Oct 19, 2008 at 2:55 AM, Paul Davis <[EMAIL PROTECTED]> wrote:
> 
>> i don't know whether to shoot myself or eat another couple of the
>> oft-promised hats.
> 
> Don't beat yourself up too badly.  Multiple threads accessing shared memory
> is *very* tricky.  Even smart people (like you) get it wrong sometimes.


I agree with Jack. I remember a friend of mine once qualifying multi-threading
of "kernel quantization". This is complex but also challenging :)

Although your understanding of these matters is far beyond mine, I think that I
have a good professional experience of coding methodology. And to me the problem
here is indeed methodological: there is a lack of unit tests in JACK.

>> so the next question is how best to prevent it. as far as i can see we
>> have a couple of proposals:
>>
>>   1) fons' design, which never actually wraps readptr or writeptr, but
>>      masks the address used to access the data buffer
>>
>>   2) removing the intermediate state's visibility
>>
>> i admit to preferring (2) even though i know that with a 64 bit index,
>> not wrapping the ptrs is not really a problem.
> 
> I also prefer (2).
> 
>> however, it is not totally clear to me how to prevent an optimizing
>> compiler from doing the wrong thing here. unlike the claims made by
>> someone involved with portaudio, i believe that it is correct to declare
>> the read_ptr and write_ptr volatile, so that the compiler knows that it
>> cannot try to be clever about it accesses of the "other" ptr (i.e.
>> read_ptr for the writing thread, and vice versa). maybe the comment on
>> the use of volatile was based on some idea that i thought it made the
>> variables thread safe, which is and never was the case.
>>
>> i suspect that the safest code looks like:
>>
>>        size_t tmp = (ptr + incr) & mask;
>>        barrier();
>>        ptr = tmp;
>>
>> but i am not sure whether barrier needs to be read, write or both.
>>
>> i think that the simpler code:
>>
>>       ptr = (ptr + incr) & mask;
>>
>> is subject to potential compiler and/or processor "optimization" that
>> might reduce it back to the problem case of two ops without an
>> intermediate load/store location. the volatile declaration ought to
>> prevent the compiler from doing this, and i don't see why a processor
>> would do this, ever, but clearly i've already been deeply wrong about
>> this. does anybody know for certain?
> 
> It is best to avoid assumptions about what some future compiler may
> consider an "optimization".  If the register pressure is high at some
> point in the program, it may decide to store some value just to free up
> register space for other variables.
> 
> The "volatile" declaration should remove any need for the compiler
> barrier() statement, AFAICT.  Note that barrier() is a compiler
> directive, and has no effect on the CPU's ability to reorder cache
> operations in an SMP memory hierarchy.
> 
> Here is a fairly clear and complete description of the memory barrier
> issues:  http://lxr.linux.no/linux/Documentation/memory-barriers.txt
> 
> As best I can tell, both ring buffer threads require "general" memory
> barriers, because they both read and write (different) shared data.  In
> Linux kernel terms, they'd need to use smp_mb(), since the problems
> they address are multiprocessor-only.  But, since JACK is not compiled
> differently for UP and SMP, the full mb() seems more appropriate.
> 
> That stuff makes my head hurt.

Maybe that is because your approach is too theoretical. I've read many theories
about whether memory barriers are needed or not in lock-free ring buffers, and I
think it might lead nowhere without experimenting, that is: write test cases.

Until now, I just couldn't write a test that fail because of the lack of memory
barrier on x86. However, I think I found another bug in Jack ringbuffer, by
writing another test.

It's a bit of a weird test, I call it "bit circle": I start 16 "node" threads.
Each node :
- communicate with its adjacent node through 2 ring buffers
- is responsible for shifting an integer by one bit
- send the shifted result to the next node through a ring buffer
- checks that the value read from previous node is correct.

On my Debian Quad Core and Mac OS X Core Duo boxes, this test fails with Jack's
ringbuffer even with my patch applied. But it succeeds with Portaudio (both with
and without memory barriers), and also with Fons' implementation.

I wish someone could eventually run my test suite on a PowerPC, especially SMP.

The usage is unchanged:
svn co http://svn.samalyse.com/misc/rbtest
cd rbtest
make test

The run time is now shorter, about 2 minutes. Below is the output. You can see
that jack (test-bit-circle-jack) fails the bit circle test, even when patched
(test-bit-circle-jack-fix1). "|" is printed when a node thread starts, and "-"
when a node has read 1000 values, to ensure data is really flowing.

The test-int-array-* tests are the same as my original test. All *-lfq tests use
Fons's Lfq ringbuffer (modified to use char instead of uint32_t) as backend.

./run-tests.sh bit-circle 512 jack jack-fix1 portaudio portaudio-nobarrier lfq

test-bit-circle-jack - starting (5s max) - buffer size: 512
|| FAILURE: 2 != 514

test-bit-circle-jack-fix1 - starting (5s max) - buffer size: 512
|| FAILURE: 2 != 514

test-bit-circle-portaudio - starting (5s max) - buffer size: 512
||-|-|-||-|-|-|-|-||--|||--|---- SUCCESS

test-bit-circle-portaudio-nobarrier - starting (5s max) - buffer size: 512
|||--|-|||---||-||-|-|--||--|--- SUCCESS

test-bit-circle-lfq - starting (5s max) - buffer size: 512
|||--|||-|||-|||--|||-------|--- SUCCESS

./run-tests.sh int-array 16 jack jack-fix1 portaudio portaudio-nobarrier lfq

test-int-array-jack - starting (20s max) - array/buffer size: 8/16
[reader started] [writer started] [flowing] 52572 != 52568 at offset 0
FAILURE in chunk 91822

test-int-array-jack-fix1 - starting (20s max) - array/buffer size: 8/16
[reader started] [writer started] [flowing] SUCCESS

test-int-array-portaudio - starting (20s max) - array/buffer size: 8/16
[reader started] [writer started] [flowing] SUCCESS

test-int-array-portaudio-nobarrier - starting (20s max) - array/buffer size: 
8/16
[reader started] [writer started] [flowing] SUCCESS

test-int-array-lfq - starting (20s max) - array/buffer size: 8/16
[writer started] [reader started] [flowing] SUCCESS

-- 
  Olivier Guilyardi / Samalyse
_______________________________________________
Linux-audio-dev mailing list
[email protected]
http://lists.linuxaudio.org/mailman/listinfo/linux-audio-dev

Re: [LAD] [LAU] Simple, easy multithreaded circular buffer library for Linux?

Reply via email to