subject:"\[LAD\] a \*simple\* ring buffer, comments pls\?"

On 07/12/2011 10:27 AM, Tim Blechmann wrote:
 OTOH, if you have a number of threads at the same priority
 as Jack's and doing audio work (e.g. to use all the CPUs of
 an SMP machine) then using locks between them (but no other
 threads) should be OK - depending a bit on how they are used.
 So, you can use locks as long as that's only meant to synchronize realtime
 threads with each other? Should some master thread (could be the JACK process
 thread) have a realtime priority slightly higher than the other (worker)
 realtime threads? What are the caveats in general?
 
 yes and no: it is perfectly fine to use locks to use multiple real-time 
 threads, 
 but the thread A fails to acquire a lock, it will be suspended and woken up 
 once 
 thread B releases the lock. the time between `B releases the lock' and `A 
 resumes its execution' is the scheduling latency, which is both os and 
 hardware 
 related.

I understand.

 using preempt_rt, the scheduling latency can be very low (like 10 
 microseconds), 
 if cpu frequency scaling is applied or smt/hyperthreading is enabled it can 
 be 
 as high as 250 microseconds (which is already quite significant, if one is 
 using 
 small signal vector sizes).

That's interesting. We're actually benchmarking scheduling latency at the 
moment.

 however one can avoid the scheduling latency by using spinlocks if one can 
 ensure that none of the synchronized threads can be preempted. personally i 
 achieve this by (a) using real-time scheduling, (b) using not more real-time 
 threads than physical cores and (c) pinning the rt threads to separate cores.

Ah yes, you ensure that threads run on separate cores. That really makes sense.
Thanks for the tip.

--
  Olivier
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-12 Thread Tim Blechmann

 using preempt_rt, the scheduling latency can be very low (like 10
 microseconds), if cpu frequency scaling is applied or smt/hyperthreading is
 enabled it can be as high as 250 microseconds (which is already quite
 significant, if one is using small signal vector sizes).
 
 That's interesting. We're actually benchmarking scheduling latency at the
 moment.

btw, i have discussed this briefly in my master thesis, section 4.1 [1]. the 
effect of the scheduling latency can also be observed in the benchmarks given 
in 
section 5.

and it would be great, if you can share your results, i'd be curious to see 
them!

cheers, tim

[1] http://tim.klingt.org/publications/tim_blechmann_supernova.pdf

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-12 Thread Dan Muresan

 updating the read index, yes?  how is that expressed as portably as
 possible?  Does __sync_synchronized reliably do that?  (The
 documentation is surprisingly short...)

I wonder if

{
pthread_mutex_t dummy = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_lock(dummy);
pthread_mutex_unlock(dummy);
}

doesn't provide a portable full memory barrier. The dummy is different
each time, so no contention -- but still inefficient since  this would
be a 2-step full barrier. Nevertheless, it could be a portable
fallback.

Oh, and you probably need a barrier in the consummer too, just
reasoning from symmetry.

-- Dan
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-12 Thread Emanuel Rumpf

Will this solve any of the problems you are worrying about ?
Anyway, I'm mentioning it:

http://gcc.gnu.org/projects/gomp/

-- 
E.R.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

On 12 July 2011 13:44, Dan Muresan danm...@gmail.com wrote:
 I wonder if

 {
 pthread_mutex_t dummy = PTHREAD_MUTEX_INITIALIZER;
 pthread_mutex_lock(dummy);
 pthread_mutex_unlock(dummy);
 }

 doesn't provide a portable full memory barrier.

According to 
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_11
the answer must surely be yes -- pthread_mutex_lock provides a memory
barrier and the only (explicit) exception is for recursive mutexes
already held by the caller.

 Oh, and you probably need a barrier in the consummer too

Yes, before updating the read index (I said read index earlier when I
meant write index).

Thinking it over and going back over some references and earlier
threads here (e.g. much earlier ones from Olivier et al) it does seem
that this should be enough.  This particular situation isn't so
complicated after all.  I think the more I read earlier during this
thread (and reading around) generally about memory ordering, the more
I was beginning to feel as if the entire subject was a source of only
trouble.


Chris
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

On 07/12/2011 09:45 PM, Chris Cannam wrote:

 Thinking it over and going back over some references and earlier
 threads here (e.g. much earlier ones from Olivier et al) it does seem
 that this should be enough.  This particular situation isn't so
 complicated after all.  I think the more I read earlier during this
 thread (and reading around) generally about memory ordering, the more
 I was beginning to feel as if the entire subject was a source of only
 trouble.

Quite interestingly, I have noticed that discussions about memory barriers are
often somehow endless. What happened in the past is that I saw countless
discussions about whether they are needed, whether they are not, and people
would argue a lot and passionately. So I thought, maybe there's a hidden topic
behind that. A memory barrier... Well, that very much reminds me of this
memory loss which happens to all of us in the childhood. It turns early years
into fuzzy memories. That is a barrier, too.

Whether such psychological barriers are needed or not, that's an interesting
question :)

--
  Olivier






___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-12 Thread Paul Davis

On Tue, Jul 12, 2011 at 4:20 PM, Olivier Guilyardi l...@samalyse.com wrote:

 Quite interestingly, I have noticed that discussions about memory barriers are
 often somehow endless. What happened in the past is that I saw countless
 discussions about whether they are needed, whether they are not, and people
 would argue a lot and passionately.

I think the problem is that memory barriers are almost never required
when writing normal code, and so people (including myself) are not
exposed to their implementation or their use very much. and indeed,
there are precious few library implementations of memory barriers, nor
are they widely documented in a way that suggests that their use is
normal.

by contrast, they get used all over the place in the kernel
(relatively speaking), so if you tend to have a lot of exposure to
kernel code, then calls to mb() or whatever they use these days will
be quite familiar.

there's the additional problem that this discussion normally ends up
confusing two separate topics that many people seem to think are the
same (they are not):

   1) do you need to actively ensure correct thread-level synchronization
  between the reader and writer of a single-reader/single-writer FIFO?
  Put differently, do you need to use synchronization mechanisms
  semantically equivalent to a mutex to ensure that any
arbitrary execution
  order of the 2 threads does not cause incorrect behaviour?

   2) do you need memory barriers to ensure correct synchronization
 for this kind of data structure in the face of possible hardware level
 instruction reordering?

My feeling is that the answer to (1) is no and the answer to (2) is yes.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

On 12 July 2011 21:20, Olivier Guilyardi l...@samalyse.com wrote:
 Quite interestingly, I have noticed that discussions about memory barriers are
 often somehow endless. [...] So I thought, maybe there's a hidden topic
 behind that. A memory barrier...

Indeed -- perhaps these discussions need some sort of memory write
barrier, ensuring that everything discussed before the barrier will be
recalled from store subsequently, instead of being discussed anew.
But this list would be awfully quiet if we had such a thing.


Chris
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

On 12 July 2011 21:36, Paul Davis p...@linuxaudiosystems.com wrote:
   2) do you need memory barriers to ensure correct synchronization
         for this kind of data structure in the face of possible hardware level
         instruction reordering?

The transactional metaphor for this kind of thing seems useful -- the
idea that we've written everything, now we commit for our readers
feels like a helpful way to picture the points where barriers might be
necessary.

Since transactional integrity is not provided for us, the commit needs
to be either

 * protected with a memory barrier, if it doesn't matter that the data
is available before it has been announced but does matter if the data
is announced before being available
 * an atomic swap, if the new data must not be available before it has
been announced and also there is only a single point of reference to
the new data
 * mutex protected, if the references to any changes may be
significant or we are not confident of either of the previous cases

...?


Chris
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

On 07/12/2011 11:07 PM, Chris Cannam wrote:
 On 12 July 2011 21:20, Olivier Guilyardi l...@samalyse.com wrote:
 Quite interestingly, I have noticed that discussions about memory barriers 
 are
 often somehow endless. [...] So I thought, maybe there's a hidden topic
 behind that. A memory barrier...
 
 Indeed -- perhaps these discussions need some sort of memory write
 barrier, ensuring that everything discussed before the barrier will be
 recalled from store subsequently, instead of being discussed anew.
 But this list would be awfully quiet if we had such a thing.

I didn't say the discussions were useless or uninteresting. I just meant that I
think there's a strong symbolic aspect in this topic, which makes it even more
interesting ;)

--
  Olivier
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

On 07/12/2011 10:36 PM, Paul Davis wrote:
 On Tue, Jul 12, 2011 at 4:20 PM, Olivier Guilyardi l...@samalyse.com wrote:
 
 Quite interestingly, I have noticed that discussions about memory barriers 
 are
 often somehow endless. What happened in the past is that I saw countless
 discussions about whether they are needed, whether they are not, and people
 would argue a lot and passionately.
 
 I think the problem is that memory barriers are almost never required
 when writing normal code, and so people (including myself) are not
 exposed to their implementation or their use very much. and indeed,
 there are precious few library implementations of memory barriers, nor
 are they widely documented in a way that suggests that their use is
 normal.
 
 by contrast, they get used all over the place in the kernel
 (relatively speaking), so if you tend to have a lot of exposure to
 kernel code, then calls to mb() or whatever they use these days will
 be quite familiar.
 
 there's the additional problem that this discussion normally ends up
 confusing two separate topics that many people seem to think are the
 same (they are not):
 
1) do you need to actively ensure correct thread-level synchronization
   between the reader and writer of a single-reader/single-writer FIFO?
 Put differently, do you need to use synchronization mechanisms
   semantically equivalent to a mutex to ensure that any
 arbitrary execution
   order of the 2 threads does not cause incorrect behaviour?
 
2) do you need memory barriers to ensure correct synchronization
  for this kind of data structure in the face of possible hardware 
 level
  instruction reordering?
 
 My feeling is that the answer to (1) is no and the answer to (2) is yes.

Thing is, of every single thing that has been said on this thread about memory
barriers and ringbuffers, no one can prove anything. On this thread, on others,
on LAD and elsewhere. For example, no one can write a test case which fails when
memory barriers are missing in a ringbuffer implementation.

That's a pretty rare and intriguing situation.

--
  Olivier

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

On 12 July 2011 22:32, Olivier Guilyardi l...@samalyse.com wrote:
 Thing is, of every single thing that has been said on this thread about memory
 barriers and ringbuffers, no one can prove anything. On this thread, on 
 others,
 on LAD and elsewhere. For example, no one can write a test case which fails 
 when
 memory barriers are missing in a ringbuffer implementation.

There is one in the iPad 2 example Sean posted a link to earlier in the thread:

http://wanderingcoder.net/2011/04/01/arm-memory-ordering/

I haven't tried it, lacking an iPad 2 or any other multicore ARM computer.


Chris
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

On 07/12/2011 11:37 PM, Chris Cannam wrote:
 On 12 July 2011 22:32, Olivier Guilyardi l...@samalyse.com wrote:
 Thing is, of every single thing that has been said on this thread about 
 memory
 barriers and ringbuffers, no one can prove anything. On this thread, on 
 others,
 on LAD and elsewhere. For example, no one can write a test case which fails 
 when
 memory barriers are missing in a ringbuffer implementation.
 
 There is one in the iPad 2 example Sean posted a link to earlier in the 
 thread:
 
 http://wanderingcoder.net/2011/04/01/arm-memory-ordering/
 
 I haven't tried it, lacking an iPad 2 or any other multicore ARM computer.

Ah right, I read that too quickly... Thing is, I'm always suspicious with
quickly crafted ringbuffers as the one on this blog post. It's never like a
mature implementation. I will try and run my little test suite on a such device.

--
  Olivier

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-12 Thread Dan Kegel

On Tue, Jul 12, 2011 at 2:32 PM, Olivier Guilyardi l...@samalyse.com wrote:
 no one can write a test case which fails when
 memory barriers are missing in a ringbuffer implementation.

That's an interesting assertion.  It's kind of tempting to write some
buggy circular buffers and test that assumption on common hardware.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-12 Thread Arnold Krille

On Tuesday 12 July 2011 22:20:48 Olivier Guilyardi wrote:
 On 07/12/2011 09:45 PM, Chris Cannam wrote:
  Thinking it over and going back over some references and earlier
  threads here (e.g. much earlier ones from Olivier et al) it does seem
  that this should be enough.  This particular situation isn't so
  complicated after all.  I think the more I read earlier during this
  thread (and reading around) generally about memory ordering, the more
  I was beginning to feel as if the entire subject was a source of only
  trouble.
 
 Quite interestingly, I have noticed that discussions about memory barriers
 are often somehow endless.

You mean there should be a barrier to discussions about memory barriers?

Sorry, couldn't resist...


signature.asc
Description: This is a digitally signed message part.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?


Le 13/07/11 00:23, Dan Kegel a écrit :

On Tue, Jul 12, 2011 at 2:32 PM, Olivier Guilyardil...@samalyse.com  wrote:

no one can write a test case which fails when
memory barriers are missing in a ringbuffer implementation.


That's an interesting assertion.  It's kind of tempting to write some
buggy circular buffers and test that assumption on common hardware.


Not sure what you mean by buggy circular buffer, but we already did quite a lot 
of testing in the past.


That said, this article about the iPad2 is quite frightening. I've read that 
again and the guy seems to know what he's talking about. His little FIFO and his 
testing methodology both seem correct to me. That's the first potential proof I 
ever hear about:

http://wanderingcoder.net/2011/04/01/arm-memory-ordering/

This guy is quietly saying that a lot of code out there is about to break, for 
real. As I mentioned previously, multi-core ARM devices are in the wild now. In 
addition to the iPad2, possibly vulnerable Android devices are being sold 
massively right now.


Problem is I don't have a such device at the moment.

--
  Olivier
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-12 Thread Paul Davis

On Tue, Jul 12, 2011 at 7:31 PM, Arnold Krille arn...@arnoldarts.de wrote:

 You mean there should be a barrier to discussions about memory barriers?

No. He means that there needs to be a barrier inserted into the
discussion before its possible to move onto the next stage.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?


Le 13/07/11 02:08, Fred Gleason a écrit :

On Jul 12, 2011, at 19:50 05, Olivier Guilyardi wrote:


Problem is I don't have a such device at the moment.


Is your testing code online somewhere?  I do have such a setup (iPad 2 
provisioned as a development device in XCode), and may take a crack at this.


That is nice! Yes I do.

On a standard shell you just need to:
$ svn co http://svn.samalyse.com/misc/rbtest
$ cd rbtest
$ make test

But you may need to adapt things a bit for the iPad.

--
  Olivier
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?


Le 13/07/11 01:56, Paul Davis a écrit :

On Tue, Jul 12, 2011 at 7:31 PM, Arnold Krillearn...@arnoldarts.de  wrote:


You mean there should be a barrier to discussions about memory barriers?


No. He means that there needs to be a barrier inserted into the
discussion before its possible to move onto the next stage.


Yes, I think that might well be it :)

But it's getting late now. Good night

--
  Olivier


___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

 We have a variable 'int xval' that is being modified

I notice I have mixed up (switched) x and xval in my previous reply...

 extern int xval;

 void f(void)
 {
    int a, b, c, ... x, y, z;

    x = xval;

    // lots of code using a ... z;
 }

Well, if you are content with any old possible value of xval (i.e. you
have no synchronization on it anywhere), you are right; but then you
might as well make xval a constant -- just use the first value it ever
had. You have no guarantees that the thread calling f() will EVER have
its hardware cache synchronized to whoever is producing xval. Cache
coherency again...

Otherwise, I bet you *do* have some synchronization, somewhere, for
xval in the thread that calls f. Wherever that place is, save the
cached value, i.e.

mutex_lock()
x = xval
mutex_unlock()
f (x)

... and now there is no need for volatile, data flow becomes clear etc.

So you are right, the way you have structured your short snippet of
code you might be preventing an optimization -- but there is still too
little context, and I bet structuring the code differently will
obviate the need for volatile, plus provide predictable data flow.

 A. If xval is being modified by an interrupt handler
 then clearly you can't use the mutex - you can't risk
 to block the interrupt handler.

Let's say a signal handler, not an interrupt handler. Let's stick with
userspace (C89 + Posix) -- kernels have different rules, different
primitives etc.

 *there is no difference* between xval being modified by
 an interrupt handler, or by another thread. The difference

There is a difference. A signal will modify x immediately. A different
thread running on a different cache might never propagate the update
the to hardware cache used by f()'s thread.

A few more things, for the sake of completeness:

In the signal case, you should of course make sure the signal is
caught by the thread who reads x, by using pthread_sigmask()
appropriately. Or make the program non-threaded. Otherwise, you now
have two problems...

Also, strictly speaking if you're using a sighandler plus volatile,
you should also declare xval as sig_atomic_t.


-- Dan
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-11 Thread Olivier Guilyardi

On 07/11/2011 03:45 AM, Sean Bolton wrote:
 On Jul 10, 2011, at 6:34 PM, Sean Bolton wrote:
 On Jul 10, 2011, at 2:41 PM, Paul Davis wrote:
 do we have SMP systems these days that do not guarantee cache coherency?

 Yes. PowerPC and Alpha do not. UltraSPARC v9 and ARMv6/ARM11 and later
 have modes where they do not (and linux on a SPARC v9 runs in that
 mode.)
 
 And how about the iPad 2?:
 http://wanderingcoder.net/2011/04/01/arm-memory-ordering/

Good catch... Multi-core ARM devices are actually arriving massively. With
Android, there's the Motorola Atrix, the Samsung Galaxy S II, etc..

I think it wouldn't hurt to run the ringbuffer tests again..

--
  Olivier
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-11 Thread James Morris

On 11 July 2011 20:19, Olivier Guilyardi l...@samalyse.com wrote:

 Good catch... Multi-core ARM devices are actually arriving massively. With
 Android, there's the Motorola Atrix, the Samsung Galaxy S II, etc..


What about my toaster? :-P

I've ended up going back to Fons's pragmatism. If
non-blocking/lock-free programming is so impossibly difficult,
requiring intimate hardware knowledge of numerous different
architectures then there's only one solution available to people like
me, and that's to code for AMD64/Intel and use the existing ringbuffer
implementations.

I'd be interested though if my usage of  __sync_bool_compare_and_swap
and  __sync_fetch_and_and improves the ring buffer at all? I like the
fact the implementation I'm using is so simple. Trouble is, using the
GCC builtins instead of volatile slows it down. Guess that means it's
duff.

James.

8-

#include rng_buf.h

#include stdlib.h
#include string.h


struct _RingBuffer
{
void** buf;
void** bufend;

void** w;
void** r;
};


RngBuf* rng_buf_new(size_t count)
{
size_t sz = 1;
RngBuf* mb = malloc(sizeof(RngBuf));

if (!mb)
return 0;

for (sz = 1; sz  count; sz = 1)
;

mb-buf = calloc(sz, sizeof(void*));

if (!mb-buf)
{
free(mb);
return 0;
}

mb-bufend = mb-buf + sz - 1;
mb-w = mb-buf;
mb-r = mb-buf;

return mb;
}


void rng_buf_free(RngBuf* mb)
{
free(mb-buf);
free(mb);
}


size_t rng_buf_write(RngBuf* mb, const void* data)
{
if (__sync_bool_compare_and_swap(mb-w, 0, data))
{
mb-w = (mb-w == mb-bufend) ? mb-buf : mb-w + 1;
return (size_t)1;
}

return (size_t)0;
}


void* rng_buf_read(RngBuf* mb)
{
void* data;

if ((data = __sync_fetch_and_and(mb-r, 0)))
{
mb-r = (mb-r == mb-bufend) ? mb-buf : mb-r + 1;
return data;
}

return NULL;
}


8-
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-11 Thread Fons Adriaensen

On Mon, Jul 11, 2011 at 03:02:57PM +0300, Dan Muresan wrote:

 Well, if you are content with any old possible value of xval (i.e. you
 have no synchronization on it anywhere), you are right; but then you
 might as well make xval a constant -- just use the first value it ever
 had. You have no guarantees that the thread calling f() will EVER have
 its hardware cache synchronized to whoever is producing xval. Cache
 coherency again...

The typical use case would be xval being some parameter used
by e.g. a Jack callback and updated by e.g. MIDI events, or 
a measurement; and f() being the code that runs periodically
to update the GUI element for that parameter or measurement.
There is no synchronisation and occasionally using an out-
dated value has no fatal consequences - it will very probably
be put right on the next update which is 1/10 second or so
in the future. The display is informative only, correct audio
processing does in no way depend on it being up to date.

 Otherwise, I bet you *do* have some synchronization, somewhere, for
 xval in the thread that calls f. Wherever that place is, save the
 cached value, i.e.
 
 mutex_lock()
 x = xval
 mutex_unlock()
 f (x)

That would mean the other side has to use the mutex as well
when updating xval, and in a Jack callback that is 'not done',
at least not if the mutex could be held by a non-realtime
thread such as one updating the GUI.

 In the signal case, you should of course make sure the signal is
 caught by the thread who reads x, by using pthread_sigmask()
 appropriately. Or make the program non-threaded. Otherwise, you now
 have two problems...

Not if there is no synchronisation involved as in the example
above. If all the signal handler has to do is update xval, it
doesn't matter in which thread it runs.

 Also, strictly speaking if you're using a sighandler plus volatile,
 you should also declare xval as sig_atomic_t.

Strictly true, but R/W of ints is atomic on the platforms that 
matter to me.

Ciao,

-- 
FA

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-11 Thread David Olofson

On Monday 11 July 2011, at 22.32.08, James Morris jwm.art@gmail.com 
wrote:
 On 11 July 2011 20:19, Olivier Guilyardi l...@samalyse.com wrote:
  Good catch... Multi-core ARM devices are actually arriving massively.
  With Android, there's the Motorola Atrix, the Samsung Galaxy S II, etc..
 
 What about my toaster? :-P
 
 I've ended up going back to Fons's pragmatism. If
 non-blocking/lock-free programming is so impossibly difficult,
 requiring intimate hardware knowledge of numerous different
 architectures then there's only one solution available to people like
 me, and that's to code for AMD64/Intel and use the existing ringbuffer
 implementations.

Also, if/when it's time to port, find the code and/or information you need at 
the time, and test thoroughly on the actual hardware. These things can usually 
be done on anything, one way or another, more or less efficiently.

The only thing that's worse than missing support for some new platform is 
support that doesn't actually work properly. Lots of debugging fun to be had 
that way... :-)


-- 
//David Olofson - Consultant, Developer, Artist, Open Source Advocate

.--- Games, examples, libraries, scripting, sound, music, graphics ---.
|   http://consulting.olofson.net  http://olofsonarcade.com   |
'-'
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-11 Thread Chris Cannam

On 11 July 2011 21:32, James Morris jwm.art@gmail.com wrote:
 I've ended up going back to Fons's pragmatism. If
 non-blocking/lock-free programming is so impossibly difficult,
 requiring intimate hardware knowledge of numerous different
 architectures then there's only one solution available to people like
 me, and that's to code for AMD64/Intel and use the existing ringbuffer
 implementations.

Perhaps the pragmatic solution is to _lock_ the shared buffer?

I'm not at all sound on this subject, but I wouldn't be surprised if
Dan was right that memory ordering guarantees on the read/write index
variables might still leave the data that is just copied into the
buffer exposed to cache coherency problems.

I know taking locks in a RT process is deeply frowned upon around
here, but a critical section juts across the code used to update the
buffer and read back from it -- with an accompanying guarantee of
actually getting the right data out again -- might not be such a bad
tradeoff these days as we'd be inclined to think?


Chris
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-11 Thread Chris Cannam

On 11 July 2011 21:50, Chris Cannam can...@all-day-breakfast.com wrote:
 [...] a critical section juts across the code

just across


Chris
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-11 Thread Fons Adriaensen

On Mon, Jul 11, 2011 at 09:59:25PM +0300, Dan Muresan wrote:
 
 and the NPTL code in glibc *seems* to perform a memory barrier only on
 sem_post().

Wouldn't that seem quite natural ? Semas provide one-way communication.
It's the sem_post() that means there is an event to be seen by some
other thread. So it has to make sure that any data related to it is
visible to the receiver. The receiver taking the event (returning
from sem_wait()), or checking for it (calling sem_wait()), is not
meant to be an event for the other side. You need a second sema if
events have to go both ways.

For a ringbuffer you need two anyway if the buffer is longer than
one item, and you want to signal both of 'not empty' and 'not full' 
in the appropriate direction.

Ciao,

-- 
FA

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-11 Thread Paul Davis

On Mon, Jul 11, 2011 at 4:50 PM, Chris Cannam
can...@all-day-breakfast.com wrote:
 On 11 July 2011 21:32, James Morris jwm.art@gmail.com wrote:
 I've ended up going back to Fons's pragmatism. If
 non-blocking/lock-free programming is so impossibly difficult,
 requiring intimate hardware knowledge of numerous different
 architectures then there's only one solution available to people like
 me, and that's to code for AMD64/Intel and use the existing ringbuffer
 implementations.

 Perhaps the pragmatic solution is to _lock_ the shared buffer?

no, the pragmatic solution is to use memory barriers liberally applied.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-11 Thread Tim E. Real

On July 11, 2011 04:50:06 pm Chris Cannam wrote:
 I know taking locks in a RT process is deeply frowned upon

Likely been answered before, but good time for me to ask:
What is the reason it is not recommended?
Is it simply because of a long time involved in acquiring the lock, or is it 
 because the lock might block some other Jack thread from running?

The same reason why other things like 'malloc' or 'new' are not 
 recommended either?

Thanks. Tim.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-11 Thread Tim Blechmann

 and the NPTL code in glibc *seems* to perform a memory barrier only on
 sem_post().
 
 Wouldn't that seem quite natural ? Semas provide one-way communication.
 It's the sem_post() that means there is an event to be seen by some
 other thread. So it has to make sure that any data related to it is
 visible to the receiver. The receiver taking the event (returning
 from sem_wait()), or checking for it (calling sem_wait()), is not
 meant to be an event for the other side. You need a second sema if
 events have to go both ways.

from my understanding, sem_post() implies a write barrier (stores have been 
executed before sem_post()) and sem_wait() a read barrier (loads have to be 
executed after sem_wait()).

tim

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-11 Thread Fons Adriaensen

On Mon, Jul 11, 2011 at 05:15:26PM -0400, Tim E. Real wrote:
 
 Likely been answered before, but good time for me to ask:
 What is the reason it is not recommended?
 Is it simply because of a long time involved in acquiring the lock, or is it 
  because the lock might block some other Jack thread from running?

The time required to take a lock if it is free is probably
not significant in most cases (this should not involve a
system call).

The problem with e.g. a Jack callback trying to take a lock
is that the lock could be held by a non-RT thread, and you
have no idea how long it could take before that thread
releases it. Even if the Jack thread blocks waiting for
the lock, that doesn't mean the one holding the lock will
continue, and even if it does that provides no guarantee.

OTOH, if you have a number of threads at the same priority
as Jack's and doing audio work (e.g. to use all the CPUs of
an SMP machine) then using locks between them (but no other
threads) should be OK - depending a bit on how they are used.

 The same reason why other things like 'malloc' or 'new' are not 
  recommended either?

These could take a long time in some cases (if your process
needs more memory from the system for example), so they can't
be used safely.

-- 
FA

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-11 Thread Arnold Krille

On Monday 11 July 2011 23:15:26 Tim E. Real wrote:
 On July 11, 2011 04:50:06 pm Chris Cannam wrote:
  I know taking locks in a RT process is deeply frowned upon
 
 Likely been answered before, but good time for me to ask:
 What is the reason it is not recommended?
 Is it simply because of a long time involved in acquiring the lock, or is
 it because the lock might block some other Jack thread from running?
 The same reason why other things like 'malloc' or 'new' are not
  recommended either?

Real-time means as fast as possible. Waiting to acquire a lock isn't exactly 
what you would call fast...
Waiting for a lock is the same as waiting for the kernels/OSs memory managment 
to allocate some space.

Have fun,

Arnold


signature.asc
Description: This is a digitally signed message part.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

 The problem with e.g. a Jack callback trying to take a lock
 is that the lock could be held by a non-RT thread, and you
 have no idea how long it could take before that thread
 releases it. Even if the Jack thread blocks waiting for
 the lock, that doesn't mean the one holding the lock will
 continue, and even if it does that provides no guarantee.

Priority inversion. A low-priority disk thread could hold a lock
needed by the high-priority Jack process thread. But the disk thread
is pre-empted by some other non-related medium-priority process. The
medium-priority process could keep running for a long time.

So the inversion is that, indirectly, a high-priority thread is
waiting for some medium-priority process.

Some RTOS's deal with this kind of stuff (like hard real-time OSs)

-- Dan
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-11 Thread Dan Mills

On Tue, 2011-07-12 at 00:12 +0200, Arnold Krille wrote:

 
 Real-time means as fast as possible. 

Err not really. 

Real-time means Has a bounded response time, locks generally put your
completion time at the mercy of another process that is often not
designed to have a bounded response time. 

It is probably more correct to say that locks should be used with
extreme caution in RT processes as their use makes the relevant part of
another thread part of the time critical response timing.

Trylock is an interesting case can can actually be used to solve some of
this, have two copies of the working data structure, one protected by a
lock, then in the RT thread call trylock and if it succeeds copy the
data protected by the lock over the data set used by the RT process and
release the lock.

If it fails then you still have a coherent set of data, and it might
succeed the next time, this is useful for UI data sets (in both
directions). 

Regards, Dan.


___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-11 Thread Robin Gareus

[oops forgot to CC the list at first]

On 07/11/2011 11:15 PM, Tim E. Real wrote:
 On July 11, 2011 04:50:06 pm Chris Cannam wrote:
 I know taking locks in a RT process is deeply frowned upon
 
 Likely been answered before, but good time for me to ask:
 What is the reason it is not recommended?
 Is it simply because of a long time involved in acquiring the lock, or is it 
  because the lock might block some other Jack thread from running?

The latter. The real-time thread would block until the other thread
(that currently holding the lock) releases the lock.

How long it takes until the other thread releases the lock is undefined
(context switches, process priority dependent, etc) and thus the concept
is not suitable for real-time.

 The same reason why other things like 'malloc' or 'new' are not 
  recommended either?

Yes. malloc, printf, etc can not guarantee to return in time.

Allocation of new memory may involve various system-calls, flush
cache-lines, result in swapping, context-switches, etc... On most
systems the worst-case execution time it takes to allocate new memory
can not be guaranteed to be below a certain limit.

There are various attempts to provide constant time - O(1) - memory
allocation and de-allocation code for real-time systems. None of them
have reached widespread adoption. Often the drawbacks (e.g. decreased
available memory due to fragmentation) outweigh the benefits, and
pre-allocation on application level is easier to implement.

HTH,
robin


 Thanks. Tim.
 ___
 Linux-audio-dev mailing list
 Linux-audio-dev@lists.linuxaudio.org
 http://lists.linuxaudio.org/listinfo/linux-audio-dev
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-11 Thread Robin Gareus

On 07/12/2011 12:12 AM, Arnold Krille wrote:
 On Monday 11 July 2011 23:15:26 Tim E. Real wrote:
 On July 11, 2011 04:50:06 pm Chris Cannam wrote:
 I know taking locks in a RT process is deeply frowned upon

 Likely been answered before, but good time for me to ask:
 What is the reason it is not recommended?
 Is it simply because of a long time involved in acquiring the lock, or is
 it because the lock might block some other Jack thread from running?
 The same reason why other things like 'malloc' or 'new' are not
  recommended either?
 
 Real-time means as fast as possible.

No, real-time means - _guaranteed_ to be done before time X.
AKA. guaranteed response time.

real-time (tasks|kernels) are often slower on average than non-RT
(tasks|kernels).
As for the linux-kernel: This is due to the overhead introduced by
real-time scheduling (take care of process priority)

While non-real-time (tasks|kernels) may be faster on average; they may
also occasionally take more time than they should. For RT-tasks this is
not acceptable.

e.g.

non-RT task 5 runs: 10 sec, 11 sec, 9 sec,*20sec*,  9 sec (avg 11.8)
RT task 5 runs: 12 sec, 12 sec, 11sec, 12sec,  13 sec (avg 12.0)

I hope that explains it well for you.
Cheers!
robin
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-11 Thread Olivier Guilyardi

On 07/12/2011 12:06 AM, Fons Adriaensen wrote:

 OTOH, if you have a number of threads at the same priority
 as Jack's and doing audio work (e.g. to use all the CPUs of
 an SMP machine) then using locks between them (but no other
 threads) should be OK - depending a bit on how they are used.

So, you can use locks as long as that's only meant to synchronize realtime
threads with each other? Should some master thread (could be the JACK process
thread) have a realtime priority slightly higher than the other (worker)
realtime threads? What are the caveats in general?

--
  Olivier

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

 from my understanding, sem_post() implies a write barrier (stores have been
 executed before sem_post()) and sem_wait() a read barrier (loads have to be
 executed after sem_wait()).

Given that a mutex can be replaced with a semaphore initialized with 1
(then lock == sem_wait, unlock == sem_post), your statement should be
true as long as the semaphore value is 0 / 1.

However, I don't know if your statement must still be true when the
semaphore is used as a count tied to a number of items, i.e. when it
always hovers above 1. POSIX talks about a thread of control, but in
this case, there is no exclusion, thus no thread of control.

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_11

(same link as a few posts ago)


-- Dan
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

 For a ringbuffer you need two anyway if the buffer is longer than
 one item, and you want to signal both of 'not empty' and 'not full'
 in the appropriate direction.

Well, if one of the threads is periodic (like a Jack process thread)
it can use sem_getvalue() on the same semaphore to figure things out
-- it doesn't need to be signalled on another semaphore. There are
some small complications that can be solved by double-signalling from
the other thread, or by wasting one slot...


-- Dan
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-10 Thread Tim Blechmann

 Ah. pthread_mutex_lock() / unlock(), as EXTERNAL functions, will never
 be optimized away or inlined. Now, being all sequence points, if you
 simply do
 
 pthread_mutex_lock();
 xval = x;
 pthread_mutex_unlock();
 
 the compiler is not allowed to move statements out the locked section
 or reorder them in any way (without need for any volatile qualifiers).

the hardware would be allowed to reorder them ... this is the reason why mutex 
implementations involve memory barriers ...

the main problem is the lack of a memory model for multi-threaded applications 
at the level of the language (c or c++). fortunately this is about to change 
with c++0x and probably c1x.

tim

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-10 Thread Dan Muresan

 the hardware would be allowed to reorder them ... this is the reason why mutex
 implementations involve memory barriers ...

Yes, the hardware would be allowed to reorder them, so
pthread_mutex_lock() has memory barriers. I think everyone knew that
much :)

However my point to Fons was that, because pthread_mutex_lock() is an
external function, the compiler is not allowed to make any assumptions
about the global variable x (both functions might modify it): it
cannot re-read x after unlock(), or read x before lock(). This was the
missing ingredient.

So it's a cooperation of sequence point-rules and memory barriers that
achieves the effects of a critical section in C.

 the main problem is the lack of a memory model for multi-threaded applications
 at the level of the language (c or c++). fortunately this is about to change
 with c++0x and probably c1x.

So in 10 years we will be able to rely on a conformant compiler being
available on all relevant platforms :)


-- Dan
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-10 Thread Tim Blechmann

  the main problem is the lack of a memory model for multi-threaded
  applications at the level of the language (c or c++). fortunately this
  is about to change with c++0x and probably c1x.
 
 So in 10 years we will be able to rely on a conformant compiler being
 available on all relevant platforms :)

http://www.chaoticmind.net/~hcb/projects/boost.atomic/


signature.asc
Description: This is a digitally signed message part.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-10 Thread Paul Davis

On Sun, Jul 10, 2011 at 1:28 PM, Tim Blechmann t...@klingt.org wrote:
  the main problem is the lack of a memory model for multi-threaded
  applications at the level of the language (c or c++). fortunately this
  is about to change with c++0x and probably c1x.

 So in 10 years we will be able to rely on a conformant compiler being
 available on all relevant platforms :)

 http://www.chaoticmind.net/~hcb/projects/boost.atomic/

if it all works ... very nice.

but note that its only been tested on a couple of versions of gcc on a
couple of *nix-ish platforms.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-10 Thread Tim Blechmann

   the main problem is the lack of a memory model for multi-threaded
   applications at the level of the language (c or c++). fortunately this
   is about to change with c++0x and probably c1x.
  
  So in 10 years we will be able to rely on a conformant compiler being
  available on all relevant platforms :)
  
  http://www.chaoticmind.net/~hcb/projects/boost.atomic/
 
 if it all works ... very nice.
 
 but note that its only been tested on a couple of versions of gcc on a
 couple of *nix-ish platforms.

the number of supported compilers in the documentation is outdated. it supports 
the most commonly used compilers and multiple architectures.
if a compiler is not supported natively, it uses a fallback implementation 
based 
on a pool of spinlocks, which of course is not lock-free, but c++0x doesn't 
guarantee lock-freedom either ...

i've been using it for quite some time in boost.lockfree (in a slightly 
modified 
version).

tim


signature.asc
Description: This is a digitally signed message part.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-10 Thread Fons Adriaensen

On Sun, Jul 10, 2011 at 06:05:45PM +0300, Dan Muresan wrote:

 Ah. pthread_mutex_lock() / unlock(), as EXTERNAL functions, will never
 be optimized away or inlined. Now, being all sequence points, if you
 simply do
 
 pthread_mutex_lock();
 xval = x;
 pthread_mutex_unlock();
 
 the compiler is not allowed to move statements out the locked section
 or reorder them in any way (without need for any volatile qualifiers).

OK. This is becoming an interesting discussion, so please
allow me to restate clearly the context.

We have a variable 'int xval' that is being modified
by forces unknown to the code we are discussing. This
code is a function f() which uses the value of xval,
but the algorithm implemented by f() requires that
the same value is used at all points where f() uses
xval during a single invocation of that function.

So we have:

extern int xval;

void f(void)
{
int a, b, c, ... x, y, z;

x = xval;

// lots of code using a ... z;
}

Since f() has much more local variables than the CPU
has registers, the compiler could be tempted to reuse
the register used to store x for some other purpose
at some point in the code. It could do that in two ways:

1. Store x on the stack, and read that location when
xval is required again, or

2. Just reuse the register without saving it, and read
the memory location 'xval' again when required.

(2) could make the algorithm fail, because xval could
have changed. So we want to prevent that happening.

The solution I propose is to declare xval volatile.
This forces the compiler to read it just once, as 
expressed by the source code. So it either will do (1),
or maybe decide not to trash the register holding x at
all but use another one.


The solution you propose is to protect xval by a mutex. 
I invite you to consider the following:

A. If xval is being modified by an interrupt handler
then clearly you can't use the mutex - you can't risk
to block the interrupt handler.

B. From the point of view of the code we are discussing
*there is no difference* between xval being modified by
an interrupt handler, or by another thread. The difference
is completely irrelevant to f(). The only thing that
matters is that xval can change while f() executes. 

You would probably accept the 'volatile' if xval is
being written by an interrupt handler. Given (B), 
there is no good reason to reject it in either case.


 I see. But as I said, in general the cache coherency problem is worse
 than the pipeline reordering problem -- i.e. when there are multiple
 CPUs/cores using different caches, they may see actions out-of-order.

On that I absolutely agree - cache coherency is the real
problem, not pipelining. The latter should in fact be
transparent from a language such as C/C++.

Ciao,

-- 
FA
 


___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-10 Thread Paul Davis

On Sun, Jul 10, 2011 at 5:14 PM, Fons Adriaensen f...@linuxaudio.org wrote:

 On that I absolutely agree - cache coherency is the real
 problem, not pipelining. The latter should in fact be
 transparent from a language such as C/C++.

i may be way out of the loop, but having worked with some of the early
massively parallel conventional processor systems of the late 80's
and early 90's (such as the sequent symmetry and the kendall square
research machines), my impression was that everyone gave up on
clever cache coherency because it turned out to be too hard (as an
engineering problem, not as a CS problem). i've gotten stuck with the
idea that the industry went for simple cache coherency that simply
does what any moderately skilled designer would do when faced with the
problem and no concerns about elegance or speed: locks, signalling,
and all the usual stuff that is the h/w equiivalent of the pthreads
mutex API.

do we have SMP systems these days that do not guarantee cache coherency?

btw: i do understand that whether they do or do not doesn't affect the
basic point about cache coherency possibly leading to incorrect data
being read.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-10 Thread Dan Kegel

Guys,
if you're writing code for your own use, and don't care about portability
or security, go ahead and use volatile as a synchronization primitive.

But if the code is going to be accepted into an open source project
that gets wide use, volatile is a bad idea. But don't take my word
for it; here's what kernel.org, cert.org, and an intel researcher say about it:

Volatile considered harmful,
http://www.kernel.org/doc/Documentation/volatile-considered-harmful.txt
The use of volatile is likely to be seen as a bug and will bring
additional scrutiny to the code.

CERT Secure Coding Standards, recommendation POS03: Do not use
volatile as a synchronization primitive
https://www.securecoding.cert.org/confluence/display/seccode/POS03-C.+Do+not+use+volatile+as+a+synchronization+primitive
A variable declared as volatile is not cached in a register, leading
to this confusion that it can be used safely as a synchronization
primitive. When declared volatile, the compiler does not re-order the
sequence of reads and writes to that memory location. However, the
compiler might re-order these reads and writes with those to other
memory locations. This might result in non-atomic operations on the
synchronization variable resulting in errors.

Ad Hoc Synchronization Considered Harmful,
www.xiongww.com/papers/osdi10-xiong.pdf
Ad hoc synchronizations are error-prone. Significant percentages
(22-67%) of these ad hoc synchronizations introduced bugs or severe
performance issues.

Volatile is a 1980's solution to a 2000's problem, and it just doesn't
cut the mustard anymore.
- Dan
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-10 Thread Sean Bolton


On Jul 10, 2011, at 2:41 PM, Paul Davis wrote:
On Sun, Jul 10, 2011 at 5:14 PM, Fons Adriaensen  
f...@linuxaudio.org wrote:

On that I absolutely agree - cache coherency is the real
problem, not pipelining. The latter should in fact be
transparent from a language such as C/C++.


i may be way out of the loop, but having worked with some of the early
massively parallel conventional processor systems of the late 80's
and early 90's (such as the sequent symmetry and the kendall square
research machines), my impression was that everyone gave up on
clever cache coherency because it turned out to be too hard (as an
engineering problem, not as a CS problem). i've gotten stuck with the
idea that the industry went for simple cache coherency that simply
does what any moderately skilled designer would do when faced with the
problem and no concerns about elegance or speed: locks, signalling,
and all the usual stuff that is the h/w equiivalent of the pthreads
mutex API.

do we have SMP systems these days that do not guarantee cache  
coherency?


Yes. PowerPC and Alpha do not. UltraSPARC v9 and ARMv6/ARM11 and later
have modes where they do not (and linux on a SPARC v9 runs in that
mode.)

Here's some good reading that may help:

Memory Ordering in Modern Microprocessors, Part I
http://www.linuxjournal.com/article/8211

Memory Ordering in Modern Microprocessors, Part II
http://www.linuxjournal.com/article/8212

Linux Kernel Memory Barriers
http://www.kernel.org/doc/Documentation/memory-barriers.txt

ARM11 Processors: Ordering requirements for memory accesses
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0211i/Babcijbf.html


btw: i do understand that whether they do or do not doesn't affect the
basic point about cache coherency possibly leading to incorrect data
being read.


Good--without memory barriers, JACK's ringbuffer is at risk
for this corruption on these platforms.

Cheers,

-Sean

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-10 Thread Sean Bolton


On Jul 10, 2011, at 6:34 PM, Sean Bolton wrote:

On Jul 10, 2011, at 2:41 PM, Paul Davis wrote:
do we have SMP systems these days that do not guarantee cache  
coherency?


Yes. PowerPC and Alpha do not. UltraSPARC v9 and ARMv6/ARM11 and later
have modes where they do not (and linux on a SPARC v9 runs in that
mode.)


And how about the iPad 2?:
http://wanderingcoder.net/2011/04/01/arm-memory-ordering/

-Sean

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-09 Thread Dan Muresan

 The apps already need to do some type of synchronization internally.
 For example a player's disk thread, when its ringbuffer is full, needs
 to wait for the process thread to consume some data and thus free up

 Depends. If both ends are periodic processes no other synchronisation
 is required. And e.g. Jack callback is such a process, and likely to
 be one end.

How about the other end (i.e. the disk thread?) Would that
normally be periodic?

OK, even if your disk thread is periodic for some reason, how does
that argue for library-level synchronization, *instead of* app-level
synchronization? In this case the cost would be the same -- no loss.

 You may be right about the (HW as opposed to compiler) re-ordering of
 data w.r.t. pointers on some architectures. But AFAIK, at least on Intel
 and AMD writes are not re-ordered w.r.t. other writes from the same CPU,

From the same CPU? Are we regressing to non-SMP-only schemes? And
Intel and AMD only?

How about multiple cores / CPUs / caches? Pipeline reordering is not
the main concern (though it can happen) -- cache coherence is.

 Regarding the volatile declarations, at least on my version (which
 is slightly different from Jack's) there is no performance penalty.

Under which access patterns, with what compiler / optimization flags
etc? I would not make such generalizations... Volatile frustrates the
optimizer's ability to chose the optimal access patterns.

 So I keep them just as reminders that these data are shared and may
 change in unexpected ways.

Hijacking volatile for *manual* type checking, at the cost of
frustrating the optimizer? Andrei Alexandrescu once advocated that
approach for *automatic* type checking in a famous article
(http://drdobbs.com/cpp/184403766). I believe the shortcomings have
been thoroughly discussed in  comp.lang.c++.

If  you want to remind yourself, you could group the variable(s) and
the mutex / semaphore in a structure, or name them similarly etc.

 You are wrong in saying that 'volatile' has no place in multi-threading.
 It is the correct way to go if you want to ensure that a value is e.g.
 read/written just once even if it is used many times:

It has no place in properly synchronized threaded programs. And it
cannot guarantee the correctness of un-synchronized threaded programs
(unless you assume non-SMP, non-hyper-threaded, Intely-type hardware
-- *maybe*)

 extern volatile int xval;  // Written by other thread(s)

 void f (void)
 {
    int x;

    x = xval;

    // use x many times, it won't change.
 }

 Without the 'volatile', the compiler is free to read
 the memory value xval as many times as it wants, even
 if it has a local copy, and it probably will do so if
 you have many local variables.

What does that accomplish? You're merely frustrating the compiler's
ability to optimize. You're not achieving complete thread safety by
*adding* volatile -- not on arbitrary hardware. If your code is
completely thread-safe with volatile, it is also completely
thread-safe (and faster) without volatile. Volatile does not offer any
guarantees that cannot be later undone by the pipeline or CPU cache.


-- Dan
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-09 Thread Fons Adriaensen

On Sat, Jul 09, 2011 at 04:25:22PM +0300, Dan Muresan wrote:

  The apps already need to do some type of synchronization internally.
  For example a player's disk thread, when its ringbuffer is full, needs
  to wait for the process thread to consume some data and thus free up
 
  Depends. If both ends are periodic processes no other synchronisation
  is required. And e.g. Jack callback is such a process, and likely to
  be one end.
 
 How about the other end (i.e. the disk thread?) Would that
 normally be periodic?

It could be, and that would be perfectly ok in some cases.
But I'm not arguing agains synchronisation, and my own implementation
of this ringbuffer optionally provides in either direction (buffer
becoming non-empyt/non full).
 
 OK, even if your disk thread is periodic for some reason, how does
 that argue for library-level synchronization, *instead of* app-level
 synchronization? In this case the cost would be the same -- no loss.

I don't see the point.
 
  You may be right about the (HW as opposed to compiler) re-ordering of
  data w.r.t. pointers on some architectures. But AFAIK, at least on Intel
  and AMD writes are not re-ordered w.r.t. other writes from the same CPU,
 
 From the same CPU? Are we regressing to non-SMP-only schemes? And

No, I'm talking about SMP systems. Writing the data and updating
the write pointer is done by the same thread and hence CPU, these
actions won't be re-ordered.

 Intel and AMD only?

There is no legal obligation for code to be portable. Nor is there
any moral obligation. If I choose to support only Intel and AMD PCs
and not embedded systems or mobile devices (and for the kind of SW
I write that does make sense) then that is my choice, period.

I get usually sick when computer scientist or language buffs start
waving their finger about programming style etc. There is room for
some pragmatism in everything.

  Regarding the volatile declarations, at least on my version (which
  is slightly different from Jack's) there is no performance penalty.
 
 Under which access patterns, with what compiler / optimization flags
 etc? I would not make such generalizations... Volatile frustrates the
 optimizer's ability to chose the optimal access patterns.

In the example I provided the essential point is that there
is *one* *correct* access pattern which is to read it once
for each call to f(), to ensure that the same value is used
everywhere in that function. Declaring this value volatile
and taking a local copy does exactly the right thing.
The alternative would be protect it by a mutex for as long
as f() runs. For no good reason, as I don't mind it being
overwritten while f() runs. Would that be more 'optimal' ? 

Ciao,

-- 
FA


___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

On 8 July 2011 02:04, Gabriel M. Beddingfield gabrb...@gmail.com wrote:
 On Thursday, July 07, 2011 07:50:57 pm James Morris wrote:
 I thought a lock-free ring buffer was supposed to be
 the easy solution!

 It is... when you re-use one that's already been written and
 debugged.  ;-)

 Why not copy/paste the JACK ringbuffer (C) or even Ardours
 (C++ Container)?

I think that when I was coding BoxySeq, I did look at the JACK
ringbuffer code and decided to simplify it for my purposes.  I fixed
it so there was no need for the byte count parameters for read/write,
and removed some of the functions I decided I didn't need (ie peek,
but then re-introduced my own versions). I found problems with my
implementation but it basically worked 99% of the time so I came back
to it the other day with the mistaken belief that atomic read/write
pointer operations along with a reduction of variables used for each
read/write operation would fix it. I was rather pleased actually with
how much this strategy made the code *look* cleaner, so surely it
would work!

James.


 -gabriel
 ___
 Linux-audio-dev mailing list
 Linux-audio-dev@lists.linuxaudio.org
 http://lists.linuxaudio.org/listinfo/linux-audio-dev

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

On 8 July 2011 02:04, Gabriel M. Beddingfield gabrb...@gmail.com wrote:

 Why not copy/paste the JACK ringbuffer (C) or even Ardours
 (C++ Container)?

Sorry, with my warbling I forgot all about asking when the lock
function of jack ringbuffer would be used?

Cheers,
James.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

On 8 July 2011 01:50, James Morris jwm.art@gmail.com wrote:
On 7 July 2011 13:10, James Morris jwm.art@gmail.com wrote:
just wondered if any more-experienced-than-i devs might comment on
this. written in c for c (obviously). i realize it's not portable
outside of GNU GCC (regarding the GCC atomic builtin funcs
__sync_***). meant for a single reader thread and a single writer
thread. comments regarding thread safety very much welcome. thanks in
advance.

james.

I didn't have time to test it before going to work, when posting
earlier, but was hoping someone who might see something immediately
wrong with it might comment. I'm mainly posting back to say it doesn't
work.

It does mostly work, but not always, which is as good as not working
at all. It looses the occasional write. For example my test program
has two threads and two buffers. The first buffer is used for thread 1
to send data to thread 2, and the second is used to send it back
again. With 70 items originating in thread 1, and both buffers large
enough to hold 32 items, 1 item is lost perhaps 1 in 7 program
executions.

I thought a lock-free ring buffer was supposed to be the easy solution!

I've just realized the problem was an error in the testing code. The
2nd thread would read an item from the 1st buffer, and then try to
write it to the 2nd buffer. When the test code failed to handle a
write failure (when both r/w ptrs of the 2nd buffer pointed to same
location *and* both were attempting to simultaneously access that same
location) it's loop back to read again, consequently discarding the
item that failed to write.

So I changed my test code thread loop and the problem goes away. The
loop has to stop reading when a write fails.

While I thought my rbuf was broken I read about how lock-free code is
not really lock-free, it just uses very fine grained locking.
http://stackoverflow.com/questions/2528969/lock-free-multi-threading-is-for-real-threading-experts/2530082#2530082

I'm still uncertain about the reliability of pointer operations being
atomic (otherwise why would glib provide atomic pointer ops?). I
currently assume it can't be guaranteed, which is why I wanted to use
the GCC atomic operations for reading/writing to the read/write
pointers.

So back to the fine grained (b)locking, I decided to count the noof
times it failed to write... I was going to provide counts of failures,
averages etc, but... Well, the counts seemed to increase the more
effort the code put into gathering statistics about failures.

So I'd thought I'd compare the execution time of 7000 items passing
through the two 32-item ring buffers, with the execution time 7000
items passing though 32-item jack-ringbuffers. details to come,
probably to throw all this out the window.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

On 8 July 2011 12:24, James Morris jwm.art@gmail.com wrote:

 So I'd thought I'd compare the execution time of 7000 items passing
 through the two 32-item ring buffers, with the execution time 7000
 items passing though 32-item jack-ringbuffers. details to come,
 probably to throw all this out the window.


MSG_BUF_SIZE =  32,
MSG_COUNT = 70

JACK RING BUF:
[sirrom@scrapyard msg_test]$ time ./msg_test
  msg_test.c: 164 main(): read
back loop count:700032

real0m0.136s
user0m0.260s
sys 0m0.003s
[sirrom@scrapyard msg_test]$ time ./msg_test
  msg_test.c: 164 main(): read
back loop count:700032

real0m0.136s
user0m0.263s
sys 0m0.000s
[sirrom@scrapyard msg_test]$ time ./msg_test
  msg_test.c: 164 main(): read
back loop count:700032

real0m0.142s
user0m0.273s
sys 0m0.003s



MY RING BUF WITH ATOMIC POINTERS

[sirrom@scrapyard msg_test]$ time ./msg_test
  msg_test.c: 286 main(): read
back loop count:702657

real0m0.169s
user0m0.330s
sys 0m0.000s
[sirrom@scrapyard msg_test]$ time ./msg_test
  msg_test.c: 286 main(): read
back loop count:738329

real0m0.173s
user0m0.337s
sys 0m0.000s
[sirrom@scrapyard msg_test]$ time ./msg_test
  msg_test.c: 286 main(): read
back loop count:753976

real0m0.171s
user0m0.333s
sys 0m0.000s


A very simple data integrity check was used, (note: pointers to data
were stored in the rbufs rather than data), the main thread set the
items data to point to the item itself. the integrity check was that
it should come back intact, still pointing to itself. success in both
instances.

JACK's ringbuf, as most will have undoubtedly known all along, is
faster, and the test code required 5 iterations less than when
using my ring buf. maybe somewhere, the atomics are required?
james.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-08 Thread Olivier Guilyardi

On 07/08/2011 02:21 PM, James Morris wrote:

 JACK's ringbuf, as most will have undoubtedly known all along, is
 faster, and the test code required 5 iterations less than when
 using my ring buf. maybe somewhere, the atomics are required?
 james.

I cannot comment on atomics op, but we have done rather exhaustive testing of
existing ringbuffer implementations on this list in the past.

The small test suite doesn't include any benchmarks, but it does perform
integrity checks which, at the time, allowed to fix a bug in the JACK 
ringbuffer.

The code tests and includes 4 implementations: JACK ringbuffer, PortAudio
ringbuffer, FFMpeg FIFO, and Fons' LFQ. You can check it out and run it with:

$ svn co http://svn.samalyse.com/misc/rbtest
$ cd rbtest
$ make test

There is no dependency. Maybe that you could use these tests (especially
test-int-array) to check data integrity with your implementation.

Hope that helps

--
  Olivier
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-08 Thread Paul Davis

On Fri, Jul 8, 2011 at 7:24 AM, James Morris jwm.art@gmail.com wrote:

we've been through this several times on LAD-related mailing lists.

the usability of a lock free ringbuffer does NOT rely on the stuff
discussed in that article.

the reason it works is because even if there is a sync error between
threads, the error has no negative consequences. put differently, a
lock free ringbuffer does not rely on two threads interacting in a
particular way, it relies on the fact that if they interact in the
wrong way, all that happens is that the reader sees less data to
read than actually exists, and/or the writer sees less space to write
than is actually available. if your application views this as a
negative, then it will have to use some kind of locking primitives
to avoid it, but most audio applications can deal with this without
blinking.

this is entirely different from more generic lock free programming
techniques (e.g. lock free lists) where it is definitely true that you
need to understand memory barriers and so forth to ensure that they
are safe). its also important to recognize that the locks used by
memory barriers are processor-level, and not really equivalent in most
senses to the locks used by the kernel or user space. they do have the
capability to block a thread, but the delay is measured in
instructions (mostly) and does not involve process-level information.

the one wrinkle in this is that in theory a compiler could so
completely reorder instructions that even the basic assumptions that
make the single reader/single-writer ringbuffer safe would break down.
this is theoretically possible (and actually, not that hard to create
in practice if you were really trying to) but as oliver mentioned,
we've done some fairly exhaustive testing that when combined with a
moderately common sense approach to compiler design (e.g. the
existence of function scope and related stuff) makes it fundamentally
not worth worrying about this theoretical possibility.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-08 Thread David Robillard

On Thu, 2011-07-07 at 20:04 -0500, Gabriel M. Beddingfield wrote:
 On Thursday, July 07, 2011 07:50:57 pm James Morris wrote:
  I thought a lock-free ring buffer was supposed to be
  the easy solution!
 
 It is... when you re-use one that's already been written and 
 debugged.  ;-)
 
 Why not copy/paste the JACK ringbuffer (C) or even Ardours 
 (C++ Container)?

Here's another C++ one, for the record:

http://svn.drobilla.net/lad/trunk/raul/raul/RingBuffer.hpp

-dr


___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

On Fri, Jul 08, 2011 at 09:21:55AM -0400, Paul Davis wrote:

 the one wrinkle in this is that in theory a compiler could so
 completely reorder instructions that even the basic assumptions that
 make the single reader/single-writer ringbuffer safe would break down.

AFAIK nothing fatal can happen if the variables involved are
declared volatile. A compiler is not allowed to omit, repeat,
or re-order instructions involving them.

Ciao,

-- 
FA

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-08 Thread Gabriel M. Beddingfield

On Friday, July 08, 2011 12:17:34 pm Fons Adriaensen wrote:
 On Fri, Jul 08, 2011 at 09:21:55AM -0400, Paul Davis 
wrote:
  the one wrinkle in this is that in theory a compiler
  could so completely reorder instructions that even the
  basic assumptions that make the single
  reader/single-writer ringbuffer safe would break down.
 
 AFAIK nothing fatal can happen if the variables involved
 are declared volatile. A compiler is not allowed to
 omit, repeat, or re-order instructions involving them.

Take for instance jack_ringbuffer_read(), which has this 
line:

  rb-read_ptr = (rb-read_ptr + n)  rb-size_mask;

There's a remote possibility that the compiler could 
optimize this as:

  rb-read_ptr += n;
  rb-read_ptr = rb-size_mask;

...and this would break the ringbuffer.  I don't know if the 
`volatile` keyword prevents this or not.

-gabriel
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

On Fri, Jul 08, 2011 at 01:12:08PM -0500, Gabriel M. Beddingfield wrote:

 There's a remote possibility that the compiler could 
 optimize this as:
 
   rb-read_ptr += n;
   rb-read_ptr = rb-size_mask;
 
 ...and this would break the ringbuffer.  I don't know if the 
 `volatile` keyword prevents this or not.

It certainly does.

Ciao,

-- 
FA

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-08 Thread Arnold Krille

On Friday 08 July 2011 20:12:08 Gabriel M. Beddingfield wrote:
 On Friday, July 08, 2011 12:17:34 pm Fons Adriaensen wrote:
  On Fri, Jul 08, 2011 at 09:21:55AM -0400, Paul Davis
 
 wrote:
   the one wrinkle in this is that in theory a compiler
   could so completely reorder instructions that even the
   basic assumptions that make the single
   reader/single-writer ringbuffer safe would break down.
  
  AFAIK nothing fatal can happen if the variables involved
  are declared volatile. A compiler is not allowed to
  omit, repeat, or re-order instructions involving them.
 
 Take for instance jack_ringbuffer_read(), which has this
 line:
 
   rb-read_ptr = (rb-read_ptr + n)  rb-size_mask;
 
 There's a remote possibility that the compiler could
 optimize this as:
 
   rb-read_ptr += n;
   rb-read_ptr = rb-size_mask;
 
 ...and this would break the ringbuffer.  I don't know if the
 `volatile` keyword prevents this or not.

What would happen? The ringbuffer in jack is explicitely only for one-reader-
one-writer. So in this optimization, the only participant using the read_ptr 
to do something possibly bad, is the reading participant which is currently 
executing this code.
The writing participant can access that read_ptr for example to check the 
available space. But as the docs state (afair), the available sizes for 
read/write are not strict functions, the only thing that counts is if you have 
space for reading/writing. And that is fulfilled if read_ptr!=write_ptr...

Have fun,

Arnold


signature.asc
Description: This is a digitally signed message part.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-08 Thread Dan Muresan

   the one wrinkle in this is that in theory a compiler
   could so completely reorder instructions that even the
   basic assumptions that make the single
   reader/single-writer ringbuffer safe would break down.

Forget about the compiler, the hardware instruction pipeline could
reorder ANYTHING that is unsynchronized. Plus two different threads
might read/write to different cache memories. Here's a paragraph from
the Double Chekced Locking is Broken Declaration
(http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html):

... processors have their own locally cached copies of memory. On
some processors, unless the processor performs a cache coherence
instruction (e.g., a memory barrier), reads can be performed out of
stale locally cached copies, even if other processors used memory
barriers to force their writes into global memory.

There's in link in that article to how that can happen on an Alpha.

  AFAIK nothing fatal can happen if the variables involved
  are declared volatile. A compiler is not allowed to

The only thing that volatile accomplishes is to slow down
properly-synchronized programs. volatile is for signal handlers and
interrupt handlers, not for threading.

 read/write are not strict functions, the only thing that counts is if you have
 space for reading/writing. And that is fulfilled if read_ptr!=write_ptr...

This is not the real problem. Here's where reordering can hurt:

Without full sync, a non-owned pointer can appear to
  change before the pointed data is ready (due to cache non-coherence
  and access reordering by compiler or hardware). E.g. the consumer
  may see the write pointer move before its view of the pointed data
  is up-to-date, leading to bogus data. Alternatively, the producer
  may see the read pointer move before the consummer finishes
  loading the pointed data; the producer will thus see the
  still-in-use data as empty space and may overwrite it, leading to
  lost frames.

(From the write-up at the end of my own program, sintvert.c, at
https://github.com/danmbox/sintvert/blob/master/sintvert.c)

This reordering cannot be prevented without proper synchronization. So
my advice to anyone considering this would be to drop volatile and do
proper synchronization at the application level (i.e. semaphores,
since it has to work in real time).


-- Dan
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-08 Thread Dan Kegel

On Fri, Jul 8, 2011 at 7:23 PM, Dan Muresan danm...@gmail.com wrote:
 (http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html):
...
 The only thing that volatile accomplishes is to slow down
 properly-synchronized programs. volatile is for signal handlers and
 interrupt handlers, not for threading.
...
 This reordering cannot be prevented without proper synchronization. So
 my advice to anyone considering this would be to drop volatile and do
 proper synchronization at the application level (i.e. semaphores,
 since it has to work in real time).

+1

volatile is not the droid you're looking for.
- Dan
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

On Fri, Jul 08, 2011 at 09:05:42PM +0200, Arnold Krille wrote:
 
 What would happen? The ringbuffer in jack is explicitely only for one-reader-
 one-writer. So in this optimization, the only participant using the read_ptr 
 to do something possibly bad, is the reading participant which is currently 
 executing this code.
 The writing participant can access that read_ptr for example to check the 
 available space. But as the docs state (afair), the available sizes for 
 read/write are not strict functions, the only thing that counts is if you 
 have 
 space for reading/writing. And that is fulfilled if read_ptr!=write_ptr...

Actually things are a little better: you can be sure that _at least_
the space indicated is available for reading or writing, not just that
you can read or write one item (byte in this case).

Looking at the way things are implemented, the right value would
be returned even if the unmasked 'other side' count is used.

I'm using a slightly different (C++) implementation: the stored
values equivalent to jack's rd_ptr and wr_ptr (wrong names, they
are not pointers but indices) are never masked. The 'size-1' mask
(with size = 2^N) is applied only when they are used to index the
buffer array.

This has two consequences:

- You can use all 2^n elements, there is no ambiguity between
  full and empty.

- The code is simplified a bit:

  n_avail_read  = wr_index - rd_index;
  n_avail_write = rd_index - wr_index + size;

  and no conditions, masking, or -1  are necessary.

Ciao,

-- 
FA

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-08 Thread Paul Davis

On Fri, Jul 8, 2011 at 3:53 PM, Fons Adriaensen f...@linuxaudio.org wrote:

 - You can use all 2^n elements, there is no ambiguity between
  full and empty.

so what does it mean when wr_index == rd_index?
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-08 Thread Paul Davis

On Fri, Jul 8, 2011 at 3:29 PM, Dan Kegel d...@kegel.com wrote:
 On Fri, Jul 8, 2011 at 7:23 PM, Dan Muresan danm...@gmail.com wrote:
 (http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html):
...
 The only thing that volatile accomplishes is to slow down
 properly-synchronized programs. volatile is for signal handlers and
 interrupt handlers, not for threading.
...
 This reordering cannot be prevented without proper synchronization. So
 my advice to anyone considering this would be to drop volatile and do
 proper synchronization at the application level (i.e. semaphores,
 since it has to work in real time).

 +1

 volatile is not the droid you're looking for.

could be true enough, but i will be happy to fake ignorance and say i
don't know for sure. but ...

the whole point single reader/single writer lock-free FIFOs is that
*synchronization doesn't matter*. to state it again: the whole essence
of the design is that the reader and writer can execute with any
timing behaviour whatsoever and the worst that can happen is that
they each think there is less data/space available than there actually
is. the design is NOT attempting to implement a synchronized
interaction at all, because it is not necessary.

this is why we don't care about the types of stuff that Dan Muresan
mentioned, except to the extent that it could actually lead to the
computation of data/space available being wrong in a deeper way. we do
NOT care about synchronization for these data types unless it could
actually cause the value of one of the indexes to be accessed at a
time that is intermediate in the sense of being between two C
expressions. it appears that at this time, this is not possible on
any architecture that you might be sensibly using lock free FIFOs
with.

IF we had a processor pipeline that was really big enough and really
did enough runtime optimization across the entire pipeline, then i can
imagine how this could turn into a problem. but AFAIK (and again this
subject has been hashed to death many times here and elsewhere),
current processors do not have this architecture and there's no sign
of them being imminent or even that desirable.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-08 Thread Tim Blechmann

  I thought a lock-free ring buffer was supposed to be
  the easy solution!
 
 It is... when you re-use one that's already been written and
 debugged.  ;-)
 
 Why not copy/paste the JACK ringbuffer (C) or even Ardours
 (C++ Container)?
 
 Here's another C++ one, for the record:
 
 http://svn.drobilla.net/lad/trunk/raul/raul/RingBuffer.hpp

the linux kernel and supercollider have their own implementations of this 
algorithm as well ... iirc it is also described in the book `the art of 
multiprocessor programming' by maurice herlihy and nir shavit.

and i would like to do some self-advertising: i have submitted some of my lock-
free data structures to boost, and they will be reviewed in the end of july 
(18th-27th). source and docs are online [1, 2] and i'd like to invite everybody 
with an interest in lock-free programming to take part in the review. all the 
provided data structures (wait-free spsc ringbuffer, lock-free mpmc fifo and 
lock-free mpmc lifo) are potentially interesting for audio applications and i 
heavily use them for my multiprocessor-aware supercollider server ...

the review itself will take place on the boost-dev mailinglist [3]

cheers, tim

[1] http://tim.klingt.org/boost_lockfree
[2] http://tim.klingt.org/git?p=boost_lockfree.git;a=summary
[3] http://www.boost.org/community/groups.html#main

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-08 Thread Arnold Krille

On Friday 08 July 2011 22:41:54 Paul Davis wrote:
 On Fri, Jul 8, 2011 at 3:53 PM, Fons Adriaensen f...@linuxaudio.org wrote:
  - You can use all 2^n elements, there is no ambiguity between
   full and empty.
 
 so what does it mean when wr_index == rd_index?

Empty for reading. Lots of space for writing.
Its much more interesting to see what happens when the indezes cross the 2^32 
or 2^64 mark, then the write-index will be smaller then the read-index untill 
the read-index catches up. But as far as I see currently, this wouldn't be a 
problem either. The writer only has to stop writing when its at read_ptr - 1.

Maybe I don't understand it all, but with fons approach I think it only works 
when the buffer-sizes are 2^n. When you have a buffer of say 5 elements, doing 
the modulo at the element-access and not at the read/write-head-movement, this 
will jump every now and then, right?

Have fun,

Arnold


signature.asc
Description: This is a digitally signed message part.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

On Fri, Jul 08, 2011 at 04:41:54PM -0400, Paul Davis wrote:
 On Fri, Jul 8, 2011 at 3:53 PM, Fons Adriaensen f...@linuxaudio.org wrote:
 
  - You can use all 2^n elements, there is no ambiguity between
   full and empty.
 
 so what does it mean when wr_index == rd_index?

That the buffer is empty, rd_avail = size, 
wr_avail = 0


-- 
FA

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

On Fri, Jul 08, 2011 at 10:59:14PM +0200, Arnold Krille wrote:

 Empty for reading. Lots of space for writing.
 Its much more interesting to see what happens when the indezes cross the 2^32 
 or 2^64 mark, then the write-index will be smaller then the read-index untill 
 the read-index catches up. But as far as I see currently, this wouldn't be a 
 problem either. The writer only has to stop writing when its at read_ptr - 1.

It is assumed that the buffer is used correctly, so no one does e.g
a wr_commit(n) with n  wr_avail(), same for read. There is no
problem with wraparound at 2^32 or 2^64 if size is a power of 2.
 
 Maybe I don't understand it all, but with fons approach I think it only works 
 when the buffer-sizes are 2^n. When you have a buffer of say 5 elements, 
 doing 
 the modulo at the element-access and not at the read/write-head-movement, 
 this 
 will jump every now and then, right?

It works only if the size is a divider of the int size, so
it must be a power of 2. But that is the case for Jack's
implementation as well (or anything that uses  'n  (size-1)'
instead of 'n % size'.

Ciao,

-- 
FA

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-08 Thread Dan Kegel

On Fri, Jul 8, 2011 at 8:49 PM, Paul Davis p...@linuxaudiosystems.com wrote:
 could be true enough, but i will be happy to fake ignorance and say i
 don't know for sure. but ...

I sure as heck don't know for sure :-)

 the whole point single reader/single writer lock-free FIFOs is that
 *synchronization doesn't matter*.

At least on x86, where Intel was very careful to preserve the illusion
of coherence, it works... usually?

http://en.wikipedia.org/wiki/Non-blocking_algorithm seems to agree with you,
but http://www.rossbencina.com/code/lockfree seems to agree with you
only on a uniprocessor system.

http://www.codeproject.com/KB/threads/LockFree.aspx has lots of interesting
links.In particular, it links to a couple articles by Herb Sutter,

http://drdobbs.com/cpp/210600279
http://drdobbs.com/high-performance-computing/211601363

He makes it sound like it's only possible using compare-and-swap
atomic primitives.
- Dan
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-08 Thread Sean Bolton


Hi Paul,

On Jul 8, 2011, at 1:49 PM, Paul Davis wrote:

this is why we don't care about the types of stuff that Dan Muresan
mentioned, except to the extent that it could actually lead to the
computation of data/space available being wrong in a deeper way.


You're missing the point of what Dan was pointing out (and that I
pointed out the last time we hashed through this.)  It's not the
space available computation that is at risk of failure--you're quite
right about that. The potential for failure he's referring to is that
on systems with weak memory ordering, a single-writer single-reader
lock-free ring buffer without memory barriers (like JACK's) is at
risk for the data read by the reader process becoming corrupted,
when a data pointer update becomes visible to the reading processor
before the data itself becomes visible.

On Jul 8, 2011, at 6:21 AM, Paul Davis wrote:

...but as oliver mentioned,
we've done some fairly exhaustive testing...


Really? I don't remember anyone reporting test results for an SMP
Alpha, or a Sparc v9 running in RMO mode (i.e. under linux), or a
multiprocessor (not just multicore) G5. Not to mention any of the
newer processors released in the last 2.5 years. More to-the-point,
exhaustive testing to prove the non-existence of a behavior that is
expected to be fairly uncommon to begin with, on every processor
where the code is ever likely to be run? That's a fool's errand.

Better to just follow the recommendations of the respective ABIs,
and put in the memory barriers for those platforms that need them,
like PortAudio, the linux kernel, and most other implementations
do.

Hope that's useful, it would be nice to put this subject to bed.

-Sean






___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?

2011-07-08 Thread Dan Muresan

 Better to just follow the recommendations of the respective ABIs,
 and put in the memory barriers for those platforms that need them,
 like PortAudio, the linux kernel, and most other implementations

The apps already need to do some type of synchronization internally.
For example a player's disk thread, when its ringbuffer is full, needs
to wait for the process thread to consume some data and thus free up
some space.

So I think it would be better to drop the volatile's, and leave safety
to the app level. There's no point in duplicating synchronization at
the library and at the app level.


--  Dan
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev

Re: [LAD] a simple ring buffer, comments pls?