On Fri, 26 Jan 2007, Stefan Westerfeld wrote:

> On Thu, Jan 25, 2007 at 10:23:44AM +0100, Tim Janik wrote:
>> On Thu, 25 Jan 2007, Stefan Westerfeld wrote:

>>> |         for (int i = 0; i < space; i++)
>>> |           m_buffer[wm + i] = data[i];
>>>
>>> It's better to use std::copy, because for some data types (primitive
>>> types) memmove will be used instead (via template specialization).
>>
>> the way the code is written above, GCC can auto-vectorize it.
>> is that true for std::copy as well?
>
> I think the only way to find out what is faster is a benchmark, and even
> then, the benchmark may yield different results, depending on
[...]

> What you see is that on my system, in every case std::copy was faster
> than a hand written loop. Vectorization does make a difference, but
> still the vectorized loop is slower than std::copy in every case.
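
for reference, a minimal standalone comparison along those lines could
look like this (a sketch only, not stefan's actual harness; buffer size
and repeat count are picked arbitrarily, and the final printf keeps dst
live so the compiler can't eliminate the loops):

  #include <algorithm>
  #include <cstdio>
  #include <vector>
  #include <sys/time.h>

  static double
  now_sec (void)        // wall clock time in seconds
  {
    struct timeval tv;
    gettimeofday (&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1e-6;
  }

  int
  main (void)
  {
    const size_t n = 8192, runs = 100000;
    std::vector<float> src (n, 1.0f), dst (n);
    double t;

    t = now_sec();
    for (size_t r = 0; r < runs; r++)
      for (size_t i = 0; i < n; i++)            // hand written loop
        dst[i] = src[i];
    printf ("loop:      %f seconds\n", now_sec() - t);

    t = now_sec();
    for (size_t r = 0; r < runs; r++)
      std::copy (src.begin(), src.end(), dst.begin());
    printf ("std::copy: %f seconds\n", now_sec() - t);

    printf ("%f\n", dst[0]);    // keep dst live, prevents dead-code elimination
    return 0;
  }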

interesting. the other question that came to my mind about std::copy
is how it decides between memcpy and copy constructors.
stl_algobase.h tells us that memmove is used if both copy argument
types are scalar and identical. cpp_type_traits.h implements
__is_scalar by matching bool, char, wchar_t, short, int, long,
long long, float, double and pointer types. otherwise, an assignment
loop similar to the one we already have is used.
that should eliminate the need for vectorization in most cases.
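
so for our float buffers the memmove path is taken anyway. for
illustration, the write path from above rewritten with std::copy could
look like this (a sketch; the surrounding class and the derivation of
space are assumptions, overflow checking against the read position is
omitted):

  #include <algorithm>

  // sketch: write `length` samples into a ring buffer of m_size
  // elements, wrapping at the end. for scalar types like float,
  // std::copy dispatches to memmove internally, so no hand written
  // loop (and no auto-vectorization) is needed.
  template<class T> void
  ring_write (T *m_buffer, unsigned int m_size, unsigned int &wm,
              const T *data, unsigned int length)
  {
    while (length)
      {
        unsigned int space = std::min (length, m_size - wm); // room until wrap
        std::copy (data, data + space, m_buffer + wm);
        wm = (wm + space) % m_size;
        data += space;
        length -= space;
      }
  }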

>>> |         wm = (wm + space) % m_size;
>>> |         data += space;
>>> |         length -= space;
>>> |       }
>>>
>>> Systems with memory barriers need a write memory barrier here. In the
>>> jack driver code this looks like:
>>
>> write/read memory barriers are implemented as part of Atomic::uint_get
>> and Atomic::uint_set. also, volatile variables haven't been updated
>> here, so there's no need for a barrier at this point.
>
> Are you sure? Suppose Atomic::uint_set looked like this:

i thought you meant the updating of wm here, which is a local variable,
so the remark didn't quite make sense to me. after reading up in:

> http://techweb.rfa.org/pipermail/portaudio/2006-May/005651.html

i get your point though. i've added descriptive functions for the
barriers to birnet now; once they're pushed to birnet.git and merged
into beast, using them should make the jack driver more readable as
well:

diff --git a/birnet/birnetthread.hh b/birnet/birnetthread.hh
index 48b9cee..061cefd 100644
--- a/birnet/birnetthread.hh
+++ b/birnet/birnetthread.hh
@@ -62,6 +62,8 @@ public:
 };
 
 namespace Atomic {
+inline void    read_barrier  (void)                                { int dummy; ThreadTable.atomic_int_get (&dummy); }
+inline void    write_barrier (void)                                { int dummy; ThreadTable.atomic_int_set (&dummy, 0); }
 /* atomic integers */
 inline void    int_set       (volatile int  *iptr, int value)      { ThreadTable.atomic_int_set (iptr, value); }
 inline int     int_get       (volatile int  *iptr)                 { return ThreadTable.atomic_int_get (iptr); }

and yes, this is a sufficient barrier implementation: the functions
boil down to g_atomic_int_get/g_atomic_int_set, which are special-cased
on systems that need barriers so that they imply a full memory barrier.
glib doesn't distinguish between read and write barriers, but making
that distinction in the API should help readability in the actual
code.
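
to illustrate how the two barriers pair up in a single-writer /
single-reader ring buffer (a sketch only; member names and sizes are
made up, and the writer's check for free space is omitted for brevity):

  struct RingBuffer {
    enum { SIZE = 4096 };
    float                 buffer[SIZE];
    volatile unsigned int write_mark;   // written by the writer thread only
    volatile unsigned int read_mark;    // written by the reader thread only

    void
    write (const float *data, unsigned int n)     // writer thread
    {
      unsigned int wm = write_mark;
      for (unsigned int i = 0; i < n; i++)
        buffer[(wm + i) % SIZE] = data[i];
      Atomic::write_barrier ();   // data stores must be visible before the index store
      write_mark = (wm + n) % SIZE;
    }

    unsigned int
    read (float *data, unsigned int n)            // reader thread
    {
      unsigned int limit = write_mark;
      Atomic::read_barrier ();    // index load must complete before the data loads
      unsigned int rm = read_mark, i;
      for (i = 0; i < n && rm != limit; i++)
        {
          data[i] = buffer[rm];
          rm = (rm + 1) % SIZE;
        }
      read_mark = rm;
      return i;                   // number of samples actually copied
    }
  };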

thanks again, i'll update the code accordingly.

>   Cu... Stefan

---
ciaoTJ