On Fri, 26 Jan 2007, Stefan Westerfeld wrote:

> On Thu, Jan 25, 2007 at 10:23:44AM +0100, Tim Janik wrote:
>> On Thu, 25 Jan 2007, Stefan Westerfeld wrote:
>>> |   for (int i = 0; i < space; i++)
>>> |     m_buffer[wm + i] = data[i];
>>>
>>> It's better to use std::copy, because for some data types (primitive
>>> types), memmove will be used (template specialization).
>>
>> the way the code is written above, GCC can auto-vectorize it,
>> is that true for std::copy as well?
>
> I think the only way to find out what is faster is a benchmark, and even
> then, the benchmark may result in different results, depending on [...]
> What you see is that on my system, in every case std::copy was faster
> than a hand written loop. Vectorization does make a difference, but
> still the vectorized loop is slower than std::copy in every case.

interesting. the other question that came to my mind about std::copy is
how it correctly handles memcpy vs. copy constructors. stl_algobase.h
tells us that memmove is used if the copy argument types are scalars and
equal. cpp_type_traits.h implements __is_scalar by matching bool, char,
wchar_t, short, int, long, long long, float, double and pointer types.
otherwise, an assignment loop similar to the one we already have is used.
that should eliminate the need for vectorization in most cases.

>>> |       wm = (wm + space) % m_size;
>>> |       data += space;
>>> |       length -= space;
>>> |     }
>>>
>>> Systems with memory barriers need a write memory barrier here. In the
>>> jack driver code this looks like:
>>
>> write/read memory barriers are implemented as part of Atomic::uint_get
>> and Atomic::uint_set. also, volatile variables haven't been updated
>> here, so there's no need for a barrier at this point.
>
> Are you sure? Suppose Atomic::uint_set looked like this:

i thought you meant the updating of wm here, which is a local variable
and didn't quite make sense to me. reading up in:

> http://techweb.rfa.org/pipermail/portaudio/2006-May/005651.html

i get your point though.
i've added descriptive functions to birnet for the barriers now; using
them (once this is pushed to birnet.git and merged into beast) should
make the jack driver more readable as well:

diff --git a/birnet/birnetthread.hh b/birnet/birnetthread.hh
index 48b9cee..061cefd 100644
--- a/birnet/birnetthread.hh
+++ b/birnet/birnetthread.hh
@@ -62,6 +62,8 @@ public:
 };
 namespace Atomic {
+inline void read_barrier  (void) { int dummy; ThreadTable.atomic_int_get (&dummy); }
+inline void write_barrier (void) { int dummy; ThreadTable.atomic_int_set (&dummy, 0); }
 /* atomic integers */
 inline void int_set (volatile int *iptr, int value) { ThreadTable.atomic_int_set (iptr, value); }
 inline int  int_get (volatile int *iptr) { return ThreadTable.atomic_int_get (iptr); }

and yes, this is a sufficient barrier implementation. the functions boil
down to g_atomic_int_get/g_atomic_int_set, which are special cased on
systems that need barriers to implement a memory barrier. glib doesn't
distinguish between read and write barriers, but making that distinction
in the API should help readability in the actual code.

thanks again, i'll update the code accordingly.

> Cu... Stefan

---
ciaoTJ
_______________________________________________
beast mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/beast
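[editor's note: a compilable sketch of how a write barrier slots into the
ring buffer write path quoted earlier. names follow the quoted loop
(m_buffer, m_size as SIZE, wm); write_barrier() stands in for the
Atomic::write_barrier from the diff, implemented here with the GCC builtin
__sync_synchronize only so the example is self-contained. free-space
accounting against the read mark is omitted, so this is not the actual
driver code.]

```cpp
#include <algorithm>

// full memory barrier as a stand-in for Atomic::write_barrier
static inline void write_barrier() { __sync_synchronize(); }

// single-writer/single-reader ring buffer, write path only
struct RingBuffer {
  enum { SIZE = 8 };
  float        m_buffer[SIZE];
  volatile int m_write_mark;    // wm: written by the writer, read by the reader

  void write (const float *data, int length)
  {
    int wm = m_write_mark;
    while (length > 0) {
      int space = std::min (length, SIZE - wm);   // contiguous run up to the end
      std::copy (data, data + space, m_buffer + wm);
      // the copied data must be globally visible *before* the updated write
      // mark, otherwise a reader on another CPU could observe the new wm
      // while still seeing stale buffer contents
      write_barrier();
      wm = (wm + space) % SIZE;
      m_write_mark = wm;
      data += space;
      length -= space;
    }
  }
};
```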
