Thought a bit about it, and it would really be nice to have an aligned
uniform vector API.
At the moment all are 8-byte aligned, so you would probably also want to
be able to have at least 16- and 32-byte alignment (Intel's AVX has
256-bit registers that work better with aligned data).
But even 64 and more could be useful for cache line alignment,
although that would have to be a separate alignment, because the
benefits of cache line alignment are kind of defeated if the header is
in a different cache line.
So I guess just one alignment, namely that of the first element, is
feasible without wasting whole cache lines. If you really need more than
that you can still use the take_*vector functions, and it's pretty rare
to do such things anyway. But being able to control the alignment of the
first element allows you to properly use SIMD instructions on those vectors.
You don't even really need any more space to store alignment
information, since that can be directly inferred from the bytevector
content pointer, although the bytevector flags still have more than
enough space to store it.
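
To illustrate the point about inferring alignment from the content
pointer, here is a minimal standalone sketch in plain C. The names
pointer_alignment and is_aligned are made up for this example and are
not part of libguile; the code only shows the bit arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Largest power of two that divides the address of P, i.e. the
   alignment the pointer happens to have.  Returns 0 for NULL.
   Hypothetical helper, not a libguile function.  */
size_t
pointer_alignment (const void *p)
{
  uintptr_t addr = (uintptr_t) p;
  /* ADDR & -ADDR isolates the lowest set bit of ADDR.  */
  return (size_t) (addr & (~addr + 1));
}

/* Non-zero if P is aligned to ALIGNMENT, which must be a power of
   two.  Hypothetical helper, not a libguile function.  */
int
is_aligned (const void *p, size_t alignment)
{
  assert (alignment != 0 && (alignment & (alignment - 1)) == 0);
  return ((uintptr_t) p & (alignment - 1)) == 0;
}
```

Note that this only tells you the alignment a pointer happens to have,
not the alignment that was asked for.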
Extending the programming API to support this is a bit more tricky. I
guess the most straightforward and backward compatible way would be to
just add a set of make-aligned-*vector and aligned-*vector and
*->aligned-*vector functions and their scm_* versions with an additional
alignment parameter. Optional alignment parameters on the old functions
could be nice too, but I guess that is just asking for compatibility trouble.
The other question is the read syntax (one of the primary reasons I'm
doing all this). If alignment is something that should be preserved in
the permanent representation, you also need to store it in the flags,
since the content pointer can be aligned by coincidence. I haven't
looked at the compiling of bytevectors yet, to see if alignment can be
handled easily there.
As for the text representation, I think the simplest way is to add
another reserved character followed by the alignment number, which works
for uniform vectors and arrays, like #vu8>8(1 2 3 4 5 6) to have the first
element at 8-byte alignment. (Right now the allocation pretty much ensures
4-byte alignment of the first element on 32-bit machines and 8-byte
alignment on 64-bit machines, because gc_malloc returns 8-byte aligned
blocks, but the array starts at cell word 3. Any vector of a 64-bit type
like double and long is already guaranteed to be misaligned on 32-bit
platforms. That is even more unfortunate on Linux x32 ABI systems, which
use efficient 64-bit ints with 32-bit pointers, because cell size is
determined by pointer size.)
Or, to construct SIMD 4-element arrays: #2f32:2:4>16((1 2 3 4)(1 2 3 4)).
Maybe even have a default alignment of 16 when you just use > without a
number, so #2f32:2:4>((1 2 3 4)(1 2 3 4)) is the same thing. Or even more
convenient, #m128((1 2 3 4)(1.0 1.0 1.0 1.0) (2.0 2.0)), where you can
freely mix the underlying types and the size of the elements is inferred
from the number of them in each group.
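
The 4-byte and 8-byte figures for the current allocation follow from
simple arithmetic, sketched below in plain C. It takes the numbers from
this message at face value (8-byte aligned blocks, contents starting 3
words in); the function names are made up for the example and the
header size is a parameter rather than a claim about libguile.

```c
#include <assert.h>
#include <stddef.h>

/* Alignment guaranteed for an address BASE + OFFSET when BASE is only
   known to be a multiple of BASE_ALIGNMENT (a power of two): the
   lowest set bit of BASE_ALIGNMENT | OFFSET.  Hypothetical helper.  */
size_t
guaranteed_alignment (size_t base_alignment, size_t offset)
{
  size_t bits = base_alignment | offset;
  assert (base_alignment != 0
          && (base_alignment & (base_alignment - 1)) == 0);
  return bits & (~bits + 1);
}

/* Guaranteed alignment of the first element, assuming an 8-byte
   aligned block whose contents start HEADER_WORDS words of WORD_SIZE
   bytes into it.  Both figures are assumptions taken from the text.  */
size_t
first_element_alignment (size_t word_size, size_t header_words)
{
  return guaranteed_alignment (8, word_size * header_words);
}
```

With 3 header words this gives 4 for 4-byte words (offset 12) and 8 for
8-byte words (offset 24), which is why 64-bit elements end up
misaligned on 32-bit platforms.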
So if there is interest for something like this in the main guile, I
will make the patches. If not, I'll just stick to my crude hack for now
and see if I need the full shebang :).
Regards
Jan Schukat
On 06/12/2013 04:59 PM, Ludovic Courtès wrote:
severity 14599 wishlist
thanks
Hi!
Jan Schukat <[email protected]> skribis:
If you want to access native uniform vectors from c, sometimes you
really want guarantees about the alignment.
[...]
This isn't necessarily true for vectors created from pre-existing
buffers (the take_*vector functions), but there you have control over
the pointer you pass, so you can make it true if needed.
So if there is interest, maybe this could be integrated into the build
system as a configuration like this:
--- libguile/bytevectors.c 2013-04-11 02:16:30.000000000 +0200
+++ bytevectors.c 2013-06-12 14:45:16.000000000 +0200
@@ -223,10 +223,18 @@
c_len = len * (scm_i_array_element_type_sizes[element_type] / 8);
+#ifdef SCM_VECTOR_ALIGN
+ contents = scm_gc_malloc_pointerless (SCM_BYTEVECTOR_HEADER_BYTES + c_len + SCM_VECTOR_ALIGN,
+ SCM_GC_BYTEVECTOR);
+ ret = PTR2SCM (contents);
+ contents += SCM_BYTEVECTOR_HEADER_BYTES;
+ contents += (addr + (SCM_VECTOR_ALIGN - 1)) & -SCM_VECTOR_ALIGN;
+#else
contents = scm_gc_malloc_pointerless (SCM_BYTEVECTOR_HEADER_BYTES + c_len,
SCM_GC_BYTEVECTOR);
ret = PTR2SCM (contents);
contents += SCM_BYTEVECTOR_HEADER_BYTES;
+#endif
SCM_BYTEVECTOR_SET_LENGTH (ret, c_len);
SCM_BYTEVECTOR_SET_CONTENTS (ret, contents);
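
The over-allocate-and-round-up idea behind the hunk above can be
sketched in standalone C as follows. This is not the patch itself: it
uses plain malloc instead of scm_gc_malloc_pointerless, HEADER_BYTES is
a made-up stand-in for SCM_BYTEVECTOR_HEADER_BYTES, and the function
names are hypothetical. Note that the hunk as quoted uses `addr` without
a visible definition; the sketch computes the rounded address explicitly
and advances the pointer only by the distance to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for SCM_BYTEVECTOR_HEADER_BYTES.  */
#define HEADER_BYTES 24

/* Round P up to the next multiple of ALIGNMENT (a power of two).  */
unsigned char *
align_up (unsigned char *p, size_t alignment)
{
  uintptr_t addr = (uintptr_t) p;
  uintptr_t aligned;
  assert (alignment != 0 && (alignment & (alignment - 1)) == 0);
  aligned = (addr + (alignment - 1)) & ~(uintptr_t) (alignment - 1);
  /* Advance by the distance only, so we stay inside the block.  */
  return p + (aligned - addr);
}

/* Allocate room for a header, LEN payload bytes and ALIGNMENT bytes of
   slack.  Stores the start of the block in *BLOCK (pass that to free)
   and returns the aligned contents pointer, or NULL on failure.  */
unsigned char *
alloc_aligned_contents (size_t len, size_t alignment, void **block)
{
  unsigned char *raw = malloc (HEADER_BYTES + len + alignment);
  if (raw == NULL)
    return NULL;
  *block = raw;
  return align_up (raw + HEADER_BYTES, alignment);
}
```

The slack means up to ALIGNMENT - 1 bytes between header and contents
go unused, which is the cost of this approach.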
I don’t think it should be a compile-time option, because it would be
inflexible and inconvenient.
Instead, I would suggest using the scm_take_ functions if allocating
from C, as you noted.
In Scheme, I came up with the following hack:
--8<---------------cut here---------------start------------->8---
(use-modules (system foreign)
             (rnrs bytevectors)
             (ice-9 match))

(define (memalign len alignment)
  (let* ((b (make-bytevector (+ len alignment)))
         (p (bytevector->pointer b))
         (a (pointer-address p)))
    (match (modulo a alignment)
      (0 b)
      (padding
       (let ((p (make-pointer (+ a (- alignment padding)))))
         ;; XXX: Keep a weak reference to B or it can be collected
         ;; behind our back.
         (pointer->bytevector p len))))))
--8<---------------cut here---------------end--------------->8---
Not particularly elegant, but it does the job. ;-)
Do you think there’s additional support that should be provided?
Thanks,
Ludo’.