Hi,

(I did try to post something about this a couple of months ago without
realising this is a subscribers-only list.)

Firstly, in addition to compression and cryptography, SIMD is
incredibly important for multimedia programming (both audio/video
codecs and augmented reality). Whilst I wouldn't expect anyone
actively working on BitC to be working on SIMD yet, I've been
worrying a bit about whether the low-level foundations being
constructed will cause problems for a SIMD implementation later.

As general comments: whilst it would be wonderful if completely
platform-independent SIMD were possible, it's a really, really
difficult task because

(1) the SIMD instruction sets vary so much between processors (many
ARM NEON instructions take very different approaches from their
Intel SSE counterparts).

(2) often the instructions (particularly integer instructions) have
effects which are very difficult to specify as "acceptable here". For
example, it's hard to specify that "absolute difference between two
8-bit integers with saturation on overflow" (a typical SIMD
instruction) is acceptable (so that an instruction selector can
choose it) short of using something like an intrinsic.
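
To make that concrete, here is a scalar sketch (my own illustrative
code, not a real intrinsic -- the function name is made up) of the
per-lane semantics such an instruction computes; the SIMD version
simply does this for every 8-bit lane of a vector at once:

```c
#include <stdint.h>

/* Hypothetical scalar model of a SIMD "absolute difference with
 * saturation" instruction, one signed 8-bit lane at a time. */
static inline int8_t absdiff_sat_s8(int8_t a, int8_t b)
{
    int d = (int)a - (int)b;   /* widen so the subtraction can't overflow */
    if (d < 0) d = -d;         /* absolute value */
    if (d > 127) d = 127;      /* saturate to INT8_MAX instead of wrapping */
    return (int8_t)d;
}
```

Note that, e.g., absdiff_sat_s8(-128, 127) saturates to 127 rather
than wrapping -- exactly the behaviour that is awkward to express to
a generic instruction selector.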

So my personal opinion is that the language must have exact intrinsics
available in order to be able to use the full potential of the
instructions. (As full disclosure, I have a partial interest in this:
I'm currently working on a "semi-platform-independent SIMD macro
processor". The idea is that this absolutely isn't a compiler -- it
doesn't pretend to have a full semantic model of what's going on --
but that it allows you to construct mappings from a platform-
independent form to how you want it implemented on a given platform,
and this mechanical transformation then gets applied automatically as
you move code around while coding.) The key points that are, IMO,
important are:

(1) simple ways to specify that a given "array of values" is actually
a non-boxed contiguous array with specified alignment
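
(In C11 terms -- just an illustration of what I mean, with a made-up
helper name -- this is the kind of guarantee I'm after: a contiguous,
unboxed uint16_t array aligned for 128-bit SIMD loads. Note that
aligned_alloc requires the size to be a multiple of the alignment.)

```c
#include <stdlib.h>
#include <stdint.h>

/* Allocate n uint16_t values, contiguous and 16-byte aligned,
 * suitable for aligned 128-bit SIMD loads and stores. */
uint16_t *alloc_simd_u16(size_t n)
{
    size_t bytes  = n * sizeof(uint16_t);
    size_t padded = (bytes + 15) & ~(size_t)15;  /* round up to 16 */
    return aligned_alloc(16, padded);            /* C11 */
}
```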

(2) SIMD instructions really throw up nasty cases for typing and type
inference, particularly with integer types. For instance, in principle
an "absolute value of difference" ought to produce only an unsigned
result, but since the only supported SIMD multiplication instructions
are often for signed operands, you frequently need to CONCEPTUALLY
type-convert an unsigned value to a signed one (preferably explicitly
in the source code, and hopefully after the programmer has proved the
range of the absolute value is small enough). Then you get the
"bitmasks" that are computed and used in place of conditionals. (My
limited understanding of compilers is that most of this is just
ignored for scalar code, but made almost unobservable by most
languages having an arithmetic model which widens every type to an
"int" or "unsigned int" before any computation.)
Incidentally, is there a concise write-up of the current type system
and inference rules anywhere? (I haven't looked for a while.)
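
(For anyone unfamiliar with the bitmask idiom: here's a scalar sketch
of my own, with made-up names, of what SIMD compare instructions do.
A comparison produces all-ones or all-zeros rather than a boolean,
and that mask then selects between two values with bitwise ops -- a
vector compare does this for every lane simultaneously, which is what
makes it so awkward to type.)

```c
#include <stdint.h>

/* Branchless select: returns x if a > b, else y. A SIMD
 * compare-greater-than writes exactly this 0xFFFF/0x0000 mask
 * into each 16-bit lane. */
static inline uint16_t select_by_mask(uint16_t a, uint16_t b,
                                      uint16_t x, uint16_t y)
{
    uint16_t mask = (uint16_t)-(a > b);   /* 0xFFFF or 0x0000 */
    return (uint16_t)((x & mask) | (y & ~mask));
}
```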

(2a) It would also be HIGHLY desirable to avoid what I consider the
"aliasing rules" SIMD mistake of recent C standards. In programs, for
non-performance critical routines it's preferable to allocate
variables and write code using the scalar mindset, eg, using C++
notation

void f(uint16_t *a,int l)
{
    for(int i=0;i<l;++i){
        a[i]=(uint16_t)i;
    }
}

and then it would be nice to be able to explicitly view the array as a
SIMD array, eg,

void g(uint16x8_t *b,uint16_t *a,int l)
{
    uint16x8_t *a1=viewAs<uint16x8_t*>(a);
    for(int i=0;i<l/8;++i){
        b[i]=absdiff_sat(b[i],a1[i]);
    }
}

and have this do what one would expect for
g(viewAs<uint16x8_t*>(a),a,l). But the current C/C++ aliasing rules
say that the compiler is justified in assuming b can't be the same as
a, based purely on the difference between the uint16_t and uint16x8_t
types. Numeric scalar and SIMD types should, IMO, be exempt from the
general rule that differently typed pointers can't alias.
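
(For what it's worth, GCC and Clang already provide an escape hatch
along these lines -- I'm describing their extensions here, not
standard C: a vector type declared with the may_alias attribute is
exempt from strict aliasing, so viewing a plain uint16_t array
through it is defined behaviour. Something with this flavour is what
I'd hope BitC could offer natively.)

```c
#include <stdint.h>

/* GCC/Clang extension: a 16-byte vector of uint16_t that is allowed
 * to alias plain uint16_t storage. */
typedef uint16_t u16x8 __attribute__((vector_size(16), may_alias));

/* Elementwise in-place add, 8 lanes at a time; the casts are legal
 * because of may_alias (the arrays must still be 16-byte aligned). */
void add_in_place(uint16_t *a, uint16_t *b, int l)
{
    u16x8 *va = (u16x8 *)a;
    u16x8 *vb = (u16x8 *)b;
    for (int i = 0; i < l / 8; ++i)
        va[i] += vb[i];
}
```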

(2b) Really obvious: avoid the Intel mistake of making every integer
SIMD type be "__m128i" rather than, eg, the uint16x8_t style used by
ARM. The Intel approach both prevents resolving overloads based on
type and means the debugger doesn't know how to display variables.

(3) As Ben Kloosterman points out, aiming to use SIMD instructions
requires the compiler to be able to fully optimise away any extra
work implicit in the function calling convention. Assuming that
support for avoiding redundant reboxing/unboxing is working in
general, I don't imagine that'll be a problem.

As I said at the outset, I don't expect anything to be done about
actually implementing any of this now, but it'd be reassuring to check
that the foundations being laid aren't precluding this being
implemented efficiently in the future.

Regards,
David Steven Tweed
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
