OK, I'm really impressed with the vectorization improvements in
gcc 4.3. It really does seem able to handle real loops, which wasn't
the case with 4.1. I think Chuck's right that we should simply
special-case contiguous data and let the auto-vectorizer do the
rest. Something like this for the ufuncs:

/**begin repeat

   #TYPE=(BOOL, BYTE, UBYTE, SHORT, USHORT, INT, UINT, LONG, ULONG,
          LONGLONG, ULONGLONG, FLOAT, DOUBLE, LONGDOUBLE)*2#
   #OP=||, +*13, ^, -*13#
   #kind=add*14, subtract*14#
   #typ=(Bool, byte, ubyte, short, ushort, int, uint, long, ulong,
         longlong, ulonglong, float, double, longdouble)*2#
*/

/* Contiguous case: a plain unit-stride loop that gcc's auto-vectorizer
   can handle. (The _contig helper name is a placeholder.) */
static void
@TYPE@_@kind@_contig(@typ@ *i1, @typ@ *i2, @typ@ *op, intp n)
{
   intp i;
   for (i=0; i<n; i++) {
      op[i] = i1[i] @OP@ i2[i];
   }
}

static void
@TYPE@_@kind@(char **args, intp *dimensions, intp *steps, void *func)
{
    intp i;
    intp is1=steps[0], is2=steps[1], os=steps[2], n=dimensions[0];
    char *i1=args[0], *i2=args[1], *op=args[2];

    /* steps[] are byte strides, so "contiguous" means
       stride == sizeof(@typ@), not 1 */
    if (is1 == (intp)sizeof(@typ@) && is2 == (intp)sizeof(@typ@) &&
        os == (intp)sizeof(@typ@)) {
        @TYPE@_@kind@_contig((@typ@ *)i1, (@typ@ *)i2, (@typ@ *)op, n);
        return;
    }

    /* generic strided fallback */
    for(i=0; i<n; i++, i1+=is1, i2+=is2, op+=os) {
        *((@typ@ *)op)=*((@typ@ *)i1) @OP@ *((@typ@ *)i2);
    }
}
/**end repeat**/
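For anyone who wants to try this outside the template machinery, here's a
rough hand expansion of one instance (DOUBLE/add) as plain C. It's only a
sketch: `intp` is typedef'd to `ptrdiff_t` as a stand-in for NumPy's own
type, and `DOUBLE_add_contig` is a made-up name for the contiguous helper.
The key detail is that `steps` holds byte strides, so the contiguity test
compares against `sizeof(double)` rather than 1.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for NumPy's pointer-sized signed intp (an assumption, so the
   sketch builds outside the NumPy tree). */
typedef ptrdiff_t intp;

/* Contiguous kernel: unit-stride indexing, the loop shape gcc's
   auto-vectorizer is good at. No `restrict`: NumPy allows in-place
   operations, so op may alias i1 or i2. */
static void
DOUBLE_add_contig(const double *i1, const double *i2, double *op, intp n)
{
    intp i;
    for (i = 0; i < n; i++) {
        op[i] = i1[i] + i2[i];
    }
}

/* Generic inner loop in the ufunc style: steps[] are byte strides. */
static void
DOUBLE_add(char **args, intp *dimensions, intp *steps, void *func)
{
    intp i;
    intp is1 = steps[0], is2 = steps[1], os = steps[2], n = dimensions[0];
    char *i1 = args[0], *i2 = args[1], *op = args[2];

    (void)func;      /* unused here */
    assert(n >= 0);  /* a dimension is never negative */

    /* Fast path: all three operands are contiguous. */
    if (is1 == (intp)sizeof(double) && is2 == (intp)sizeof(double) &&
        os == (intp)sizeof(double)) {
        DOUBLE_add_contig((const double *)i1, (const double *)i2,
                          (double *)op, n);
        return;
    }

    /* Fallback: step through each operand by its own byte stride. */
    for (i = 0; i < n; i++, i1 += is1, i2 += is2, op += os) {
        *(double *)op = *(double *)i1 + *(double *)i2;
    }
}
```

Leaving `restrict` off is deliberate: something like `a += b` passes the
same buffer as input and output. Whether gcc still vectorizes the helper
(by adding a runtime overlap check) depends on the gcc version.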

We also need to add -ftree-vectorize to the standard compile flags, at
least for the ufuncs.
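To see what the vectorizer actually does with loops like these, it's handy
to compile a tiny file on its own and ask gcc for a report. A sketch,
assuming a hypothetical file `vec_check.c` with two made-up functions: one
unit-stride loop (the contiguous fast path) and one with a runtime stride
(the generic path). The report flags are from my recollection:
`-ftree-vectorizer-verbose=N` on gcc 4.x, `-fopt-info-vec` on later
releases. I'd expect only the first loop to be reported as vectorized, but
that depends on the gcc version and target.

```c
/* vec_check.c (hypothetical name): two loops to compare in gcc's
   vectorizer report. For example:
     gcc 4.x:   gcc -O2 -ftree-vectorize -ftree-vectorizer-verbose=2 -c vec_check.c
     newer gcc: gcc -O2 -ftree-vectorize -fopt-info-vec -c vec_check.c */
#include <assert.h>
#include <stddef.h>

/* Unit-stride loop: the case the contiguous fast path is meant to hit. */
void add_unit_stride(const float *a, const float *b, float *out, ptrdiff_t n)
{
    ptrdiff_t i;
    assert(n >= 0);
    for (i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Runtime-stride loop (strides counted in elements): the stride is only
   known at run time, which older gcc releases generally do not vectorize. */
void add_strided(const float *a, ptrdiff_t sa, const float *b, ptrdiff_t sb,
                 float *out, ptrdiff_t so, ptrdiff_t n)
{
    ptrdiff_t i;
    assert(n >= 0);
    for (i = 0; i < n; i++)
        out[i * so] = a[i * sa] + b[i * sb];
}
```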

James
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
