OK, I'm really impressed with the vectorization improvements in gcc 4.3. It really seems able to handle real loops, which wasn't the case with 4.1. I think Chuck's right that we should simply special-case contiguous data and let the auto-vectorizer do the rest. Something like this for the ufuncs:

/**begin repeat

#TYPE=(BOOL, BYTE, UBYTE, SHORT, USHORT, INT, UINT, LONG, ULONG, LONGLONG, ULONGLONG, FLOAT, DOUBLE, LONGDOUBLE)*2#
#OP=||, +*13, ^, -*13#
#kind=add*14, subtract*14#
#typ=(Bool, byte, ubyte, short, ushort, int, uint, long, ulong, longlong, ulonglong, float, double, longdouble)*2#
*/

/* Contiguous case: a plain indexed loop the auto-vectorizer can handle. */
static void
@TYPE@_@[EMAIL PROTECTED](@typ@ *i1, @typ@ *i2, @typ@ *op, intp n)
{
    intp i;
    for (i = 0; i < n; i++) {
        op[i] = i1[i] @OP@ i2[i];
    }
}

static void
@TYPE@_@kind@(char **args, intp *dimensions, intp *steps, void *func)
{
    register intp i;
    intp is1 = steps[0], is2 = steps[1], os = steps[2], n = dimensions[0];
    char *i1 = args[0], *i2 = args[1], *op = args[2];

    /* Steps are in bytes, so contiguous means a step of sizeof(@typ@). */
    if (is1 == sizeof(@typ@) && is2 == sizeof(@typ@) && os == sizeof(@typ@)) {
        @TYPE@_@[EMAIL PROTECTED]((@typ@ *)i1, (@typ@ *)i2, (@typ@ *)op, n);
        return;
    }
    /* General strided case. */
    for (i = 0; i < n; i++, i1 += is1, i2 += is2, op += os) {
        *((@typ@ *)op) = *((@typ@ *)i1) @OP@ *((@typ@ *)i2);
    }
}
/**end repeat**/

We also need to add -ftree-vectorize to the standard compile flags, at least for the ufuncs.

James
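
P.S. For anyone who wants to try the idea outside the template machinery, here is a minimal standalone sketch of one instance (double, add). The names double_add_contig and double_add are hypothetical, ptrdiff_t stands in for intp, and the unused func argument is dropped. It assumes the usual inner-loop convention that steps are byte strides, so the contiguous test compares against sizeof(double). Treat it as an illustration, not the code that would go into numpy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical contiguous kernel: a plain indexed loop over typed
 * pointers, the shape an auto-vectorizer is most likely to handle. */
static void double_add_contig(const double *i1, const double *i2,
                              double *op, ptrdiff_t n)
{
    ptrdiff_t i;
    for (i = 0; i < n; i++) {
        op[i] = i1[i] + i2[i];
    }
}

/* Hypothetical entry point using the ufunc inner-loop layout:
 * args = {in1, in2, out}, steps = byte strides for each of them. */
static void double_add(char **args, const ptrdiff_t *dimensions,
                       const ptrdiff_t *steps)
{
    const ptrdiff_t sz = (ptrdiff_t)sizeof(double);
    ptrdiff_t i, n = dimensions[0];
    ptrdiff_t is1 = steps[0], is2 = steps[1], os = steps[2];
    char *i1 = args[0], *i2 = args[1], *op = args[2];

    assert(n >= 0);

    /* Fast path: all three arrays are contiguous doubles. */
    if (is1 == sz && is2 == sz && os == sz) {
        double_add_contig((const double *)i1, (const double *)i2,
                          (double *)op, n);
        return;
    }

    /* Fallback: advance each pointer by its own byte stride. */
    for (i = 0; i < n; i++, i1 += is1, i2 += is2, op += os) {
        *(double *)op = *(double *)i1 + *(double *)i2;
    }
}
```

Building this with something like gcc -O2 -ftree-vectorize should be enough to see whether the contiguous loop gets vectorized; the strided loop is there only to keep the general case correct.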