On Tue, 19 Jul 2011 17:49:14 +0200, Carlos Becker wrote:
> I made more tests with the same operation, restricting Matlab to use a
> single processing unit. I got:
>
>  - Matlab: 0.0063 sec avg
>  - Numpy: 0.026 sec avg
>  - Numpy with weave.blitz: 0.0041
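For reference, a minimal sketch of the kind of timing loop that could produce the NumPy number quoted above, assuming the same workload as the C program further down (elementwise `a - 0.5` on a 2000x2000 double array, averaged over repeated runs); the run count here is arbitrary:

```python
import timeit
import numpy as np

# Same workload as the C benchmark: elementwise subtraction of a
# scalar from a 2000x2000 array of doubles.
a = np.zeros((2000, 2000))

def op():
    return a - 0.5

# Average the wall-clock time over several runs, as in the quoted figures.
n_runs = 50
t = timeit.timeit(op, number=n_runs) / n_runs
print("Numpy: %g" % t)
```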
To check if it's an issue with building without optimizations, look at the build log:

    C compiler: gcc -pthread -fno-strict-aliasing "-ggdb" -fPIC
    ...
    gcc: build/src.linux-x86_64-2.7/numpy/core/src/umath/umathmodule.c

I.e., look at the "C compiler:" line nearest to the "umathmodule" compilation. The above is an example with no optimization.

***

For me, compared to zeroing the memory via memset and to a plain C implementation (Numpy 1.6.0 / gcc):

    Blitz:            0.00746664
    Numpy:            0.00711051
    Zeroing (memset): 0.00263333
    Operation in C:   0.00706667

with "gcc -O3 -ffast-math -march=native -mfpmath=sse" optimizations for the C code (involving SSE2 vectorization and whatnot, judging from the assembler output). So Numpy is already running at essentially the maximum speed.

-----------------
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

int main(void)
{
    double *a, *b;
    int N = 2000*2000, M = 300;
    int j;
    int k;
    clock_t start, end;

    a = (double*)malloc(sizeof(double)*N);
    b = (double*)malloc(sizeof(double)*N);

    /* Baseline: just zeroing the output-sized buffer. */
    start = clock();
    for (k = 0; k < M; ++k) {
        memset(a, '\0', sizeof(double)*N);
    }
    end = clock();
    printf("Zeroing (memset): %g\n",
           ((double)(end - start))/CLOCKS_PER_SEC/M);

    /* The actual operation: b = a - 0.5, elementwise. */
    start = clock();
    for (k = 0; k < M; ++k) {
        for (j = 0; j < N; ++j) {
            b[j] = a[j] - 0.5;
        }
    }
    end = clock();
    printf("Operation in C: %g\n",
           ((double)(end - start))/CLOCKS_PER_SEC/M);

    free(a);
    free(b);
    return 0;
}
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion