The end of the numpy section of the Cython docs suggests the possibility of
calling numpy/scipy functions without Python call overhead. How can this be done?
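
To make the question concrete, here is a minimal sketch of the kind of call I
have in mind, assuming a C-level BLAS binding such as the
scipy.linalg.cython_blas interface that newer SciPy releases ship (the cimport
and the memoryview syntax below are my assumptions, not something taken from
the docs):

# Sketch only: assumes scipy.linalg.cython_blas is available.
# ddot is reached through a C function pointer, so there is no
# Python call overhead.
from scipy.linalg.cython_blas cimport ddot

cdef double c_dot(double[::1] a, double[::1] b) nogil:
    cdef int n = a.shape[0]
    cdef int inc = 1
    # Fortran-style BLAS call: all arguments passed by pointer.
    return ddot(&n, &a[0], &inc, &b[0], &inc)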

The test script at the bottom runs 1.8 times faster when I expand the numpy
calls into simple for loops (n, m = 1000, 1500); weave.inline is 2.7 times
faster. Looking at the cython -a output, I am not sure where most of that time
is lost. It looks like the strided accesses generate many extra calls, and the
dot products, for example, go through Python calls for their multiplications.
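
Concretely, the dot product in the coordinate update can be expanded by hand
inside f into a typed loop; this is roughly the rewrite that gives the 1.8x
speedup:

# Hand expansion of: alpha = dot(X[:,j], y - Xtheta)
# With typed buffers and boundscheck off, this compiles to a C loop.
cdef int k
alpha = 0.
for k in range(n):
    alpha += X[k, j] * (y[k] - Xtheta[k])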

Replacing numpy calls with explicit code, even for something as simple as
matrix multiplication, will end up being a bad idea, since the code won't take
advantage of optimizations like BLAS on multicore machines. Sorry if this has
been discussed already.
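
As a rough illustration of what the BLAS point means in practice (pure Python
timing, sizes picked arbitrarily):

# numpy's dot dispatches to BLAS, which can be vectorized and
# multithreaded; a hand-written triple loop cannot compete here.
import time
import numpy as np

A = np.random.normal(0, 1, (1000, 1000))
t0 = time.time()
np.dot(A, A)
print("BLAS-backed dot: %.3f s" % (time.time() - t0))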


from numpy import (random, diag, dot, empty, sign, abs, sum)
from numpy cimport ndarray
cimport cython

@cython.boundscheck(False)
def f(ndarray[double, ndim=1] y, ndarray[double, ndim=2] X, double lam,
      int maxit=10000, double eta=1e-9):
    """Lasso fit by coordinate descent with soft thresholding."""

    cdef int n = X.shape[0], m = X.shape[1]
    cdef ndarray[double, ndim=1] theta = random.normal(0, 1, m)
    # Per-coordinate curvatures: v[j] = dot(X[:,j], X[:,j]).
    cdef ndarray[double, ndim=1] v = diag(dot(X.T, X))

    cdef unsigned int i = 0, j = 0
    cdef double alpha
    cdef double chg = 1.
    cdef ndarray[double, ndim=1] theta_old = empty((m,))
    cdef ndarray[double, ndim=1] Xtheta = dot(X, theta)

    while i < maxit and chg > eta:
        i += 1
        theta_old = theta.copy()
        for j in range(m):
            # Remove coordinate j's contribution from the fit.
            if theta[j] != 0.:
                Xtheta -= theta[j] * X[:,j]
            # Correlation of column j with the partial residual.
            alpha = dot(X[:,j], y - Xtheta)
            if abs(alpha) > lam:
                # Soft-thresholding update, then restore the fit.
                theta[j] = sign(alpha) * (abs(alpha) - lam) / v[j]
                Xtheta += theta[j] * X[:,j]
            else:
                theta[j] = 0.
        # Relative change; numpy's sum, not the builtin, which
        # would iterate element by element at Python speed.
        chg = sum((theta - theta_old)**2) / sum(theta_old**2)

    return theta
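
For reference, the driver I use to exercise it looks roughly like this (the
module name 'lasso' and the penalty value are placeholders from my setup, not
anything standard):

# Build a random test problem matching the n, m quoted above and
# run the coordinate-descent solver from the compiled module.
import numpy as np
from lasso import f  # module name assumed

n, m = 1000, 1500
X = np.random.normal(0, 1, (n, m))
y = np.dot(X, np.random.normal(0, 1, m)) + np.random.normal(0, 1, n)
theta = f(y, X, lam=50.)  # penalty chosen arbitrarily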

