Hi Brett,

There are a few small ways to speed up the code (I'm not thinking
about algorithmic ones; there may be some of those too).

> @cython.boundscheck(False) # turn off bounds-checking for entire function
> def table_bincount(int max_elements, np.ndarray[long, ndim=2] table):
>    cdef long rows = table.shape[0]
>    cdef long columns = table.shape[1]
>    cdef long i, j

Use size_t (in the latest Cython version) or unsigned long instead of
long where appropriate.  If an index has a signed type, Cython inserts
a check for negative values so that things like a[-1] work; unsigned
types skip that check and are therefore faster.
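
For your function, the declarations would become something like this
(just a sketch -- size_t needs a recent Cython, and it assumes the
indices never need to be negative):

    cdef size_t rows = table.shape[0]
    cdef size_t columns = table.shape[1]
    cdef size_t i, j   # unsigned, so no negative-index check in the loop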

>    cdef np.ndarray[double, ndim=2] retval = np.zeros(
>            (rows, max_elements), dtype=np.float64)

You can add mode="c" to this array declaration as well, since you're
the one creating it.  That declares the array as C-contiguous, so the
last stride is known to be one, which can speed up inner loops.  So
the new version is

>    cdef np.ndarray[double, ndim=2, mode="c"] retval = np.zeros(
>            (rows, max_elements), dtype=np.float64)

>    for 0 <= i < rows:
>        # Is there any way of typing these intermediate values so that they
>        # do not get turned into python code?
>        current_in_row = table[i]
>        current_out_row = retval[i]

Is there any reason you can't use table[i, j] and retval[i, j] instead
of creating intermediate slices?  Note that you should usually write
A[i, j] rather than A[i][j] in regular Python as well.
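
You can see the difference in plain NumPy: both spellings read the
same element, but A[i][j] first materializes the intermediate row A[i]
as a temporary object (toy data, just for illustration):

    import numpy as np

    A = np.arange(12).reshape(3, 4)

    # Both spellings read the same element...
    assert A[1, 2] == A[1][2] == 6

    # ...but A[i][j] first creates the intermediate row A[i]
    row = A[1]            # this temporary view is what A[1][2] builds
    assert row[2] == A[1, 2]

In typed Cython code the slice version is even worse, since the
intermediate row is a full Python-level ndarray object.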

>            current_out_row[current_in_row[j]] += 1

Can't this just be something like:

         retval[i, table[i, j]] += 1
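
Putting all of the above together, the whole function might look
something like this (an untested sketch; it assumes every entry of
table is a valid non-negative index below max_elements, since
bounds-checking is off):

    @cython.boundscheck(False)  # turn off bounds-checking for entire function
    def table_bincount(int max_elements, np.ndarray[long, ndim=2] table):
        cdef size_t rows = table.shape[0]
        cdef size_t columns = table.shape[1]
        cdef size_t i, j

        # mode="c": we create this array, so we know it is C-contiguous
        cdef np.ndarray[double, ndim=2, mode="c"] retval = np.zeros(
                (rows, max_elements), dtype=np.float64)

        for 0 <= i < rows:
            for 0 <= j < columns:
                # index directly rather than through intermediate row slices
                retval[i, table[i, j]] += 1

        return retval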

Hope this all helps!

--Hoyt

++++++++++++++++++++++++++++++++++++++++++++++++
+ Hoyt Koepke
+ University of Washington Department of Statistics
+ http://www.stat.washington.edu/~hoytak/
+ [email protected]
++++++++++++++++++++++++++++++++++++++++++
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev