I found that reducing memory allocation in the loop does not do much for speed. E.g. with something like
xx = xprime[:], the timing difference between sql(xprime[:]) and sql(xx) is only about 5%. So my guess is that most of the time is spent inside the sql() call itself, and that is what would have to be optimized to speed things up. (No, I did not do xx = xprime[:] inside the loop...) Is sql() really faster than the MATLAB version of it? Cheers
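
P.S. In case anyone wants to reproduce the comparison, here is a minimal sketch of how it could be timed. The sql() below is only a hypothetical stand-in (a plain sum of squares), since the real function is not shown here, and the list size and repeat counts are arbitrary assumptions.

```python
import timeit


def sql(x):
    # Hypothetical stand-in: the real sql() is not shown in this thread.
    return sum(v * v for v in x)


# Assumed test data; the real xprime may have a different size and type.
xprime = [float(i) for i in range(1000)]
xx = xprime[:]  # copy made once, outside the timed code


def time_call(func, repeat=5, number=100):
    # Take the best of several repeats to reduce noise from other processes.
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


t_slice = time_call(lambda: sql(xprime[:]))  # slices on every call
t_plain = time_call(lambda: sql(xx))         # reuses the existing copy

print(f"sql(xprime[:]): {t_slice * 1e6:.1f} us per call")
print(f"sql(xx):        {t_plain * 1e6:.1f} us per call")
print(f"relative difference: {(t_slice - t_plain) / t_plain:.1%}")
```

With a plain Python list, xprime[:] makes a copy on every call, so the gap between the two numbers is roughly the cost of that allocation. If xprime is a NumPy array the slice is only a view, so the gap should be smaller still.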