On Jan 7, 2010, at 2:04 PM, Matthew wrote:

> OK I'm going to give the __new__ hack a try from
> http://trac.cython.org/cython_trac/ticket/238
> I don't really need to overload __new__ do I, so I don't have to
> change matrix.pxd?
Sorry, this is the ticket number that I meant to refer to:
http://trac.cython.org/cython_trac/ticket/443 , though this takes no
arguments, so may not apply to you.

> The vsipl vendor tells me that the only real expensive operation is
> the cblockbind() within __cinit__(). However the __dealloc__()
> routine is very expensive as well (given the number of times it's
> being called). It would be nice if I could profile on a line-by-line
> basis. I'm not sure if the python cProfile tool supports this or not.

It does, but we don't have that implemented in Cython yet. Given that
it's a deterministic rather than (external) probabilistic profiler, the
profiling itself may significantly impact the speed and results. Try
commenting stuff out, or factoring the code into an (inline) function.

> I can't just chalk this result up to the vsipl code, since the hash
> routine is not giving me any performance gain and seemed to be making
> things worse. (Though I probably need to do some more debugging to
> see if I have a lot of cache misses, or some bug in my logic.)

Well, maybe hashing is slightly more expensive than the vsipl call. On
a completely unrelated note, getting data to/from a GPU can be a
bottleneck as well, and due to its asynchronous nature may not show up
as obviously in the main CPU profiling results.

> For the life of me I could not figure out how to just put the matrix
> object itself into my hash-indexed memory cache. It seemed like my
> python objects were always being garbage collected once I hit the
> __dealloc__ routine (the self.arr ndarray as an example). Later I
> found out that the cython class gets stripped of its attributes if
> it's stored in a dictionary. Only those attributes written to the
> class's internal dictionary in the __init__() method seem to get
> saved, as far as I can tell from my experiments. Of course I'd
> actually like to avoid calling __init__().
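(As an aside: for function-level profiling from plain Python, cProfile
is enough to see whether __cinit__/__dealloc__ dominate. Below is a
minimal sketch; `expensive_init` and `run` are made-up stand-ins for
the cblockbind()-style setup discussed above, not anyone's actual code.
Keep in mind the deterministic-profiler caveat: the instrumentation
overhead itself skews the numbers.)

```python
import cProfile
import io
import pstats

def expensive_init(n):
    # Stand-in for a costly constructor such as the cblockbind() call.
    return [i * i for i in range(n)]

def run():
    # Simulate constructing many short-lived objects.
    for _ in range(100):
        expensive_init(1000)

profiler = cProfile.Profile()
profiler.enable()
run()
profiler.disable()

# Dump the top entries sorted by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

The cumulative column quickly shows whether time is going into the
constructor, the destructor, or the surrounding bookkeeping.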
> I really didn't intend to try to learn the internals of python or
> cython for that matter, but I do need to figure out how to optimize
> this code.

I think you're trying to make things way more complicated than
necessary. The easiest approach is to only expose wrapper classes, and
cache the expensive initialization in an internal class. See

http://sage.math.washington.edu/home/robertwb/cython/mat.html
http://sage.math.washington.edu/home/robertwb/cython/mat.pyx

(I'm sure there's some more room for optimization, and the caching
algorithm could be improved as well.) Also, note that creating the
numpy arrays is expensive as well.

In [1]: from mat import *

In [2]: %time make_np(10**5)
CPU times: user 0.56 s, sys: 0.43 s, total: 0.99 s
Wall time: 0.99 s

In [4]: %time make_CMat(10**5)
CPU times: user 0.68 s, sys: 0.45 s, total: 1.13 s
Wall time: 1.14 s

In [6]: %time make_CachedCMat(10**5)
CPU times: user 0.14 s, sys: 0.00 s, total: 0.14 s
Wall time: 0.14 s

In [8]: %time make_Empty(10**5)
CPU times: user 0.02 s, sys: 0.00 s, total: 0.02 s
Wall time: 0.02 s

- Robert

_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
