Jean-Francois Moulin wrote:
> Hi,
>
> I am working on a script which reads rather large amounts of data in a
> binary format and then processes it through different test functions.
> I optimized the beast as much as I possibly could: using tuples
> instead of lists, then moving to cython and declaring the types,
> optimizing the calls to numpy fn by use of the buffer notation...
>
> All in all I gain a factor 10 in speed. Not bad but still not really enough...
>
> What I still see as factors slowing me down could be (see my code in attach):
> - the use of the file.read() function from python to get a string
>   which I then process (is an fread call from c faster... how to
>   implement it?)

The real problem is that you read 4 bytes at a time. If you buffer up
longer stretches somehow, it doesn't matter so much which call you use.
I.e.:

    obj = file.read(400)
    cdef char* buf = obj
    # hold on to obj, but process buf[0]..buf[399]
    buf = NULL
    obj = None  # do not do this until you no longer use buf

Though if you have a socket rather than a file I suppose you're worse off.

You can use C file handling directly (the safest thing is to open and
close the file/socket with C calls as well). Just look up Cython examples
on interfacing with C code and Google for C and file handling.

> - the use of the struct.unpack

As long as you stick to native-endian, you should be able to just cast to
an int in your case:

    cdef char* buf = data
    cdef int* buf_as_int = <int*>buf
    cdef int value = buf_as_int[0]  # Cython dereferences with [0], not *

If you need to access more than one int, you can use a struct instead.

> - the bit masking technique I use... (is it good or bad)

For speed it is very fast. If it has the effect you want, there's not
going to be any faster way. Consider writing it like this though:

    bit30 = (data & (1 << 30)) != 0

But it is just about readability. (It will be compiled to the same thing.)

--
Dag Sverre
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
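
For anyone who wants to try the three suggestions above (read in larger
chunks, reinterpret native-endian bytes as ints instead of calling
struct.unpack per value, and test bit 30 with a mask) without compiling
anything, here is a rough sketch in plain Python using only the standard
library. It is not the original poster's code: the names iter_ints,
bit30_set and CHUNK_INTS are made up for illustration, and it assumes a
regular file (or file-like object) holding a flat sequence of native C ints,
where a short read only happens at end of file.

```python
import io
import struct

INT_SIZE = struct.calcsize("i")  # size of a native C int, commonly 4 bytes
CHUNK_INTS = 100                 # hypothetical chunk size: ints per read() call


def iter_ints(stream, chunk_ints=CHUNK_INTS):
    """Yield native-endian C ints from a binary stream, one large read at a time."""
    while True:
        chunk = stream.read(chunk_ints * INT_SIZE)
        # Ignore a trailing partial int (assumes short reads only occur at EOF).
        usable = len(chunk) - len(chunk) % INT_SIZE
        if usable == 0:
            break
        # Reinterpret the raw bytes as native ints without unpacking one by one.
        yield from memoryview(chunk)[:usable].cast("i")


def bit30_set(value):
    """Return True if bit 30 of value is set."""
    return (value & (1 << 30)) != 0


if __name__ == "__main__":
    # Build a small in-memory "file" of three native ints for demonstration.
    demo = io.BytesIO(struct.pack("3i", 1, 1 << 30, (1 << 30) | 5))
    for v in iter_ints(demo, chunk_ints=2):
        print(v, bit30_set(v))
```

In Cython the same structure applies, with the memoryview cast replaced by
the char* to int* cast shown in the reply.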
