Dag Sverre Seljebotn wrote:
> Jean-Francois Moulin wrote:
>> Hi,
>>
>> I am working on a script which reads rather large amounts of data in a
>> binary format and then processes it through different test functions.
>> I optimized the beast as much as I possibly could: using tuples
>> instead of lists, then moving to cython and declaring the types,
>> optimizing the calls to numpy fn by use of the buffer notation...
>>
>> All in all I gain a factor 10 in speed. Not bad but still not really
>> enough...
>>
>> What I still see as factors slowing me down could be (see my code in attach):
>> - the use of the file.read() function from python to get a string
>>   which I then process (is an fread call from c faster... how to
>>   implement it?)
>
> The real problem is that you read 4 bytes at a time. If you buffer up
> longer stretches somehow it doesn't matter so much which call you use. I.e.:
>
>     obj = file.read(400)
>     cdef char* buf = obj
>     # hold on to obj, but process buf[0]..buf[399]
>     buf = NULL
>     obj = None # do not do this until you no longer use buf
>
> Though if you have a socket rather than a file I suppose you're worse off.
>
> You can use C file handling directly (the safest thing is to open and
> close the file/socket with C calls as well), just look up Cython
> examples on interfacing with C code and Google for C and file handling.
>
>> - the use of the struct.unpack
>
> As long as you stick to native-endian, you should be able to just cast
> to an int in your case:
>
>     cdef char* buf = data
>     cdef int* buf_as_int = <int*>buf
>     cdef int value = *buf_as_int

Argh, this is not Cython :-) (and the irony is we're just having a
discussion about this on the list). Do

    cdef int value = buf_as_int[0]

instead. Or just

    value = (<int*>buf)[0]

-- 
Dag Sverre
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
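
The buffering advice above (read a long stretch, then walk through it) can be tried in plain Python before moving to C file handling. What follows is only a minimal sketch under stated assumptions: the data is a flat sequence of native-endian 4-byte ints, and the stream is a regular file or in-memory buffer where read() returns a full chunk until end of file. The names iter_ints, RECORD and CHUNK_RECORDS are made up for illustration and do not come from the original script.

```python
import io
import struct

# Assumption: each record is one native-endian 4-byte int.
RECORD = struct.Struct("=i")
CHUNK_RECORDS = 100  # 100 records * 4 bytes = 400 bytes per read


def iter_ints(stream, chunk_records=CHUNK_RECORDS):
    """Yield ints from a binary stream, reading many records per call."""
    chunk_size = chunk_records * RECORD.size
    while True:
        chunk = stream.read(chunk_size)
        # Keep only whole records; a short tail at end of file is dropped.
        usable = len(chunk) - len(chunk) % RECORD.size
        if usable == 0:
            break
        # One iter_unpack call per chunk instead of one unpack call per value.
        for (value,) in RECORD.iter_unpack(chunk[:usable]):
            yield value


if __name__ == "__main__":
    data = struct.pack("=5i", 1, 2, 3, 4, 5)
    print(list(iter_ints(io.BytesIO(data))))
```

This does not replace the Cython pointer cast; it only shows that the number of read() calls drops by a factor of chunk_records. On a socket a read may return a partial chunk, so the tail handling here would not be sufficient.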
